2023-12-21 10:53:09,261 INFO [train.py:953] (3/4) Training started
2023-12-21 10:53:09,261 INFO [train.py:963] (3/4) Device: cuda:3
2023-12-21 10:53:09,262 INFO [train.py:965] (3/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '0.0.0+unknown.version', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'audio_tagging', 'icefall-git-sha1': 'bd01c212-dirty', 'icefall-git-date': 'Tue Dec 19 17:20:49 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_audio_tagging', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/lhotse_development/lhotse_at/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-2-1207150844-f49d8c4f4-c49d5', 'IP address': '10.177.22.19'}, 'world_size': 4, 'master_port': 13455, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp_at_as_full'), 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'num_events': 527, 'audioset_subset': 'full', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures'}
2023-12-21 10:53:09,262 INFO [train.py:967] (3/4) About to create model
2023-12-21 10:53:14,350 INFO [train.py:971] (3/4) Number of model parameters: 64264454
2023-12-21 10:53:16,943 INFO [train.py:986] (3/4) Using DDP
2023-12-21 10:53:17,373 INFO [at_datamodule.py:398] (3/4) About to get the audioset cuts for KD.
2023-12-21 10:53:17,450 INFO [at_datamodule.py:223] (3/4) Enable MUSAN
2023-12-21 10:53:17,450 INFO [at_datamodule.py:224] (3/4) About to get Musan cuts
2023-12-21 10:53:19,849 INFO [at_datamodule.py:248] (3/4) Enable SpecAugment
2023-12-21 10:53:19,849 INFO [at_datamodule.py:249] (3/4) Time warp factor: 80
2023-12-21 10:53:19,849 INFO [at_datamodule.py:259] (3/4) Num frame mask: 10
2023-12-21 10:53:19,849 INFO [at_datamodule.py:272] (3/4) About to create train dataset
2023-12-21 10:53:19,850 INFO [at_datamodule.py:299] (3/4) Using DynamicBucketingSampler.
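The records above pin down the run's setup: a 64264454-parameter Zipformer trained with DDP on 4 GPUs, AudioSet cuts with MUSAN noise and SpecAugment, batched by a DynamicBucketingSampler capped at max_duration=1000 seconds. As a rough illustration of how the logged parameter count and sampler map onto standard PyTorch/lhotse calls (this is a sketch, not the recipe's actual train.py or at_datamodule.py; the function names here are invented for clarity):

import torch
from lhotse import CutSet
from lhotse.dataset import DynamicBucketingSampler

def count_params(model: torch.nn.Module) -> int:
    # Matches the "Number of model parameters: 64264454" record above.
    return sum(p.numel() for p in model.parameters())

def make_sampler(cuts: CutSet) -> DynamicBucketingSampler:
    # Settings taken directly from the config dict logged above.
    return DynamicBucketingSampler(
        cuts,
        max_duration=1000,  # 'max_duration': seconds of audio per batch
        num_buckets=30,     # 'num_buckets': 30
        shuffle=True,       # 'shuffle': True
        drop_last=True,     # 'drop_last': True
    )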
2023-12-21 10:53:22,209 INFO [at_datamodule.py:315] (3/4) About to create train dataloader 2023-12-21 10:53:22,210 INFO [at_datamodule.py:410] (3/4) About to get test-other cuts 2023-12-21 10:53:22,212 INFO [at_datamodule.py:346] (3/4) About to create dev dataset 2023-12-21 10:53:22,651 INFO [at_datamodule.py:363] (3/4) About to create dev dataloader 2023-12-21 10:53:49,464 INFO [train.py:886] (3/4) Epoch 1, batch 0, loss[loss=1.851, audio_tagging_loss=1.851, over 24045.00 frames. ], tot_loss[loss=1.851, audio_tagging_loss=1.851, over 24045.00 frames. ], batch size: 100, lr: 2.25e-02, grad_scale: 2.0 2023-12-21 10:53:49,464 INFO [train.py:909] (3/4) Computing validation loss 2023-12-21 10:54:14,721 INFO [train.py:917] (3/4) Epoch 1, validation: loss=1.716, audio_tagging_loss=1.716, over 3737520.00 frames. 2023-12-21 10:54:14,722 INFO [train.py:918] (3/4) Maximum memory allocated so far is 13114MB 2023-12-21 10:54:19,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=0.0, ans=0.5 2023-12-21 10:54:19,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.65 vs. limit=7.5 2023-12-21 10:54:25,383 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+02 8.487e+02 9.999e+02 1.363e+03 1.706e+03, threshold=4.000e+03, percent-clipped=0.0 2023-12-21 10:54:26,849 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=23.28 vs. limit=7.525 2023-12-21 10:54:28,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=66.66666666666667, ans=0.8976666666666667 2023-12-21 10:54:30,521 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=86.15 vs. limit=7.525 2023-12-21 10:54:31,348 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=368.91 vs. limit=5.033333333333333 2023-12-21 10:54:32,993 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=54.36 vs. limit=4.026666666666666 2023-12-21 10:54:34,140 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=90.08 vs. limit=5.016666666666667 2023-12-21 10:54:35,322 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=511.73 vs. limit=7.55 2023-12-21 10:54:36,994 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.136e+01 2.542e+02 7.819e+02 1.187e+03 1.783e+03, threshold=3.128e+03, percent-clipped=0.0 2023-12-21 10:54:37,628 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=511.05 vs. limit=7.6 2023-12-21 10:54:42,322 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.66 vs. limit=3.02 2023-12-21 10:54:47,569 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=245.78 vs. 
limit=7.575 2023-12-21 10:54:59,292 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=383.71 vs. limit=7.65 2023-12-21 10:55:01,180 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.709e+01 9.013e+01 2.542e+02 8.019e+02 1.783e+03, threshold=1.017e+03, percent-clipped=0.0 2023-12-21 10:55:02,876 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=355.00 vs. limit=7.6 2023-12-21 10:55:07,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=266.6666666666667, ans=0.4875 2023-12-21 10:55:07,728 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.51 vs. limit=4.1066666666666665 2023-12-21 10:55:13,423 INFO [train.py:886] (3/4) Epoch 1, batch 50, loss[loss=0.03861, audio_tagging_loss=0.03861, over 24003.00 frames. ], tot_loss[loss=0.2918, audio_tagging_loss=0.2918, over 1121987.62 frames. ], batch size: 100, lr: 2.48e-02, grad_scale: 2.0 2023-12-21 10:55:20,940 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=81.65 vs. limit=4.133333333333334 2023-12-21 10:55:21,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=333.3333333333333, ans=0.484375 2023-12-21 10:55:24,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=400.0, ans=0.246 2023-12-21 10:55:27,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=400.0, ans=0.48125 2023-12-21 10:55:28,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=400.0, ans=7.65 2023-12-21 10:55:43,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=466.6666666666667, ans=0.1825 2023-12-21 10:55:43,345 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=213.89 vs. limit=5.233333333333333 2023-12-21 10:55:44,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=466.6666666666667, ans=0.0895 2023-12-21 10:55:44,410 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=23.85 vs. limit=4.1866666666666665 2023-12-21 10:55:44,696 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.24 vs. limit=4.1866666666666665 2023-12-21 10:55:45,702 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=73.43 vs. limit=5.116666666666666 2023-12-21 10:55:47,051 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=23.08 vs. limit=7.675 2023-12-21 10:55:50,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=212.42 vs. 
limit=7.9 2023-12-21 10:55:50,845 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=135.84 vs. limit=7.7 2023-12-21 10:55:56,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=533.3333333333334, ans=0.475 2023-12-21 10:55:56,888 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.46 vs. limit=4.213333333333333 2023-12-21 10:56:02,692 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=248.50 vs. limit=7.725 2023-12-21 10:56:09,963 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=407.02 vs. limit=7.725 2023-12-21 10:56:10,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=600.0, ans=0.425 2023-12-21 10:56:15,149 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.696e+01 2.804e+01 4.984e+01 1.709e+02 1.783e+03, threshold=9.968e+01, percent-clipped=0.0 2023-12-21 10:56:15,186 INFO [train.py:886] (3/4) Epoch 1, batch 100, loss[loss=0.03614, audio_tagging_loss=0.03614, over 25000.00 frames. ], tot_loss[loss=0.1509, audio_tagging_loss=0.1509, over 1975878.83 frames. ], batch size: 100, lr: 2.70e-02, grad_scale: 4.0 2023-12-21 10:56:24,901 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=30.83 vs. limit=4.266666666666667 2023-12-21 10:56:25,026 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=466.74 vs. limit=8.0 2023-12-21 10:56:31,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=733.3333333333334, ans=5.458333333333333 2023-12-21 10:56:32,782 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=108.26 vs. limit=7.775 2023-12-21 10:56:32,976 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=404.64 vs. limit=8.05 2023-12-21 10:56:37,806 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=14.30 vs. limit=4.293333333333333 2023-12-21 10:56:39,048 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=32.11 vs. limit=7.8 2023-12-21 10:56:40,140 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=486.01 vs. limit=7.8 2023-12-21 10:56:41,088 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=15.72 vs. limit=4.32 2023-12-21 10:56:43,643 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=237.44 vs. 
limit=7.8 2023-12-21 10:56:45,054 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=29.07 vs. limit=7.8 2023-12-21 10:56:47,971 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=460.41 vs. limit=7.8 2023-12-21 10:56:57,239 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=33.12 vs. limit=5.216666666666667 2023-12-21 10:57:01,051 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=50.65 vs. limit=4.173333333333334 2023-12-21 10:57:03,551 WARNING [optim.py:500] (3/4) Scaling gradients by 0.07864928245544434, model_norm_threshold=99.68033599853516 2023-12-21 10:57:03,697 WARNING [optim.py:572] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.7.weight with proportion 0.45, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.166e+05, grad_sumsq=5.753e+08, orig_rms_sq=1.246e-03 2023-12-21 10:57:06,226 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=201.53 vs. limit=8.2 2023-12-21 10:57:07,150 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=245.35 vs. limit=8.2 2023-12-21 10:57:08,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=933.3333333333334, ans=0.45625 2023-12-21 10:57:10,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=933.3333333333334, ans=0.45625 2023-12-21 10:57:11,972 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=446.55 vs. limit=7.85 2023-12-21 10:57:12,086 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=124.42 vs. limit=8.2 2023-12-21 10:57:14,007 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=4.287e+01 2023-12-21 10:57:14,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=107.51 vs. limit=7.85 2023-12-21 10:57:16,096 INFO [train.py:886] (3/4) Epoch 1, batch 150, loss[loss=0.03202, audio_tagging_loss=0.03202, over 25000.00 frames. ], tot_loss[loss=0.1026, audio_tagging_loss=0.1026, over 2640076.15 frames. ], batch size: 100, lr: 2.93e-02, grad_scale: 2.0 2023-12-21 10:57:20,096 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=447.57 vs. limit=7.875 2023-12-21 10:57:21,521 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=85.10 vs. 
limit=5.5 2023-12-21 10:57:25,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1000.0, ans=0.453125 2023-12-21 10:57:25,859 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.879e+01 2023-12-21 10:57:29,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1066.6666666666667, ans=0.28933333333333333 2023-12-21 10:57:30,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=171.87 vs. limit=7.9 2023-12-21 10:57:31,310 WARNING [optim.py:500] (3/4) Scaling gradients by 0.0951763167977333, model_norm_threshold=99.68033599853516 2023-12-21 10:57:31,458 WARNING [optim.py:572] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.7.weight with proportion 0.44, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.792e+05, grad_sumsq=3.739e+08, orig_rms_sq=1.282e-03 2023-12-21 10:57:32,171 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=48.83 vs. limit=8.3 2023-12-21 10:57:33,393 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=131.15 vs. limit=7.9 2023-12-21 10:57:37,967 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=352.46 vs. limit=7.9 2023-12-21 10:57:40,162 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=341.40 vs. limit=7.925 2023-12-21 10:57:41,118 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=177.13 vs. limit=7.925 2023-12-21 10:57:49,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=49.06 vs. limit=7.925 2023-12-21 10:57:51,563 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=191.27 vs. limit=7.925 2023-12-21 10:57:52,711 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=113.22 vs. limit=5.6 2023-12-21 10:57:54,764 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.98 vs. limit=3.18 2023-12-21 10:57:55,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1200.0, ans=0.0925 2023-12-21 10:57:59,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=31.87 vs. limit=7.95 2023-12-21 10:58:06,332 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=350.45 vs. 
limit=7.975 2023-12-21 10:58:11,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1266.6666666666667, ans=0.07150000000000001 2023-12-21 10:58:14,638 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=355.20 vs. limit=7.975 2023-12-21 10:58:17,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1333.3333333333333, ans=0.09166666666666667 2023-12-21 10:58:18,157 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.90 vs. limit=4.533333333333333 2023-12-21 10:58:19,088 INFO [train.py:886] (3/4) Epoch 1, batch 200, loss[loss=0.02658, audio_tagging_loss=0.02658, over 25000.00 frames. ], tot_loss[loss=0.07829, audio_tagging_loss=0.07829, over 3157577.39 frames. ], batch size: 100, lr: 3.15e-02, grad_scale: 4.0 2023-12-21 10:58:20,186 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.593e+01 2.983e+01 3.603e+01 1.267e+03, threshold=5.966e+01, percent-clipped=10.0 2023-12-21 10:58:29,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1333.3333333333333, ans=0.15 2023-12-21 10:58:29,740 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=368.87 vs. limit=8.0 2023-12-21 10:58:29,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=26.19 vs. limit=8.0 2023-12-21 10:58:30,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1400.0, ans=0.434375 2023-12-21 10:58:42,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1400.0, ans=0.09125 2023-12-21 10:58:46,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1466.6666666666667, ans=0.43125 2023-12-21 10:58:49,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1466.6666666666667, ans=5.366666666666667 2023-12-21 10:58:52,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1466.6666666666667, ans=0.8486666666666667 2023-12-21 10:59:01,967 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=63.86 vs. limit=8.075 2023-12-21 10:59:02,198 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=479.97 vs. limit=8.075 2023-12-21 10:59:06,382 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=91.04 vs. limit=8.075 2023-12-21 10:59:12,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=67.52 vs. 
limit=5.8 2023-12-21 10:59:13,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=93.68 vs. limit=8.1 2023-12-21 10:59:22,004 INFO [train.py:886] (3/4) Epoch 1, batch 250, loss[loss=0.03377, audio_tagging_loss=0.03377, over 25000.00 frames. ], tot_loss[loss=0.06396, audio_tagging_loss=0.06396, over 3551934.55 frames. ], batch size: 100, lr: 3.38e-02, grad_scale: 2.0 2023-12-21 10:59:26,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=18.81 vs. limit=4.666666666666667 2023-12-21 10:59:28,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1666.6666666666667, ans=0.7666666666666666 2023-12-21 10:59:34,035 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=144.89 vs. limit=8.8 2023-12-21 10:59:43,073 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=50.94 vs. limit=5.866666666666666 2023-12-21 10:59:43,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=174.87 vs. limit=8.8 2023-12-21 10:59:45,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1800.0, ans=0.1325 2023-12-21 10:59:45,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1800.0, ans=0.768 2023-12-21 10:59:48,119 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.89 vs. limit=4.72 2023-12-21 11:00:03,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1866.6666666666667, ans=0.4125 2023-12-21 11:00:05,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1866.6666666666667, ans=0.4125 2023-12-21 11:00:10,687 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=47.47 vs. limit=8.225 2023-12-21 11:00:11,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.39 vs. limit=4.773333333333333 2023-12-21 11:00:19,449 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=4.773333333333333 2023-12-21 11:00:19,602 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=30.28 vs. limit=8.95 2023-12-21 11:00:20,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1933.3333333333333, ans=0.2806666666666667 2023-12-21 11:00:23,434 INFO [train.py:886] (3/4) Epoch 1, batch 300, loss[loss=0.03009, audio_tagging_loss=0.03009, over 24750.00 frames. ], tot_loss[loss=0.0548, audio_tagging_loss=0.0548, over 3849720.49 frames. 
], batch size: 99, lr: 3.60e-02, grad_scale: 4.0 2023-12-21 11:00:25,740 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.506e+01 2.969e+01 4.379e+01 2.139e+02, threshold=5.939e+01, percent-clipped=11.0 2023-12-21 11:00:25,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2000.0, ans=0.04949747468305833 2023-12-21 11:00:38,363 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=5.261e+00 2023-12-21 11:00:44,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=35.62 vs. limit=9.05 2023-12-21 11:00:54,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2133.3333333333335, ans=0.12 2023-12-21 11:01:02,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2200.0, ans=0.050499999999999996 2023-12-21 11:01:03,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2200.0, ans=0.396875 2023-12-21 11:01:06,215 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.58 vs. limit=6.1 2023-12-21 11:01:12,706 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=34.81 vs. limit=9.15 2023-12-21 11:01:18,414 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=22.40 vs. limit=8.35 2023-12-21 11:01:20,720 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=42.81 vs. limit=8.35 2023-12-21 11:01:25,855 INFO [train.py:886] (3/4) Epoch 1, batch 350, loss[loss=0.02616, audio_tagging_loss=0.02616, over 24750.00 frames. ], tot_loss[loss=0.0482, audio_tagging_loss=0.0482, over 4088818.45 frames. ], batch size: 99, lr: 3.83e-02, grad_scale: 4.0 2023-12-21 11:01:27,510 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.21 vs. limit=4.933333333333334 2023-12-21 11:01:30,743 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=17.65 vs. limit=4.933333333333334 2023-12-21 11:01:30,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=130.04 vs. limit=8.375 2023-12-21 11:01:31,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=43.50 vs. limit=6.166666666666667 2023-12-21 11:01:31,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2333.3333333333335, ans=0.27666666666666667 2023-12-21 11:01:38,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=96.83 vs. 
limit=8.4 2023-12-21 11:01:41,804 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=74.92 vs. limit=9.3 2023-12-21 11:01:42,973 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=50.07 vs. limit=8.4 2023-12-21 11:01:59,075 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=34.33 vs. limit=9.35 2023-12-21 11:02:04,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.37 vs. limit=5.633333333333334 2023-12-21 11:02:07,006 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.05 vs. limit=5.633333333333334 2023-12-21 11:02:15,376 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=112.03 vs. limit=8.475 2023-12-21 11:02:15,606 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=134.86 vs. limit=8.475 2023-12-21 11:02:18,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2600.0, ans=0.27399999999999997 2023-12-21 11:02:18,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2600.0, ans=0.378125 2023-12-21 11:02:20,688 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.47 vs. limit=9.45 2023-12-21 11:02:23,416 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=25.16 vs. limit=8.475 2023-12-21 11:02:28,560 INFO [train.py:886] (3/4) Epoch 1, batch 400, loss[loss=0.02577, audio_tagging_loss=0.02577, over 24750.00 frames. ], tot_loss[loss=0.0429, audio_tagging_loss=0.0429, over 4275402.16 frames. ], batch size: 99, lr: 4.05e-02, grad_scale: 8.0 2023-12-21 11:02:29,183 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=17.36 vs. limit=6.333333333333333 2023-12-21 11:02:29,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2666.6666666666665, ans=0.375 2023-12-21 11:02:30,850 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.972e+01 3.451e+01 4.422e+01 2.511e+02, threshold=6.902e+01, percent-clipped=7.0 2023-12-21 11:02:33,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.81 vs. limit=5.066666666666666 2023-12-21 11:02:41,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.42 vs. limit=6.366666666666667 2023-12-21 11:02:46,170 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=35.77 vs. 
limit=8.525 2023-12-21 11:03:03,846 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.70 vs. limit=5.12 2023-12-21 11:03:07,100 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=62.17 vs. limit=9.65 2023-12-21 11:03:09,632 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.56 vs. limit=9.65 2023-12-21 11:03:11,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2866.6666666666665, ans=0.07 2023-12-21 11:03:15,446 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=195.37 vs. limit=8.575 2023-12-21 11:03:21,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2933.3333333333335, ans=0.3625 2023-12-21 11:03:23,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=55.28 vs. limit=9.7 2023-12-21 11:03:29,117 INFO [train.py:886] (3/4) Epoch 1, batch 450, loss[loss=0.02195, audio_tagging_loss=0.02195, over 25000.00 frames. ], tot_loss[loss=0.03877, audio_tagging_loss=0.03877, over 4432025.17 frames. ], batch size: 100, lr: 4.28e-02, grad_scale: 8.0 2023-12-21 11:03:31,760 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.96 vs. limit=5.75 2023-12-21 11:03:31,884 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.27 vs. limit=9.75 2023-12-21 11:03:40,934 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=25.56 vs. limit=9.75 2023-12-21 11:03:41,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3066.6666666666665, ans=0.35625 2023-12-21 11:03:41,988 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.26 vs. limit=9.8 2023-12-21 11:03:56,396 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.31 vs. limit=8.675 2023-12-21 11:03:57,316 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=22.46 vs. limit=6.566666666666666 2023-12-21 11:03:58,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3133.3333333333335, ans=0.353125 2023-12-21 11:04:04,371 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.39 vs. 
limit=9.9 2023-12-21 11:04:06,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3200.0, ans=7.0 2023-12-21 11:04:12,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.88 vs. limit=5.8 2023-12-21 11:04:19,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=47.59 vs. limit=8.725 2023-12-21 11:04:24,783 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.04 vs. limit=5.816666666666666 2023-12-21 11:04:25,702 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=23.44 vs. limit=6.633333333333333 2023-12-21 11:04:27,998 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=76.87 vs. limit=8.725 2023-12-21 11:04:29,729 INFO [train.py:886] (3/4) Epoch 1, batch 500, loss[loss=0.02346, audio_tagging_loss=0.02346, over 25000.00 frames. ], tot_loss[loss=0.03571, audio_tagging_loss=0.03571, over 4545473.18 frames. ], batch size: 100, lr: 4.49e-02, grad_scale: 8.0 2023-12-21 11:04:31,971 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.916e+01 3.389e+01 4.125e+01 8.969e+01, threshold=6.779e+01, percent-clipped=3.0 2023-12-21 11:04:34,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3333.3333333333335, ans=0.34375 2023-12-21 11:04:40,439 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.53 vs. limit=5.85 2023-12-21 11:04:41,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.17 vs. limit=10.05 2023-12-21 11:04:45,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3400.0, ans=0.340625 2023-12-21 11:04:45,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3400.0, ans=0.340625 2023-12-21 11:04:48,426 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.37 vs. limit=6.7 2023-12-21 11:04:52,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3400.0, ans=0.340625 2023-12-21 11:04:55,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=65.10 vs. limit=8.8 2023-12-21 11:05:15,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3533.3333333333335, ans=0.06749999999999998 2023-12-21 11:05:16,727 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.70 vs. 
limit=5.4399999999999995 2023-12-21 11:05:31,201 INFO [train.py:886] (3/4) Epoch 1, batch 550, loss[loss=0.03238, audio_tagging_loss=0.03238, over 25000.00 frames. ], tot_loss[loss=0.03332, audio_tagging_loss=0.03332, over 4641485.33 frames. ], batch size: 100, lr: 4.49e-02, grad_scale: 8.0 2023-12-21 11:05:38,744 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=10.25 2023-12-21 11:05:51,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3733.3333333333335, ans=0.033333333333333326 2023-12-21 11:05:56,113 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=62.36 vs. limit=8.925 2023-12-21 11:05:57,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.74 vs. limit=8.925 2023-12-21 11:05:57,261 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.87 vs. limit=8.925 2023-12-21 11:06:00,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.12 vs. limit=8.925 2023-12-21 11:06:04,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3800.0, ans=0.321875 2023-12-21 11:06:21,041 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.47 vs. limit=10.45 2023-12-21 11:06:29,531 INFO [train.py:886] (3/4) Epoch 1, batch 600, loss[loss=0.02817, audio_tagging_loss=0.02817, over 24750.00 frames. ], tot_loss[loss=0.03184, audio_tagging_loss=0.03184, over 4702789.38 frames. ], batch size: 99, lr: 4.49e-02, grad_scale: 8.0 2023-12-21 11:06:31,034 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.79 vs. limit=9.0 2023-12-21 11:06:31,737 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.295e+01 3.660e+01 5.036e+01 7.229e+01 1.228e+02, threshold=1.007e+02, percent-clipped=27.0 2023-12-21 11:06:32,216 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=65.08 vs. limit=9.0 2023-12-21 11:06:34,239 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=18.49 vs. limit=9.0 2023-12-21 11:06:34,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.65 vs. limit=7.0 2023-12-21 11:06:36,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=4000.0, ans=0.26 2023-12-21 11:06:38,840 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=80.50 vs. limit=9.0 2023-12-21 11:06:48,027 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=46.07 vs. 
limit=10.55 2023-12-21 11:06:57,055 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=10.6 2023-12-21 11:07:02,342 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=66.65 vs. limit=9.05 2023-12-21 11:07:06,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4200.0, ans=0.04916666666666667 2023-12-21 11:07:07,649 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=54.99 vs. limit=9.075 2023-12-21 11:07:12,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4200.0, ans=0.753 2023-12-21 11:07:13,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=4200.0, ans=9.075 2023-12-21 11:07:18,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.30 vs. limit=9.1 2023-12-21 11:07:27,738 INFO [train.py:886] (3/4) Epoch 1, batch 650, loss[loss=0.02279, audio_tagging_loss=0.02279, over 24750.00 frames. ], tot_loss[loss=0.03052, audio_tagging_loss=0.03052, over 4755897.94 frames. ], batch size: 99, lr: 4.49e-02, grad_scale: 8.0 2023-12-21 11:07:35,749 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=56.99 vs. limit=9.125 2023-12-21 11:07:53,753 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.88 vs. limit=9.175 2023-12-21 11:07:56,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4466.666666666667, ans=0.2553333333333333 2023-12-21 11:07:58,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4466.666666666667, ans=0.04805555555555556 2023-12-21 11:07:59,988 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.13 vs. limit=6.116666666666667 2023-12-21 11:08:01,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.88 vs. limit=10.85 2023-12-21 11:08:01,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4533.333333333333, ans=0.04777777777777778 2023-12-21 11:08:07,453 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=48.33 vs. limit=9.2 2023-12-21 11:08:12,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4600.0, ans=0.269 2023-12-21 11:08:14,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4600.0, ans=0.009869565217391305 2023-12-21 11:08:26,815 INFO [train.py:886] (3/4) Epoch 1, batch 700, loss[loss=0.0206, audio_tagging_loss=0.0206, over 24750.00 frames. 
], tot_loss[loss=0.0293, audio_tagging_loss=0.0293, over 4793614.62 frames. ], batch size: 99, lr: 4.49e-02, grad_scale: 8.0 2023-12-21 11:08:28,959 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.878e+01 4.940e+01 6.035e+01 7.817e+01 1.849e+02, threshold=1.207e+02, percent-clipped=12.0 2023-12-21 11:08:31,793 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.47 vs. limit=5.866666666666667 2023-12-21 11:08:33,705 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=16.23 vs. limit=7.333333333333334 2023-12-21 11:08:35,894 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=24.56 vs. limit=11.0 2023-12-21 11:08:41,295 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=31.45 vs. limit=9.275 2023-12-21 11:08:41,361 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.60 vs. limit=7.366666666666666 2023-12-21 11:08:44,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4733.333333333333, ans=0.04694444444444445 2023-12-21 11:08:46,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4733.333333333333, ans=0.009840579710144928 2023-12-21 11:08:47,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4800.0, ans=0.272 2023-12-21 11:08:50,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=4800.0, ans=11.1 2023-12-21 11:08:59,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.93 vs. limit=5.92 2023-12-21 11:09:07,513 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=48.62 vs. limit=9.325 2023-12-21 11:09:09,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=41.08 vs. limit=9.325 2023-12-21 11:09:17,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=3.74 2023-12-21 11:09:20,453 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.23 vs. limit=9.35 2023-12-21 11:09:20,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.91 vs. limit=11.2 2023-12-21 11:09:22,046 INFO [train.py:886] (3/4) Epoch 1, batch 750, loss[loss=0.02382, audio_tagging_loss=0.02382, over 25000.00 frames. ], tot_loss[loss=0.02814, audio_tagging_loss=0.02814, over 4830941.27 frames. ], batch size: 100, lr: 4.49e-02, grad_scale: 8.0 2023-12-21 11:09:38,268 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=53.69 vs. 
limit=9.4 2023-12-21 11:09:40,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.42 vs. limit=6.266666666666667 2023-12-21 11:09:41,662 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.44 vs. limit=9.4 2023-12-21 11:09:42,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5066.666666666667, ans=0.8006666666666666 2023-12-21 11:09:49,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=5133.333333333333, ans=0.07 2023-12-21 11:09:59,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5200.0, ans=0.045000000000000005 2023-12-21 11:10:12,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=47.22 vs. limit=9.475 2023-12-21 11:10:19,873 INFO [train.py:886] (3/4) Epoch 1, batch 800, loss[loss=0.0193, audio_tagging_loss=0.0193, over 25000.00 frames. ], tot_loss[loss=0.02716, audio_tagging_loss=0.02716, over 4858645.49 frames. ], batch size: 100, lr: 4.49e-02, grad_scale: 16.0 2023-12-21 11:10:22,010 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.496e+01 3.313e+01 4.173e+01 5.276e+01 1.022e+02, threshold=8.346e+01, percent-clipped=0.0 2023-12-21 11:10:23,956 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.14 vs. limit=9.5 2023-12-21 11:10:26,741 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=66.63 vs. limit=9.5 2023-12-21 11:10:27,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5333.333333333333, ans=0.24666666666666667 2023-12-21 11:10:29,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5400.0, ans=0.246875 2023-12-21 11:10:30,963 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=42.22 vs. limit=7.7 2023-12-21 11:10:32,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=5400.0, ans=0.09899494936611666 2023-12-21 11:10:33,219 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=19.71 vs. limit=9.525 2023-12-21 11:10:38,121 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.37 vs. 
limit=11.55 2023-12-21 11:10:38,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5400.0, ans=0.246875 2023-12-21 11:10:49,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5466.666666666667, ans=0.24375000000000002 2023-12-21 11:10:53,919 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=48.88 vs. limit=9.575 2023-12-21 11:11:12,038 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=9.6 2023-12-21 11:11:14,589 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=21.81 vs. limit=9.6 2023-12-21 11:11:15,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=5666.666666666667, ans=11.75 2023-12-21 11:11:16,878 INFO [train.py:886] (3/4) Epoch 1, batch 850, loss[loss=0.02434, audio_tagging_loss=0.02434, over 25000.00 frames. ], tot_loss[loss=0.02651, audio_tagging_loss=0.02651, over 4885131.85 frames. ], batch size: 100, lr: 4.49e-02, grad_scale: 16.0 2023-12-21 11:11:24,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=5666.666666666667, ans=0.234375 2023-12-21 11:11:24,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=5666.666666666667, ans=0.7016666666666667 2023-12-21 11:11:25,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=5666.666666666667, ans=8.541666666666668 2023-12-21 11:11:28,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=9.65 2023-12-21 11:11:29,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.29 vs. limit=9.65 2023-12-21 11:11:46,908 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.79 vs. limit=9.675 2023-12-21 11:11:49,597 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.70 vs. limit=6.466666666666667 2023-12-21 11:11:50,525 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=24.80 vs. limit=9.7 2023-12-21 11:12:05,983 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=31.37 vs. limit=9.725 2023-12-21 11:12:12,827 INFO [train.py:886] (3/4) Epoch 1, batch 900, loss[loss=0.02176, audio_tagging_loss=0.02176, over 25000.00 frames. ], tot_loss[loss=0.02606, audio_tagging_loss=0.02606, over 4903196.39 frames. 
2023-12-21 11:12:12,827 INFO [train.py:886] (3/4) Epoch 1, batch 900, loss[loss=0.02176, audio_tagging_loss=0.02176, over 25000.00 frames. ], tot_loss[loss=0.02606, audio_tagging_loss=0.02606, over 4903196.39 frames. ], batch size: 100, lr: 4.48e-02, grad_scale: 16.0
2023-12-21 11:12:14,799 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 3.238e+01 4.010e+01 4.970e+01 2.854e+02, threshold=8.021e+01, percent-clipped=5.0
2023-12-21 11:12:23,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=45.64 vs. limit=12.05
2023-12-21 11:12:23,107 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=29.56 vs. limit=9.775
2023-12-21 11:12:26,449 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=33.91 vs. limit=12.05
2023-12-21 11:12:27,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=6066.666666666667, ans=0.23933333333333334
2023-12-21 11:12:28,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=6066.666666666667, ans=0.215625
2023-12-21 11:12:31,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=6066.666666666667, ans=0.215625
2023-12-21 11:12:41,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=6133.333333333333, ans=0.009536231884057972
2023-12-21 11:12:56,002 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.54 vs. limit=6.55
2023-12-21 11:13:09,788 INFO [train.py:886] (3/4) Epoch 1, batch 950, loss[loss=0.02227, audio_tagging_loss=0.02227, over 24750.00 frames. ], tot_loss[loss=0.02584, audio_tagging_loss=0.02584, over 4901029.46 frames. ], batch size: 99, lr: 4.48e-02, grad_scale: 16.0
2023-12-21 11:13:22,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.71 vs. limit=8.2
2023-12-21 11:13:26,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=6400.0, ans=0.29600000000000004
2023-12-21 11:13:33,035 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=23.81 vs. limit=9.925
2023-12-21 11:13:34,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=6466.666666666667, ans=0.6736666666666666
2023-12-21 11:13:45,969 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.14 vs. limit=12.4
2023-12-21 11:13:46,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=6533.333333333333, ans=0.6713333333333333
2023-12-21 11:13:57,164 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.24 vs. limit=9.975
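The Whitening lines compare a per-module statistic ("metric") against a scheduled limit; modules whose metric stays above the limit are being nudged toward a whiter (more isotropic) activation covariance. One plausible way to quantify this, assumed here for illustration rather than taken from scaling.py, is the normalized second moment of the covariance eigenvalues, which equals 1.0 for perfectly white features and grows as the spectrum concentrates:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # x: (num_frames, num_channels). Returns >= 1.0; equals 1.0 when the
        # within-group covariance is a multiple of the identity.
        num_channels = x.shape[1]
        group_size = num_channels // num_groups
        metrics = []
        for xg in x.reshape(-1, num_groups, group_size).unbind(dim=1):
            xg = xg - xg.mean(dim=0)
            cov = xg.T @ xg / xg.shape[0]
            eigs = torch.linalg.eigvalsh(cov)
            metrics.append(group_size * (eigs ** 2).sum() / eigs.sum() ** 2)
        return torch.stack(metrics).mean()

    x = torch.randn(1000, 256)   # roughly white: metric modestly above 1.0
    print(whitening_metric(x).item())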
2023-12-21 11:13:59,943 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.32 vs. limit=8.3
2023-12-21 11:14:04,974 INFO [train.py:886] (3/4) Epoch 1, batch 1000, loss[loss=0.02352, audio_tagging_loss=0.02352, over 25000.00 frames. ], tot_loss[loss=0.02526, audio_tagging_loss=0.02526, over 4911235.50 frames. ], batch size: 100, lr: 4.48e-02, grad_scale: 16.0
2023-12-21 11:14:07,688 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.858e+01 3.323e+01 3.988e+01 7.077e+01, threshold=6.647e+01, percent-clipped=0.0
2023-12-21 11:14:09,952 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.576e+01
2023-12-21 11:14:15,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=6733.333333333333, ans=0.184375
2023-12-21 11:14:22,051 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.69 vs. limit=8.366666666666667
2023-12-21 11:14:23,946 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.79 vs. limit=10.025
2023-12-21 11:14:28,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=24.85 vs. limit=10.05
2023-12-21 11:14:31,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=6800.0, ans=0.009391304347826087
2023-12-21 11:14:34,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=6800.0, ans=0.18125000000000002
2023-12-21 11:14:35,862 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.53 vs. limit=10.05
2023-12-21 11:14:44,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=6866.666666666667, ans=0.23133333333333334
2023-12-21 11:14:45,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=6866.666666666667, ans=10.075
2023-12-21 11:14:49,963 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.99 vs. limit=6.773333333333333
2023-12-21 11:14:55,052 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.30 vs. limit=10.1
2023-12-21 11:14:55,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=6933.333333333333, ans=0.1
2023-12-21 11:14:59,653 INFO [train.py:886] (3/4) Epoch 1, batch 1050, loss[loss=0.02024, audio_tagging_loss=0.02024, over 25000.00 frames. ], tot_loss[loss=0.02475, audio_tagging_loss=0.02475, over 4921764.21 frames. ], batch size: 100, lr: 4.48e-02, grad_scale: 16.0
2023-12-21 11:15:07,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=7000.0, ans=0.655
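In the train.py:886 lines, loss[...] is the current batch and tot_loss[...] a running average; the fractional frame counts (e.g. "over 4911235.50 frames") suggest older batches are down-weighted rather than simply summed. A sketch of such bookkeeping (the decay constant is a guess):

    class RunningLoss:
        def __init__(self, decay: float = 0.999):
            self.decay = decay
            self.loss_sum = 0.0   # decayed sum of loss * frames
            self.frames = 0.0     # decayed sum of frames
        def update(self, batch_loss: float, batch_frames: float) -> float:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames
            return self.loss_sum / self.frames   # the tot_loss[...] value

    tracker = RunningLoss()
    for loss in (0.02352, 0.02310, 0.02290):   # illustrative batch losses
        tot = tracker.update(loss, 25000.0)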
2023-12-21 11:15:11,215 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.45 vs. limit=12.8
2023-12-21 11:15:14,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=7066.666666666667, ans=0.306
2023-12-21 11:15:20,386 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.85 vs. limit=10.15
2023-12-21 11:15:21,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=7133.333333333333, ans=0.16562500000000002
2023-12-21 11:15:28,998 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=34.31 vs. limit=10.175
2023-12-21 11:15:33,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=7200.0, ans=0.648
2023-12-21 11:15:43,745 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=76.45 vs. limit=10.225
2023-12-21 11:15:43,827 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.59 vs. limit=8.633333333333333
2023-12-21 11:15:44,769 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.25 vs. limit=12.95
2023-12-21 11:15:54,677 INFO [train.py:886] (3/4) Epoch 1, batch 1100, loss[loss=0.02216, audio_tagging_loss=0.02216, over 25000.00 frames. ], tot_loss[loss=0.02422, audio_tagging_loss=0.02422, over 4930202.75 frames. ], batch size: 100, lr: 4.48e-02, grad_scale: 16.0
2023-12-21 11:15:56,645 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.590e+01 3.009e+01 3.352e+01 1.810e+02, threshold=6.019e+01, percent-clipped=1.0
2023-12-21 11:15:58,764 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=24.60 vs. limit=10.25
2023-12-21 11:16:01,773 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.60 vs. limit=13.0
2023-12-21 11:16:14,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=7466.666666666667, ans=0.6386666666666667
2023-12-21 11:16:16,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=7466.666666666667, ans=0.04949747468305833
2023-12-21 11:16:17,608 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.36 vs. limit=6.866666666666667
2023-12-21 11:16:24,259 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.84 vs. limit=10.3
2023-12-21 11:16:27,026 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=64.73 vs. limit=10.325
2023-12-21 11:16:29,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=7533.333333333333, ans=0.14687499999999998
2023-12-21 11:16:33,973 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=13.15
2023-12-21 11:16:43,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=7600.0, ans=0.224
2023-12-21 11:16:47,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.02 vs. limit=10.375
2023-12-21 11:16:48,209 INFO [train.py:886] (3/4) Epoch 1, batch 1150, loss[loss=0.02724, audio_tagging_loss=0.02724, over 25000.00 frames. ], tot_loss[loss=0.02385, audio_tagging_loss=0.02385, over 4939185.26 frames. ], batch size: 100, lr: 4.47e-02, grad_scale: 16.0
2023-12-21 11:16:56,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=7666.666666666667, ans=0.034722222222222224
2023-12-21 11:16:56,713 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.93 vs. limit=4.15
2023-12-21 11:16:57,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=7666.666666666667, ans=0.22333333333333333
2023-12-21 11:17:01,645 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.06 vs. limit=8.866666666666667
2023-12-21 11:17:04,380 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.339e+00
2023-12-21 11:17:10,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=7800.0, ans=0.172
2023-12-21 11:17:24,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=7866.666666666667, ans=0.13124999999999998
2023-12-21 11:17:43,372 INFO [train.py:886] (3/4) Epoch 1, batch 1200, loss[loss=0.01948, audio_tagging_loss=0.01948, over 25000.00 frames. ], tot_loss[loss=0.02357, audio_tagging_loss=0.02357, over 4946944.87 frames. ], batch size: 100, lr: 4.47e-02, grad_scale: 32.0
2023-12-21 11:17:45,262 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.423e+01 2.660e+01 3.185e+01 5.087e+01, threshold=5.319e+01, percent-clipped=0.0
2023-12-21 11:17:45,890 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.41 vs. limit=13.5
2023-12-21 11:17:54,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=8066.666666666667, ans=0.125
2023-12-21 11:17:59,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=8066.666666666667, ans=0.125
2023-12-21 11:17:59,857 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.46 vs. limit=13.55
2023-12-21 11:18:14,432 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.04 vs. limit=10.575
2023-12-21 11:18:17,988 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=22.08 vs. limit=10.575
2023-12-21 11:18:19,984 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.35 vs. limit=7.05
2023-12-21 11:18:22,051 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.04 vs. limit=13.65
2023-12-21 11:18:35,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.83 vs. limit=10.6
2023-12-21 11:18:37,382 INFO [train.py:886] (3/4) Epoch 1, batch 1250, loss[loss=0.02415, audio_tagging_loss=0.02415, over 24750.00 frames. ], tot_loss[loss=0.02339, audio_tagging_loss=0.02339, over 4939679.14 frames. ], batch size: 99, lr: 4.47e-02, grad_scale: 32.0
2023-12-21 11:18:39,031 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=10.625
2023-12-21 11:18:54,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.04 vs. limit=13.8
2023-12-21 11:18:55,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8400.0, ans=0.216
2023-12-21 11:19:00,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=8466.666666666666, ans=0.125
2023-12-21 11:19:02,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=10.675
2023-12-21 11:19:03,561 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.87 vs. limit=10.675
2023-12-21 11:19:04,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.37 vs. limit=7.116666666666666
2023-12-21 11:19:08,140 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.48 vs. limit=9.233333333333333
2023-12-21 11:19:16,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.94 vs. limit=9.266666666666667
2023-12-21 11:19:25,941 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.05 vs. limit=13.95
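grad_scale in the train.py:886 lines has risen from 16.0 to 32.0 around batch 1200. This is standard fp16 dynamic loss scaling: PyTorch's GradScaler multiplies its scale by a growth factor (default 2.0) after a fixed number of overflow-free steps. A minimal, self-contained AMP loop showing the same doubling (the model, sizes, and growth_interval are illustrative, and a GPU is assumed):

    import torch

    model = torch.nn.Linear(80, 527).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.045)
    scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_interval=100)

    for step in range(500):
        x = torch.randn(8, 80, device="cuda")
        opt.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(x).pow(2).mean()
        scaler.scale(loss).backward()   # backward on the scaled loss
        scaler.step(opt)                # unscales; skips the step on inf/nan
        scaler.update()                 # grows (or shrinks) the scale
        if step % 100 == 0:
            print(step, scaler.get_scale())   # 2.0, 4.0, 8.0, ...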
2023-12-21 11:19:30,113 INFO [train.py:886] (3/4) Epoch 1, batch 1300, loss[loss=0.0209, audio_tagging_loss=0.0209, over 25000.00 frames. ], tot_loss[loss=0.02319, audio_tagging_loss=0.02319, over 4941844.43 frames. ], batch size: 100, lr: 4.47e-02, grad_scale: 32.0
2023-12-21 11:19:32,766 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.283e+01 2.585e+01 3.133e+01 4.200e+01, threshold=5.169e+01, percent-clipped=0.0
2023-12-21 11:19:39,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=8666.666666666666, ans=0.03055555555555556
2023-12-21 11:19:40,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=8666.666666666666, ans=0.125
2023-12-21 11:19:44,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=8733.333333333334, ans=0.125
2023-12-21 11:19:53,391 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=10.8
2023-12-21 11:19:54,936 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.84 vs. limit=7.2
2023-12-21 11:20:03,648 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.38 vs. limit=14.15
2023-12-21 11:20:06,119 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=9.413e+00
2023-12-21 11:20:10,928 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.70 vs. limit=10.825
2023-12-21 11:20:11,874 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.95 vs. limit=14.15
2023-12-21 11:20:13,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=8933.333333333334, ans=0.125
2023-12-21 11:20:14,782 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.73 vs. limit=10.85
2023-12-21 11:20:15,679 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.39 vs. limit=10.85
2023-12-21 11:20:21,615 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.23 vs. limit=10.85
2023-12-21 11:20:22,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=8933.333333333334, ans=0.035
2023-12-21 11:20:24,099 INFO [train.py:886] (3/4) Epoch 1, batch 1350, loss[loss=0.02239, audio_tagging_loss=0.02239, over 24750.00 frames. ], tot_loss[loss=0.02286, audio_tagging_loss=0.02286, over 4943630.20 frames. ], batch size: 99, lr: 4.46e-02, grad_scale: 32.0
2023-12-21 11:20:27,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=9000.0, ans=0.125
2023-12-21 11:20:34,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=9066.666666666666, ans=0.125
2023-12-21 11:20:36,978 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.85 vs. limit=14.3
2023-12-21 11:20:36,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.34 vs. limit=14.3
2023-12-21 11:20:44,965 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=28.56 vs. limit=14.3
2023-12-21 11:20:47,116 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.20 vs. limit=10.925
2023-12-21 11:20:50,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=9133.333333333334, ans=0.025
2023-12-21 11:20:55,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=9200.0, ans=0.5780000000000001
2023-12-21 11:21:01,596 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=10.95
2023-12-21 11:21:04,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=9200.0, ans=0.028333333333333335
2023-12-21 11:21:07,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.78 vs. limit=10.975
2023-12-21 11:21:18,819 INFO [train.py:886] (3/4) Epoch 1, batch 1400, loss[loss=0.01852, audio_tagging_loss=0.01852, over 25000.00 frames. ], tot_loss[loss=0.02255, audio_tagging_loss=0.02255, over 4945038.53 frames. ], batch size: 100, lr: 4.46e-02, grad_scale: 32.0
2023-12-21 11:21:19,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=9333.333333333334, ans=0.125
2023-12-21 11:21:19,288 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.24 vs. limit=14.5
2023-12-21 11:21:20,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=9333.333333333334, ans=0.008840579710144927
2023-12-21 11:21:20,784 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.552e+01 2.157e+01 2.468e+01 2.846e+01 4.252e+01, threshold=4.936e+01, percent-clipped=0.0
2023-12-21 11:21:21,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=9333.333333333334, ans=0.125
2023-12-21 11:21:29,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=9400.0, ans=0.05
2023-12-21 11:21:35,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=9400.0, ans=0.125
2023-12-21 11:21:37,746 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=11.025
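The balancer entries (prob, min_positive, max_positive, min_abs, max_abs) point to a mechanism that keeps per-channel activation statistics inside a target range. The semantics below are inferred from the parameter names only, not from scaling.py: identity in the forward pass, with a small gradient nudge on channels whose fraction of positive activations drifts outside [min_positive, max_positive]:

    import torch

    class Balance(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, min_positive, max_positive, strength):
            ctx.save_for_backward(x)
            ctx.cfg = (min_positive, max_positive, strength)
            return x

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            min_pos, max_pos, strength = ctx.cfg
            pos_frac = (x > 0).float().mean(dim=0, keepdim=True)
            # Extra gradient pushing activations up (resp. down) on channels
            # with too few (resp. too many) positive values.
            nudge = strength * ((pos_frac < min_pos).float()
                                - (pos_frac > max_pos).float())
            return grad_out - nudge, None, None, None

    x = torch.randn(100, 256, requires_grad=True)
    y = Balance.apply(x, 0.05, 0.95, 1e-4)

The scheduled "prob" values above would then control how often such a correction is applied at all.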
2023-12-21 11:21:38,656 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=27.24 vs. limit=11.05
2023-12-21 11:21:42,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=9466.666666666666, ans=0.125
2023-12-21 11:22:04,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=9600.0, ans=0.5640000000000001
2023-12-21 11:22:11,312 INFO [train.py:886] (3/4) Epoch 1, batch 1450, loss[loss=0.02006, audio_tagging_loss=0.02006, over 25000.00 frames. ], tot_loss[loss=0.02222, audio_tagging_loss=0.02222, over 4951685.26 frames. ], batch size: 100, lr: 4.46e-02, grad_scale: 32.0
2023-12-21 11:22:20,866 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=11.125
2023-12-21 11:22:28,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.76 vs. limit=14.8
2023-12-21 11:22:30,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=9733.333333333334, ans=0.20266666666666666
2023-12-21 11:22:30,452 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.75 vs. limit=14.8
2023-12-21 11:22:46,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=9866.666666666666, ans=11.2
2023-12-21 11:22:49,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=9866.666666666666, ans=0.5546666666666666
2023-12-21 11:22:59,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=9933.333333333334, ans=14.95
2023-12-21 11:23:00,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=9933.333333333334, ans=0.07
2023-12-21 11:23:04,557 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.76 vs. limit=11.25
2023-12-21 11:23:04,956 INFO [train.py:886] (3/4) Epoch 1, batch 1500, loss[loss=0.01768, audio_tagging_loss=0.01768, over 22100.00 frames. ], tot_loss[loss=0.02215, audio_tagging_loss=0.02215, over 4953352.80 frames. ], batch size: 107, lr: 4.46e-02, grad_scale: 32.0
2023-12-21 11:23:06,870 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.537e+01 2.171e+01 2.518e+01 2.989e+01 4.492e+01, threshold=5.036e+01, percent-clipped=0.0
2023-12-21 11:23:09,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=10000.0, ans=15.0
2023-12-21 11:23:09,367 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.06 vs. limit=11.25
2023-12-21 11:23:11,238 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.70 vs. limit=15.0
2023-12-21 11:23:11,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=10000.0, ans=0.125
2023-12-21 11:23:13,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=10066.666666666666, ans=0.5476666666666667
2023-12-21 11:23:23,049 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.82 vs. limit=7.516666666666667
2023-12-21 11:23:30,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=10133.333333333334, ans=0.024444444444444446
2023-12-21 11:23:35,015 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.42 vs. limit=11.3
2023-12-21 11:23:48,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=10266.666666666666, ans=0.5406666666666667
2023-12-21 11:23:48,615 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.43 vs. limit=8.106666666666666
2023-12-21 11:23:50,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=10266.666666666666, ans=0.125
2023-12-21 11:23:55,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=10266.666666666666, ans=0.5406666666666667
2023-12-21 11:23:57,369 INFO [train.py:886] (3/4) Epoch 1, batch 1550, loss[loss=0.02113, audio_tagging_loss=0.02113, over 24750.00 frames. ], tot_loss[loss=0.02226, audio_tagging_loss=0.02226, over 4948292.62 frames. ], batch size: 99, lr: 4.45e-02, grad_scale: 32.0
2023-12-21 11:23:59,640 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.04 vs. limit=15.25
2023-12-21 11:24:08,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=10400.0, ans=0.196
2023-12-21 11:24:11,730 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.93 vs. limit=7.6
2023-12-21 11:24:12,727 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=11.4
2023-12-21 11:24:16,797 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.36 vs. limit=4.5600000000000005
2023-12-21 11:24:32,403 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=11.45
2023-12-21 11:24:35,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=10533.333333333334, ans=0.09899494936611666
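The learning rate in the train.py:886 lines decays slowly, from 4.49e-02 at batch 800 to 4.45e-02 by batch 1550. An Eden-style schedule with base_lr=0.045 and lr_batches=7500 reproduces the logged values during the first epoch; the formula below is assumed from the Zipformer family of recipes and should be treated as a sketch:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # During the first epoch (epoch counter still 0):
    print(eden_lr(0.045, batch=800, epoch=0))    # ~0.0449, as logged at batch 800
    print(eden_lr(0.045, batch=1600, epoch=0))   # ~0.0445, as logged around batch 1550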
2023-12-21 11:24:46,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=11.475
2023-12-21 11:24:46,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=10600.0, ans=0.022500000000000003
2023-12-21 11:24:49,628 INFO [train.py:886] (3/4) Epoch 1, batch 1600, loss[loss=0.02183, audio_tagging_loss=0.02183, over 24750.00 frames. ], tot_loss[loss=0.02222, audio_tagging_loss=0.02222, over 4945982.72 frames. ], batch size: 99, lr: 4.45e-02, grad_scale: 32.0
2023-12-21 11:24:50,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=10666.666666666666, ans=0.125
2023-12-21 11:24:50,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=10666.666666666666, ans=0.0
2023-12-21 11:24:51,541 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.281e+01 2.645e+01 2.930e+01 5.191e+01, threshold=5.289e+01, percent-clipped=1.0
2023-12-21 11:25:09,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=10733.333333333334, ans=0.125
2023-12-21 11:25:14,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=10800.0, ans=0.04949747468305833
2023-12-21 11:25:14,546 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.25 vs. limit=11.55
2023-12-21 11:25:35,285 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.89 vs. limit=15.7
2023-12-21 11:25:40,480 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.23 vs. limit=8.373333333333335
2023-12-21 11:25:42,714 INFO [train.py:886] (3/4) Epoch 1, batch 1650, loss[loss=0.01866, audio_tagging_loss=0.01866, over 24750.00 frames. ], tot_loss[loss=0.02205, audio_tagging_loss=0.02205, over 4948371.93 frames. ], batch size: 99, lr: 4.45e-02, grad_scale: 32.0
2023-12-21 11:25:45,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.91 vs. limit=7.75
2023-12-21 11:25:52,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=11066.666666666666, ans=11.65
2023-12-21 11:26:02,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=11133.333333333334, ans=0.020277777777777773
2023-12-21 11:26:10,541 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.05 vs. limit=15.85
2023-12-21 11:26:15,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.27 vs. limit=11.7
2023-12-21 11:26:16,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=11200.0, ans=0.508
2023-12-21 11:26:29,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=11266.666666666666, ans=0.5056666666666667
2023-12-21 11:26:31,086 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.37 vs. limit=11.725
2023-12-21 11:26:33,625 INFO [train.py:886] (3/4) Epoch 1, batch 1700, loss[loss=0.02161, audio_tagging_loss=0.02161, over 24750.00 frames. ], tot_loss[loss=0.02174, audio_tagging_loss=0.02174, over 4945776.83 frames. ], batch size: 99, lr: 4.44e-02, grad_scale: 32.0
2023-12-21 11:26:37,011 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.242e+01 2.541e+01 2.981e+01 4.448e+01, threshold=5.082e+01, percent-clipped=0.0
2023-12-21 11:26:40,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=11333.333333333334, ans=0.01944444444444444
2023-12-21 11:26:42,345 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.82 vs. limit=16.0
2023-12-21 11:26:51,827 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.57 vs. limit=11.775
2023-12-21 11:26:52,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=11400.0, ans=0.008391304347826088
2023-12-21 11:27:06,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=11533.333333333334, ans=0.00836231884057971
2023-12-21 11:27:24,278 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.44 vs. limit=11.85
2023-12-21 11:27:24,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=11600.0, ans=0.018333333333333333
2023-12-21 11:27:25,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=11600.0, ans=16.2
2023-12-21 11:27:26,263 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.97 vs. limit=16.25
2023-12-21 11:27:26,577 INFO [train.py:886] (3/4) Epoch 1, batch 1750, loss[loss=0.02517, audio_tagging_loss=0.02517, over 22361.00 frames. ], tot_loss[loss=0.02154, audio_tagging_loss=0.02154, over 4943424.28 frames. ], batch size: 107, lr: 4.44e-02, grad_scale: 32.0
2023-12-21 11:27:53,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=11800.0, ans=0.0175
2023-12-21 11:27:54,243 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=37.04 vs. limit=11.925
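The bypass.scale_min and bypass.skip_rate schedules fit a residual-bypass mechanism: each layer's output is mixed with its input through a learned scale whose lower clamp (scale_min) is relaxed over training (0.70, then 0.54, then 0.51 in the entries above), and with some probability (skip_rate) the layer can be skipped outright. This reading is inferred from the parameter names; a sketch:

    import torch

    def bypass(layer_in, layer_out, scale, skip_rate=0.05, training=True):
        # scale: scalar or per-channel mixing weight, assumed to be clamped
        # elsewhere to [scale_min, 1.0]; skip_rate: chance of skipping the
        # layer entirely during training (stochastic-depth style).
        if training and torch.rand(()).item() < skip_rate:
            return layer_in
        return layer_in + scale * (layer_out - layer_in)

With scale clamped near 1.0 early on, layers start close to fully active and are only later allowed to shrink toward the bypass path.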
2023-12-21 11:27:54,340 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.41 vs. limit=16.35
2023-12-21 11:27:56,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=11800.0, ans=0.48700000000000004
2023-12-21 11:28:00,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=11866.666666666666, ans=11.95
2023-12-21 11:28:06,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=11933.333333333334, ans=0.18066666666666667
2023-12-21 11:28:16,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=11.975
2023-12-21 11:28:17,292 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.71 vs. limit=16.45
2023-12-21 11:28:18,274 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.87 vs. limit=4.8
2023-12-21 11:28:19,022 INFO [train.py:886] (3/4) Epoch 1, batch 1800, loss[loss=0.02095, audio_tagging_loss=0.02095, over 25000.00 frames. ], tot_loss[loss=0.02135, audio_tagging_loss=0.02135, over 4940609.05 frames. ], batch size: 100, lr: 4.44e-02, grad_scale: 32.0
2023-12-21 11:28:20,990 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.620e+01 2.137e+01 2.486e+01 2.786e+01 3.987e+01, threshold=4.972e+01, percent-clipped=0.0
2023-12-21 11:28:21,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=12000.0, ans=0.00826086956521739
2023-12-21 11:28:27,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=12000.0, ans=0.48000000000000004
2023-12-21 11:28:28,434 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=12.025
2023-12-21 11:28:34,227 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=36.31 vs. limit=12.025
2023-12-21 11:28:39,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.70 vs. limit=16.6
2023-12-21 11:28:49,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=12200.0, ans=0.125
2023-12-21 11:29:08,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.63 vs. limit=4.84
2023-12-21 11:29:10,295 INFO [train.py:886] (3/4) Epoch 1, batch 1850, loss[loss=0.0202, audio_tagging_loss=0.0202, over 25000.00 frames. ], tot_loss[loss=0.02147, audio_tagging_loss=0.02147, over 4942562.33 frames. ], batch size: 100, lr: 4.43e-02, grad_scale: 32.0
2023-12-21 11:29:14,824 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.33 vs. limit=8.933333333333334
2023-12-21 11:29:17,308 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=12.125
2023-12-21 11:29:23,689 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.98 vs. limit=12.15
2023-12-21 11:29:23,755 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=12.15
2023-12-21 11:29:37,789 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.25 vs. limit=16.85
2023-12-21 11:29:42,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=12533.333333333334, ans=0.17466666666666666
2023-12-21 11:29:48,938 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.22 vs. limit=16.9
2023-12-21 11:29:50,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=12533.333333333334, ans=0.125
2023-12-21 11:30:01,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=12600.0, ans=0.125
2023-12-21 11:30:03,669 INFO [train.py:886] (3/4) Epoch 1, batch 1900, loss[loss=0.01897, audio_tagging_loss=0.01897, over 24750.00 frames. ], tot_loss[loss=0.02154, audio_tagging_loss=0.02154, over 4945396.80 frames. ], batch size: 99, lr: 4.43e-02, grad_scale: 32.0
2023-12-21 11:30:05,588 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.668e+01 2.185e+01 2.601e+01 3.008e+01 6.428e+01, threshold=5.202e+01, percent-clipped=3.0
2023-12-21 11:30:12,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.75 vs. limit=17.0
2023-12-21 11:30:16,998 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.07 vs. limit=8.183333333333334
2023-12-21 11:30:18,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=12733.333333333334, ans=0.008101449275362318
2023-12-21 11:30:26,740 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.98 vs. limit=12.3
2023-12-21 11:30:30,515 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.56 vs. limit=17.1
2023-12-21 11:30:33,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=12800.0, ans=0.172
2023-12-21 11:30:34,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=20.37 vs. limit=12.3
2023-12-21 11:30:38,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.67 vs. limit=17.15
2023-12-21 11:30:41,063 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.48 vs. limit=12.325
2023-12-21 11:30:57,338 INFO [train.py:886] (3/4) Epoch 1, batch 1950, loss[loss=0.02478, audio_tagging_loss=0.02478, over 22701.00 frames. ], tot_loss[loss=0.02136, audio_tagging_loss=0.02136, over 4932320.62 frames. ], batch size: 107, lr: 4.43e-02, grad_scale: 32.0
2023-12-21 11:30:57,507 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.258e+00
2023-12-21 11:31:02,322 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.84 vs. limit=12.375
2023-12-21 11:31:03,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=13000.0, ans=0.125
2023-12-21 11:31:05,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=13000.0, ans=0.125
2023-12-21 11:31:10,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=12.4
2023-12-21 11:31:14,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=13066.666666666666, ans=0.125
2023-12-21 11:31:18,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=13133.333333333334, ans=0.4403333333333333
2023-12-21 11:31:18,983 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=12.425
2023-12-21 11:31:19,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.42 vs. limit=12.425
2023-12-21 11:31:25,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=13133.333333333334, ans=0.4403333333333333
2023-12-21 11:31:29,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=13200.0, ans=11.6
2023-12-21 11:31:30,796 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.55 vs. limit=17.4
2023-12-21 11:31:35,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=13200.0, ans=0.125
2023-12-21 11:31:41,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=13266.666666666666, ans=0.125
2023-12-21 11:31:49,735 INFO [train.py:886] (3/4) Epoch 1, batch 2000, loss[loss=0.01759, audio_tagging_loss=0.01759, over 24750.00 frames. ], tot_loss[loss=0.02105, audio_tagging_loss=0.02105, over 4936338.58 frames. ], batch size: 99, lr: 4.42e-02, grad_scale: 32.0
2023-12-21 11:31:51,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.69 vs. limit=11.666666666666668
2023-12-21 11:31:51,643 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.518e+01 2.224e+01 2.549e+01 2.855e+01 5.920e+01, threshold=5.098e+01, percent-clipped=1.0
2023-12-21 11:31:57,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=13333.333333333334, ans=0.125
2023-12-21 11:32:16,432 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.07 vs. limit=17.6
2023-12-21 11:32:17,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=13466.666666666666, ans=0.125
2023-12-21 11:32:20,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=13533.333333333334, ans=0.07
2023-12-21 11:32:24,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=13533.333333333334, ans=0.125
2023-12-21 11:32:44,302 INFO [train.py:886] (3/4) Epoch 1, batch 2050, loss[loss=0.02083, audio_tagging_loss=0.02083, over 24750.00 frames. ], tot_loss[loss=0.02073, audio_tagging_loss=0.02073, over 4941769.87 frames. ], batch size: 99, lr: 4.42e-02, grad_scale: 32.0
2023-12-21 11:32:46,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=13666.666666666666, ans=0.00972222222222223
2023-12-21 11:32:50,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=13666.666666666666, ans=0.125
2023-12-21 11:32:53,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.08 vs. limit=12.65
2023-12-21 11:32:56,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=19.36 vs. limit=12.65
2023-12-21 11:33:00,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=13733.333333333334, ans=0.125
2023-12-21 11:33:28,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=13933.333333333334, ans=0.008611111111111104
2023-12-21 11:33:36,688 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=27.79 vs. limit=12.75
2023-12-21 11:33:37,328 INFO [train.py:886] (3/4) Epoch 1, batch 2100, loss[loss=0.02156, audio_tagging_loss=0.02156, over 25000.00 frames. ], tot_loss[loss=0.02056, audio_tagging_loss=0.02056, over 4943370.76 frames. ], batch size: 100, lr: 4.42e-02, grad_scale: 32.0
2023-12-21 11:33:39,943 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.625e+01 2.173e+01 2.502e+01 2.826e+01 4.812e+01, threshold=5.003e+01, percent-clipped=0.0
2023-12-21 11:33:40,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=14000.0, ans=0.125
2023-12-21 11:33:41,371 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.81 vs. limit=18.0
2023-12-21 11:33:52,786 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.95 vs. limit=18.05
2023-12-21 11:34:13,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=14200.0, ans=0.125
2023-12-21 11:34:26,752 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=56.61 vs. limit=12.85
2023-12-21 11:34:26,761 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.60 vs. limit=18.2
2023-12-21 11:34:30,104 INFO [train.py:886] (3/4) Epoch 1, batch 2150, loss[loss=0.02485, audio_tagging_loss=0.02485, over 25000.00 frames. ], tot_loss[loss=0.02049, audio_tagging_loss=0.02049, over 4951489.56 frames. ], batch size: 100, lr: 4.41e-02, grad_scale: 32.0
2023-12-21 11:34:32,646 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=12.875
2023-12-21 11:34:34,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=14333.333333333334, ans=0.125
2023-12-21 11:34:36,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=14333.333333333334, ans=0.025
2023-12-21 11:34:37,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=14333.333333333334, ans=0.125
2023-12-21 11:34:44,407 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.36 vs. limit=18.3
2023-12-21 11:34:46,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=14400.0, ans=0.006666666666666668
2023-12-21 11:34:53,226 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=9.786666666666665
2023-12-21 11:35:03,575 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=6.362e+00
2023-12-21 11:35:12,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=14600.0, ans=0.389
2023-12-21 11:35:23,970 INFO [train.py:886] (3/4) Epoch 1, batch 2200, loss[loss=0.01733, audio_tagging_loss=0.01733, over 24750.00 frames. ], tot_loss[loss=0.02063, audio_tagging_loss=0.02063, over 4949096.97 frames. ], batch size: 99, lr: 4.41e-02, grad_scale: 32.0
2023-12-21 11:35:25,954 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.719e+01 2.344e+01 2.656e+01 2.983e+01 4.042e+01, threshold=5.311e+01, percent-clipped=0.0
2023-12-21 11:35:33,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.68 vs. limit=9.866666666666667
2023-12-21 11:35:34,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.17 vs. limit=13.025
2023-12-21 11:35:36,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=14733.333333333334, ans=0.007666666666666667
2023-12-21 11:35:36,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=14733.333333333334, ans=0.125
2023-12-21 11:35:42,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=14733.333333333334, ans=0.3843333333333333
2023-12-21 11:35:42,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=14733.333333333334, ans=12.366666666666667
2023-12-21 11:35:54,049 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.02 vs. limit=18.6
2023-12-21 11:35:59,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=14866.666666666666, ans=0.02
2023-12-21 11:36:16,360 INFO [train.py:886] (3/4) Epoch 1, batch 2250, loss[loss=0.01822, audio_tagging_loss=0.01822, over 24750.00 frames. ], tot_loss[loss=0.02065, audio_tagging_loss=0.02065, over 4943575.70 frames. ], batch size: 99, lr: 4.40e-02, grad_scale: 64.0
2023-12-21 11:36:20,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=15000.0, ans=0.02
2023-12-21 11:36:26,497 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=13.15
2023-12-21 11:36:27,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=15066.666666666666, ans=0.14933333333333335
2023-12-21 11:36:50,000 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.27 vs. limit=10.08
2023-12-21 11:36:52,901 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.88 vs. limit=18.9
2023-12-21 11:36:57,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=15266.666666666666, ans=0.125
2023-12-21 11:37:05,395 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=13.225
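Since loss and audio_tagging_loss are identical in every train.py:886 line, audio tagging is the sole training objective in this run: multi-label classification over AudioSet events, typically trained with binary cross-entropy on per-event logits. A sketch with assumed shapes and reduction (the recipe's exact normalization may differ):

    import torch
    import torch.nn.functional as F

    batch, num_events = 100, 527                     # 527 = AudioSet ontology size
    logits = torch.randn(batch, num_events)          # model outputs per clip
    targets = torch.zeros(batch, num_events)         # multi-hot event labels
    targets[torch.arange(batch), torch.randint(0, num_events, (batch,))] = 1.0

    # Mean BCE over all (clip, event) pairs; values in the 0.02 range arise
    # once the model has learned that most of the 527 labels are usually absent.
    loss = F.binary_cross_entropy_with_logits(logits, targets)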
], batch size: 100, lr: 4.40e-02, grad_scale: 64.0 2023-12-21 11:37:08,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=15333.333333333334, ans=0.07 2023-12-21 11:37:10,372 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.341e+01 2.591e+01 2.957e+01 4.107e+01, threshold=5.182e+01, percent-clipped=0.0 2023-12-21 11:37:14,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=15333.333333333334, ans=0.125 2023-12-21 11:37:17,370 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.67 vs. limit=8.833333333333334 2023-12-21 11:37:27,427 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.12 vs. limit=13.275 2023-12-21 11:37:34,072 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=24.11 vs. limit=13.3 2023-12-21 11:37:43,834 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.65 vs. limit=12.766666666666667 2023-12-21 11:38:01,166 INFO [train.py:886] (3/4) Epoch 1, batch 2350, loss[loss=0.02119, audio_tagging_loss=0.02119, over 24750.00 frames. ], tot_loss[loss=0.02043, audio_tagging_loss=0.02043, over 4948966.51 frames. ], batch size: 99, lr: 4.40e-02, grad_scale: 64.0 2023-12-21 11:38:02,861 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=14.28 vs. limit=12.833333333333332 2023-12-21 11:38:38,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=15866.666666666666, ans=0.125 2023-12-21 11:38:53,115 INFO [train.py:886] (3/4) Epoch 1, batch 2400, loss[loss=0.01745, audio_tagging_loss=0.01745, over 25000.00 frames. ], tot_loss[loss=0.0203, audio_tagging_loss=0.0203, over 4946978.80 frames. ], batch size: 100, lr: 4.39e-02, grad_scale: 64.0 2023-12-21 11:38:54,990 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.350e+01 2.627e+01 2.967e+01 3.953e+01, threshold=5.253e+01, percent-clipped=0.0 2023-12-21 11:38:55,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=16000.0, ans=0.0 2023-12-21 11:39:09,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.89 vs. limit=19.55 2023-12-21 11:39:17,372 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.79 vs. 
limit=13.066666666666666 2023-12-21 11:39:18,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16133.333333333334, ans=0.13866666666666666 2023-12-21 11:39:27,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=16200.0, ans=0.125 2023-12-21 11:39:28,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16200.0, ans=0.138 2023-12-21 11:39:33,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=16200.0, ans=0.3330000000000001 2023-12-21 11:39:38,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=16266.666666666666, ans=0.13733333333333334 2023-12-21 11:39:44,698 INFO [train.py:886] (3/4) Epoch 1, batch 2450, loss[loss=0.01899, audio_tagging_loss=0.01899, over 24066.00 frames. ], tot_loss[loss=0.02029, audio_tagging_loss=0.02029, over 4945684.31 frames. ], batch size: 100, lr: 4.39e-02, grad_scale: 64.0 2023-12-21 11:39:49,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=16333.333333333334, ans=0.0 2023-12-21 11:40:07,667 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=13.675 2023-12-21 11:40:10,394 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.07 vs. limit=13.675 2023-12-21 11:40:16,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=16533.333333333332, ans=0.125 2023-12-21 11:40:19,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=16533.333333333332, ans=0.0072753623188405794 2023-12-21 11:40:20,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=16533.333333333332, ans=13.266666666666666 2023-12-21 11:40:24,606 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=13.725 2023-12-21 11:40:27,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=16600.0, ans=0.0 2023-12-21 11:40:35,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=16600.0, ans=0.125 2023-12-21 11:40:36,792 INFO [train.py:886] (3/4) Epoch 1, batch 2500, loss[loss=0.02025, audio_tagging_loss=0.02025, over 24750.00 frames. ], tot_loss[loss=0.0204, audio_tagging_loss=0.0204, over 4942952.16 frames. 
], batch size: 99, lr: 4.38e-02, grad_scale: 64.0 2023-12-21 11:40:38,718 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.472e+01 2.667e+01 3.044e+01 4.269e+01, threshold=5.334e+01, percent-clipped=0.0 2023-12-21 11:40:39,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=16666.666666666668, ans=0.007246376811594202 2023-12-21 11:40:43,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.42 vs. limit=9.166666666666668 2023-12-21 11:40:59,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=16800.0, ans=0.0 2023-12-21 11:41:03,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=16800.0, ans=0.125 2023-12-21 11:41:12,110 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.456e+01 2023-12-21 11:41:27,282 INFO [train.py:886] (3/4) Epoch 1, batch 2550, loss[loss=0.02022, audio_tagging_loss=0.02022, over 24750.00 frames. ], tot_loss[loss=0.0204, audio_tagging_loss=0.0204, over 4939537.43 frames. ], batch size: 99, lr: 4.38e-02, grad_scale: 64.0 2023-12-21 11:41:31,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=17000.0, ans=0.125 2023-12-21 11:41:33,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=17000.0, ans=0.0 2023-12-21 11:41:36,334 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=9.222e+00 2023-12-21 11:41:39,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=17066.666666666668, ans=13.9 2023-12-21 11:41:49,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=17133.333333333332, ans=0.025 2023-12-21 11:41:49,851 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=13.924999999999999 2023-12-21 11:42:07,444 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.88 vs. limit=9.3 2023-12-21 11:42:16,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=17266.666666666668, ans=0.0 2023-12-21 11:42:21,173 INFO [train.py:886] (3/4) Epoch 1, batch 2600, loss[loss=0.02108, audio_tagging_loss=0.02108, over 24750.00 frames. ], tot_loss[loss=0.02018, audio_tagging_loss=0.02018, over 4940808.93 frames. ], batch size: 99, lr: 4.37e-02, grad_scale: 64.0 2023-12-21 11:42:23,107 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.528e+01 2.807e+01 3.292e+01 4.352e+01, threshold=5.614e+01, percent-clipped=0.0 2023-12-21 11:42:29,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=21.88 vs. 
limit=14.0 2023-12-21 11:42:31,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=17400.0, ans=0.0 2023-12-21 11:42:41,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=17466.666666666668, ans=0.09899494936611666 2023-12-21 11:42:45,689 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=17.97 vs. limit=14.05 2023-12-21 11:42:46,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=17466.666666666668, ans=0.462 2023-12-21 11:42:53,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=17533.333333333332, ans=0.125 2023-12-21 11:42:57,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=17533.333333333332, ans=0.0 2023-12-21 11:42:57,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=17533.333333333332, ans=0.125 2023-12-21 11:42:59,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=17533.333333333332, ans=0.12466666666666668 2023-12-21 11:43:09,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=17600.0, ans=0.28400000000000003 2023-12-21 11:43:13,592 INFO [train.py:886] (3/4) Epoch 1, batch 2650, loss[loss=0.0203, audio_tagging_loss=0.0203, over 25000.00 frames. ], tot_loss[loss=0.01997, audio_tagging_loss=0.01997, over 4943375.82 frames. ], batch size: 100, lr: 4.37e-02, grad_scale: 64.0 2023-12-21 11:43:20,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=17666.666666666668, ans=0.125 2023-12-21 11:43:21,373 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.06 vs. limit=20.75 2023-12-21 11:43:22,566 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.10 vs. limit=20.75 2023-12-21 11:43:27,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=17733.333333333332, ans=0.07 2023-12-21 11:43:29,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=17733.333333333332, ans=0.125 2023-12-21 11:43:30,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.54 vs. limit=11.093333333333334 2023-12-21 11:43:31,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=17733.333333333332, ans=0.12266666666666667 2023-12-21 11:43:32,024 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.10 vs. limit=14.15 2023-12-21 11:43:45,703 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.94 vs. 
limit=14.2 2023-12-21 11:43:52,084 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.60 vs. limit=20.9 2023-12-21 11:43:56,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.00 vs. limit=5.6899999999999995 2023-12-21 11:43:57,842 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.93 vs. limit=20.95 2023-12-21 11:44:05,201 INFO [train.py:886] (3/4) Epoch 1, batch 2700, loss[loss=0.01776, audio_tagging_loss=0.01776, over 24750.00 frames. ], tot_loss[loss=0.01992, audio_tagging_loss=0.01992, over 4946925.76 frames. ], batch size: 99, lr: 4.36e-02, grad_scale: 64.0 2023-12-21 11:44:06,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=18000.0, ans=0.125 2023-12-21 11:44:07,125 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.548e+01 2.795e+01 3.093e+01 4.851e+01, threshold=5.589e+01, percent-clipped=0.0 2023-12-21 11:44:10,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=18000.0, ans=0.07 2023-12-21 11:44:15,035 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=5.71 2023-12-21 11:44:16,698 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.67 vs. limit=14.275 2023-12-21 11:44:36,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=18200.0, ans=0.125 2023-12-21 11:44:52,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=18266.666666666668, ans=14.133333333333335 2023-12-21 11:44:58,039 INFO [train.py:886] (3/4) Epoch 1, batch 2750, loss[loss=0.01858, audio_tagging_loss=0.01858, over 25000.00 frames. ], tot_loss[loss=0.01992, audio_tagging_loss=0.01992, over 4949687.88 frames. ], batch size: 100, lr: 4.36e-02, grad_scale: 64.0 2023-12-21 11:44:59,618 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.74 vs. limit=21.25 2023-12-21 11:45:08,847 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=1.269e+00 2023-12-21 11:45:10,979 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.60 vs. limit=21.3 2023-12-21 11:45:18,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=18466.666666666668, ans=0.125 2023-12-21 11:45:49,232 INFO [train.py:886] (3/4) Epoch 1, batch 2800, loss[loss=0.01985, audio_tagging_loss=0.01985, over 25000.00 frames. ], tot_loss[loss=0.01992, audio_tagging_loss=0.01992, over 4955035.44 frames. 
], batch size: 100, lr: 4.36e-02, grad_scale: 64.0 2023-12-21 11:45:51,130 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.719e+01 3.067e+01 3.329e+01 4.208e+01, threshold=6.133e+01, percent-clipped=0.0 2023-12-21 11:45:54,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=18666.666666666668, ans=0.0 2023-12-21 11:46:04,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=18733.333333333332, ans=0.125 2023-12-21 11:46:04,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=18733.333333333332, ans=0.2443333333333334 2023-12-21 11:46:25,594 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=26.16 vs. limit=14.575 2023-12-21 11:46:41,864 INFO [train.py:886] (3/4) Epoch 1, batch 2850, loss[loss=0.0197, audio_tagging_loss=0.0197, over 24750.00 frames. ], tot_loss[loss=0.01993, audio_tagging_loss=0.01993, over 4951864.49 frames. ], batch size: 99, lr: 4.35e-02, grad_scale: 64.0 2023-12-21 11:47:05,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=19133.333333333332, ans=0.2303333333333334 2023-12-21 11:47:06,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=19133.333333333332, ans=0.10866666666666669 2023-12-21 11:47:10,856 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.03 vs. limit=21.85 2023-12-21 11:47:10,885 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.23 vs. limit=14.675 2023-12-21 11:47:13,696 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.48 vs. limit=5.88 2023-12-21 11:47:18,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=19200.0, ans=0.10800000000000001 2023-12-21 11:47:19,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=19200.0, ans=21.9 2023-12-21 11:47:20,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=19200.0, ans=0.125 2023-12-21 11:47:27,073 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.07 vs. limit=21.95 2023-12-21 11:47:33,270 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.45 vs. limit=14.725 2023-12-21 11:47:35,410 INFO [train.py:886] (3/4) Epoch 1, batch 2900, loss[loss=0.01876, audio_tagging_loss=0.01876, over 24750.00 frames. ], tot_loss[loss=0.01977, audio_tagging_loss=0.01977, over 4951126.50 frames. 
], batch size: 99, lr: 4.35e-02, grad_scale: 64.0 2023-12-21 11:47:37,293 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.573e+01 2.906e+01 3.283e+01 4.730e+01, threshold=5.812e+01, percent-clipped=0.0 2023-12-21 11:47:39,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=19333.333333333332, ans=0.025 2023-12-21 11:47:43,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19333.333333333332, ans=0.10666666666666669 2023-12-21 11:48:03,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=19466.666666666668, ans=0.00663768115942029 2023-12-21 11:48:04,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=19466.666666666668, ans=0.125 2023-12-21 11:48:08,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=19533.333333333332, ans=0.006623188405797102 2023-12-21 11:48:21,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=19600.0, ans=0.125 2023-12-21 11:48:26,187 INFO [train.py:886] (3/4) Epoch 1, batch 2950, loss[loss=0.02091, audio_tagging_loss=0.02091, over 25000.00 frames. ], tot_loss[loss=0.0197, audio_tagging_loss=0.0197, over 4953268.02 frames. ], batch size: 100, lr: 4.34e-02, grad_scale: 64.0 2023-12-21 11:48:33,084 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.43 vs. limit=11.866666666666667 2023-12-21 11:48:37,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=19733.333333333332, ans=0.125 2023-12-21 11:48:43,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=19733.333333333332, ans=0.125 2023-12-21 11:48:49,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19800.0, ans=0.10200000000000001 2023-12-21 11:48:51,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=19800.0, ans=0.125 2023-12-21 11:48:54,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=19800.0, ans=0.125 2023-12-21 11:48:57,458 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.32 vs. limit=9.966666666666667 2023-12-21 11:49:13,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=19933.333333333332, ans=0.125 2023-12-21 11:49:18,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=19933.333333333332, ans=0.20233333333333337 2023-12-21 11:49:19,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=20000.0, ans=0.006521739130434783 2023-12-21 11:49:20,067 INFO [train.py:886] (3/4) Epoch 1, batch 3000, loss[loss=0.01842, audio_tagging_loss=0.01842, over 25000.00 frames. 
], tot_loss[loss=0.01961, audio_tagging_loss=0.01961, over 4955636.59 frames. ], batch size: 100, lr: 4.34e-02, grad_scale: 64.0 2023-12-21 11:49:20,068 INFO [train.py:909] (3/4) Computing validation loss 2023-12-21 11:49:45,411 INFO [train.py:917] (3/4) Epoch 1, validation: loss=0.04441, audio_tagging_loss=0.04441, over 3737520.00 frames. 2023-12-21 11:49:45,412 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-21 11:49:46,955 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.78 vs. limit=22.5 2023-12-21 11:49:47,302 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.148e+01 2.623e+01 2.967e+01 3.286e+01 5.413e+01, threshold=5.933e+01, percent-clipped=0.0 2023-12-21 11:49:54,562 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.93 vs. limit=6.0 2023-12-21 11:50:06,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=20133.333333333332, ans=0.2 2023-12-21 11:50:06,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.56 vs. limit=15.0 2023-12-21 11:50:18,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=20200.0, ans=0.0 2023-12-21 11:50:23,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=20200.0, ans=0.035 2023-12-21 11:50:35,809 INFO [train.py:886] (3/4) Epoch 1, batch 3050, loss[loss=0.01936, audio_tagging_loss=0.01936, over 25000.00 frames. ], tot_loss[loss=0.0196, audio_tagging_loss=0.0196, over 4959700.70 frames. ], batch size: 100, lr: 4.33e-02, grad_scale: 64.0 2023-12-21 11:50:38,177 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=18.77 vs. limit=15.0 2023-12-21 11:50:53,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=20400.0, ans=0.125 2023-12-21 11:50:56,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=20466.666666666668, ans=0.1 2023-12-21 11:51:09,781 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=24.84 vs. limit=22.5 2023-12-21 11:51:18,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=20600.0, ans=0.125 2023-12-21 11:51:21,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=20600.0, ans=0.0 2023-12-21 11:51:23,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=20600.0, ans=0.125 2023-12-21 11:51:27,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=20600.0, ans=0.125 2023-12-21 11:51:29,132 INFO [train.py:886] (3/4) Epoch 1, batch 3100, loss[loss=0.02323, audio_tagging_loss=0.02323, over 24750.00 frames. ], tot_loss[loss=0.01972, audio_tagging_loss=0.01972, over 4955475.35 frames. 
], batch size: 99, lr: 4.33e-02, grad_scale: 64.0 2023-12-21 11:51:31,052 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 2.608e+01 2.817e+01 3.164e+01 4.242e+01, threshold=5.634e+01, percent-clipped=0.0 2023-12-21 11:51:33,402 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.77 vs. limit=12.0 2023-12-21 11:51:49,688 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 11:51:59,921 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.00 vs. limit=15.0 2023-12-21 11:52:06,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=20866.666666666668, ans=0.1 2023-12-21 11:52:08,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=20866.666666666668, ans=0.125 2023-12-21 11:52:16,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=20933.333333333332, ans=0.006318840579710145 2023-12-21 11:52:16,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=20933.333333333332, ans=0.125 2023-12-21 11:52:20,860 INFO [train.py:886] (3/4) Epoch 1, batch 3150, loss[loss=0.01816, audio_tagging_loss=0.01816, over 25000.00 frames. ], tot_loss[loss=0.01984, audio_tagging_loss=0.01984, over 4949794.23 frames. ], batch size: 100, lr: 4.32e-02, grad_scale: 64.0 2023-12-21 11:52:23,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=21000.0, ans=0.04949747468305833 2023-12-21 11:52:24,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=21000.0, ans=0.125 2023-12-21 11:52:31,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.71 vs. limit=12.0 2023-12-21 11:52:39,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=21066.666666666668, ans=0.0 2023-12-21 11:52:55,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.67 vs. limit=6.0 2023-12-21 11:52:59,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=21200.0, ans=0.1 2023-12-21 11:53:09,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=21266.666666666668, ans=0.1 2023-12-21 11:53:09,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=21266.666666666668, ans=0.125 2023-12-21 11:53:11,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=21266.666666666668, ans=0.1 2023-12-21 11:53:13,087 INFO [train.py:886] (3/4) Epoch 1, batch 3200, loss[loss=0.01843, audio_tagging_loss=0.01843, over 24750.00 frames. ], tot_loss[loss=0.01976, audio_tagging_loss=0.01976, over 4951174.13 frames. 
], batch size: 99, lr: 4.32e-02, grad_scale: 64.0 2023-12-21 11:53:14,986 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.755e+01 2.973e+01 3.408e+01 4.303e+01, threshold=5.945e+01, percent-clipped=0.0 2023-12-21 11:53:20,466 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=15.0 2023-12-21 11:53:25,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=21400.0, ans=0.0 2023-12-21 11:53:41,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=21466.666666666668, ans=0.0 2023-12-21 11:53:46,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=21533.333333333332, ans=0.125 2023-12-21 11:53:50,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=21533.333333333332, ans=0.125 2023-12-21 11:54:02,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.70 vs. limit=15.0 2023-12-21 11:54:05,817 INFO [train.py:886] (3/4) Epoch 1, batch 3250, loss[loss=0.01892, audio_tagging_loss=0.01892, over 25000.00 frames. ], tot_loss[loss=0.01958, audio_tagging_loss=0.01958, over 4955972.49 frames. ], batch size: 100, lr: 4.31e-02, grad_scale: 64.0 2023-12-21 11:54:10,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=21666.666666666668, ans=0.0 2023-12-21 11:54:10,917 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0 2023-12-21 11:54:13,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=21666.666666666668, ans=0.125 2023-12-21 11:54:31,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=21800.0, ans=0.95 2023-12-21 11:54:49,547 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.65 vs. limit=15.0 2023-12-21 11:54:52,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=21933.333333333332, ans=0.0 2023-12-21 11:54:56,781 INFO [train.py:886] (3/4) Epoch 1, batch 3300, loss[loss=0.01759, audio_tagging_loss=0.01759, over 25000.00 frames. ], tot_loss[loss=0.01936, audio_tagging_loss=0.01936, over 4959500.83 frames. 
], batch size: 100, lr: 4.31e-02, grad_scale: 64.0 2023-12-21 11:54:59,349 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.622e+01 2.937e+01 3.224e+01 4.411e+01, threshold=5.874e+01, percent-clipped=0.0 2023-12-21 11:55:07,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=22066.666666666668, ans=0.006072463768115942 2023-12-21 11:55:13,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=22066.666666666668, ans=0.006072463768115942 2023-12-21 11:55:16,948 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.905e+00 2023-12-21 11:55:17,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=22133.333333333332, ans=0.07 2023-12-21 11:55:21,867 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.70 vs. limit=15.0 2023-12-21 11:55:39,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=27.78 vs. limit=15.0 2023-12-21 11:55:42,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=22266.666666666668, ans=0.1 2023-12-21 11:55:50,069 INFO [train.py:886] (3/4) Epoch 1, batch 3350, loss[loss=0.01762, audio_tagging_loss=0.01762, over 25000.00 frames. ], tot_loss[loss=0.01942, audio_tagging_loss=0.01942, over 4963048.04 frames. ], batch size: 100, lr: 4.30e-02, grad_scale: 64.0 2023-12-21 11:55:54,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=22333.333333333332, ans=0.1 2023-12-21 11:56:02,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.24 vs. limit=15.0 2023-12-21 11:56:04,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=22400.0, ans=22.5 2023-12-21 11:56:17,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.31 vs. limit=22.5 2023-12-21 11:56:20,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=22533.333333333332, ans=0.005971014492753624 2023-12-21 11:56:24,799 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=15.0 2023-12-21 11:56:31,652 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.66 vs. limit=15.0 2023-12-21 11:56:41,287 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.78 vs. limit=15.0 2023-12-21 11:56:43,114 INFO [train.py:886] (3/4) Epoch 1, batch 3400, loss[loss=0.0214, audio_tagging_loss=0.0214, over 25000.00 frames. ], tot_loss[loss=0.01944, audio_tagging_loss=0.01944, over 4964903.54 frames. 
], batch size: 100, lr: 4.29e-02, grad_scale: 64.0 2023-12-21 11:56:45,032 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.691e+01 2.947e+01 3.309e+01 4.555e+01, threshold=5.894e+01, percent-clipped=0.0 2023-12-21 11:56:47,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=22666.666666666668, ans=0.0 2023-12-21 11:57:02,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=22800.0, ans=0.125 2023-12-21 11:57:03,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=22800.0, ans=0.00591304347826087 2023-12-21 11:57:04,533 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.80 vs. limit=22.5 2023-12-21 11:57:06,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=22800.0, ans=0.0 2023-12-21 11:57:17,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=22866.666666666668, ans=0.125 2023-12-21 11:57:19,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=22866.666666666668, ans=0.0 2023-12-21 11:57:20,341 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.54 vs. limit=6.0 2023-12-21 11:57:22,000 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.29 vs. limit=15.0 2023-12-21 11:57:33,825 INFO [train.py:886] (3/4) Epoch 1, batch 3450, loss[loss=0.01946, audio_tagging_loss=0.01946, over 24750.00 frames. ], tot_loss[loss=0.01955, audio_tagging_loss=0.01955, over 4960321.74 frames. ], batch size: 99, lr: 4.29e-02, grad_scale: 64.0 2023-12-21 11:57:46,538 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.04 vs. limit=22.5 2023-12-21 11:57:46,771 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.72 vs. limit=6.0 2023-12-21 11:57:51,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=23066.666666666668, ans=0.1 2023-12-21 11:57:52,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.22 vs. limit=15.0 2023-12-21 11:58:02,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=23133.333333333332, ans=0.2 2023-12-21 11:58:05,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=28.08 vs. 
limit=22.5 2023-12-21 11:58:14,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=23266.666666666668, ans=0.125 2023-12-21 11:58:23,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=23333.333333333332, ans=0.0 2023-12-21 11:58:24,318 INFO [train.py:886] (3/4) Epoch 1, batch 3500, loss[loss=0.0213, audio_tagging_loss=0.0213, over 24750.00 frames. ], tot_loss[loss=0.01957, audio_tagging_loss=0.01957, over 4953023.00 frames. ], batch size: 99, lr: 4.28e-02, grad_scale: 64.0 2023-12-21 11:58:26,215 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.180e+01 2.654e+01 2.914e+01 3.165e+01 4.933e+01, threshold=5.829e+01, percent-clipped=0.0 2023-12-21 11:58:29,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=23333.333333333332, ans=0.1 2023-12-21 11:58:33,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=23400.0, ans=0.0 2023-12-21 11:58:35,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=23400.0, ans=0.2 2023-12-21 11:58:37,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.60 vs. limit=15.0 2023-12-21 11:58:57,472 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.61 vs. limit=15.0 2023-12-21 11:59:10,719 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0 2023-12-21 11:59:15,149 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.96 vs. limit=6.0 2023-12-21 11:59:15,527 INFO [train.py:886] (3/4) Epoch 1, batch 3550, loss[loss=0.01723, audio_tagging_loss=0.01723, over 25000.00 frames. ], tot_loss[loss=0.0194, audio_tagging_loss=0.0194, over 4951100.37 frames. ], batch size: 100, lr: 4.28e-02, grad_scale: 64.0 2023-12-21 11:59:35,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=23800.0, ans=0.0056956521739130435 2023-12-21 11:59:52,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=23866.666666666668, ans=0.125 2023-12-21 11:59:52,781 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.92 vs. limit=15.0 2023-12-21 11:59:55,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=23933.333333333332, ans=0.2 2023-12-21 12:00:05,895 INFO [train.py:886] (3/4) Epoch 1, batch 3600, loss[loss=0.01833, audio_tagging_loss=0.01833, over 24750.00 frames. ], tot_loss[loss=0.0192, audio_tagging_loss=0.0192, over 4954572.32 frames. 
], batch size: 99, lr: 4.27e-02, grad_scale: 64.0 2023-12-21 12:00:07,819 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.526e+01 2.849e+01 3.295e+01 5.645e+01, threshold=5.698e+01, percent-clipped=0.0 2023-12-21 12:00:15,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=24066.666666666668, ans=0.2 2023-12-21 12:00:15,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=24066.666666666668, ans=0.125 2023-12-21 12:00:29,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=24133.333333333332, ans=0.125 2023-12-21 12:00:36,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=24200.0, ans=0.2 2023-12-21 12:00:38,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=24200.0, ans=0.0 2023-12-21 12:00:40,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=24200.0, ans=0.0 2023-12-21 12:00:40,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.17 vs. limit=10.0 2023-12-21 12:00:43,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=20.61 vs. limit=15.0 2023-12-21 12:00:51,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=24266.666666666668, ans=0.005594202898550725 2023-12-21 12:00:57,626 INFO [train.py:886] (3/4) Epoch 1, batch 3650, loss[loss=0.01413, audio_tagging_loss=0.01413, over 25000.00 frames. ], tot_loss[loss=0.01912, audio_tagging_loss=0.01912, over 4959002.22 frames. ], batch size: 100, lr: 4.27e-02, grad_scale: 64.0 2023-12-21 12:01:03,903 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.40 vs. limit=15.0 2023-12-21 12:01:05,712 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.37 vs. limit=15.0 2023-12-21 12:01:24,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=24466.666666666668, ans=0.125 2023-12-21 12:01:47,743 INFO [train.py:886] (3/4) Epoch 1, batch 3700, loss[loss=0.02027, audio_tagging_loss=0.02027, over 24750.00 frames. ], tot_loss[loss=0.01915, audio_tagging_loss=0.01915, over 4963999.67 frames. ], batch size: 99, lr: 4.26e-02, grad_scale: 64.0 2023-12-21 12:01:49,604 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.529e+01 2.856e+01 3.179e+01 4.127e+01, threshold=5.712e+01, percent-clipped=0.0 2023-12-21 12:02:04,295 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.76 vs. 
limit=10.0 2023-12-21 12:02:25,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=24866.666666666668, ans=0.125 2023-12-21 12:02:26,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=24866.666666666668, ans=0.125 2023-12-21 12:02:26,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=24866.666666666668, ans=0.1 2023-12-21 12:02:28,281 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.38 vs. limit=15.0 2023-12-21 12:02:28,295 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.44 vs. limit=22.5 2023-12-21 12:02:36,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=24933.333333333332, ans=0.1 2023-12-21 12:02:36,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=24933.333333333332, ans=0.125 2023-12-21 12:02:38,534 INFO [train.py:886] (3/4) Epoch 1, batch 3750, loss[loss=0.01742, audio_tagging_loss=0.01742, over 24750.00 frames. ], tot_loss[loss=0.01926, audio_tagging_loss=0.01926, over 4962059.23 frames. ], batch size: 99, lr: 4.26e-02, grad_scale: 64.0 2023-12-21 12:02:41,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=25000.0, ans=0.125 2023-12-21 12:03:00,175 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.16 vs. limit=10.0 2023-12-21 12:03:10,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=25200.0, ans=0.125 2023-12-21 12:03:19,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=25266.666666666668, ans=0.125 2023-12-21 12:03:30,816 INFO [train.py:886] (3/4) Epoch 1, batch 3800, loss[loss=0.01582, audio_tagging_loss=0.01582, over 24750.00 frames. ], tot_loss[loss=0.01928, audio_tagging_loss=0.01928, over 4954764.00 frames. ], batch size: 99, lr: 4.25e-02, grad_scale: 64.0 2023-12-21 12:03:32,663 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.106e+01 2.585e+01 2.873e+01 3.239e+01 4.281e+01, threshold=5.745e+01, percent-clipped=0.0 2023-12-21 12:03:43,738 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.25 vs. 
limit=15.0 2023-12-21 12:04:02,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=25533.333333333332, ans=0.125 2023-12-21 12:04:10,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=25600.0, ans=0.125 2023-12-21 12:04:15,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=25600.0, ans=0.1 2023-12-21 12:04:20,856 INFO [train.py:886] (3/4) Epoch 1, batch 3850, loss[loss=0.02092, audio_tagging_loss=0.02092, over 25000.00 frames. ], tot_loss[loss=0.01932, audio_tagging_loss=0.01932, over 4949308.48 frames. ], batch size: 100, lr: 4.24e-02, grad_scale: 64.0 2023-12-21 12:04:32,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=25733.333333333332, ans=0.125 2023-12-21 12:04:35,791 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=28.07 vs. limit=15.0 2023-12-21 12:04:39,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=25733.333333333332, ans=0.07 2023-12-21 12:04:44,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=25800.0, ans=0.125 2023-12-21 12:04:45,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=25800.0, ans=0.0 2023-12-21 12:04:48,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=25800.0, ans=0.125 2023-12-21 12:05:08,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=25933.333333333332, ans=0.125 2023-12-21 12:05:11,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=25933.333333333332, ans=0.125 2023-12-21 12:05:12,880 INFO [train.py:886] (3/4) Epoch 1, batch 3900, loss[loss=0.01823, audio_tagging_loss=0.01823, over 24750.00 frames. ], tot_loss[loss=0.01912, audio_tagging_loss=0.01912, over 4951036.48 frames. ], batch size: 99, lr: 4.24e-02, grad_scale: 64.0 2023-12-21 12:05:14,583 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.02 vs. limit=10.0 2023-12-21 12:05:14,770 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.615e+01 2.835e+01 3.211e+01 6.050e+01, threshold=5.671e+01, percent-clipped=1.0 2023-12-21 12:05:18,472 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.82 vs. 
limit=10.0 2023-12-21 12:05:27,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=26066.666666666668, ans=0.125 2023-12-21 12:05:45,693 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=9.364e+00 2023-12-21 12:05:54,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=26266.666666666668, ans=10.0 2023-12-21 12:05:56,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=26266.666666666668, ans=0.0 2023-12-21 12:06:02,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=26266.666666666668, ans=0.125 2023-12-21 12:06:05,267 INFO [train.py:886] (3/4) Epoch 1, batch 3950, loss[loss=0.01908, audio_tagging_loss=0.01908, over 25000.00 frames. ], tot_loss[loss=0.01908, audio_tagging_loss=0.01908, over 4957120.12 frames. ], batch size: 100, lr: 4.23e-02, grad_scale: 64.0 2023-12-21 12:06:12,471 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.74 vs. limit=15.0 2023-12-21 12:06:22,170 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.41 vs. limit=15.0 2023-12-21 12:06:22,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.13 vs. limit=22.5 2023-12-21 12:06:27,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=26466.666666666668, ans=0.1 2023-12-21 12:06:40,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=26533.333333333332, ans=0.1 2023-12-21 12:06:40,760 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.51 vs. limit=15.0 2023-12-21 12:06:50,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=26600.0, ans=0.09899494936611666 2023-12-21 12:06:57,682 INFO [train.py:886] (3/4) Epoch 1, batch 4000, loss[loss=0.01406, audio_tagging_loss=0.01406, over 25000.00 frames. ], tot_loss[loss=0.01916, audio_tagging_loss=0.01916, over 4957887.41 frames. 
], batch size: 100, lr: 4.23e-02, grad_scale: 64.0 2023-12-21 12:06:59,521 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.146e+01 2.600e+01 2.855e+01 3.213e+01 4.653e+01, threshold=5.711e+01, percent-clipped=0.0 2023-12-21 12:06:59,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=26666.666666666668, ans=0.125 2023-12-21 12:06:59,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=26666.666666666668, ans=0.0 2023-12-21 12:07:01,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=26666.666666666668, ans=0.1 2023-12-21 12:07:01,995 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.65 vs. limit=15.0 2023-12-21 12:07:05,812 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.09 vs. limit=6.0 2023-12-21 12:07:09,408 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.22 vs. limit=6.0 2023-12-21 12:07:15,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=26733.333333333332, ans=0.125 2023-12-21 12:07:20,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=26800.0, ans=0.1 2023-12-21 12:07:32,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=26866.666666666668, ans=0.2 2023-12-21 12:07:38,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=26933.333333333332, ans=0.125 2023-12-21 12:07:50,888 INFO [train.py:886] (3/4) Epoch 1, batch 4050, loss[loss=0.01969, audio_tagging_loss=0.01969, over 24750.00 frames. ], tot_loss[loss=0.0192, audio_tagging_loss=0.0192, over 4955426.87 frames. ], batch size: 99, lr: 4.22e-02, grad_scale: 64.0 2023-12-21 12:07:51,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.80 vs. limit=15.0 2023-12-21 12:08:03,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=27066.666666666668, ans=0.1 2023-12-21 12:08:03,731 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.59 vs. limit=15.0 2023-12-21 12:08:05,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=27066.666666666668, ans=0.125 2023-12-21 12:08:17,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.87 vs. limit=15.0 2023-12-21 12:08:23,740 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=12.61 vs. 
limit=10.0 2023-12-21 12:08:34,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=27266.666666666668, ans=0.125 2023-12-21 12:08:41,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.19 vs. limit=10.0 2023-12-21 12:08:41,510 INFO [train.py:886] (3/4) Epoch 1, batch 4100, loss[loss=0.01894, audio_tagging_loss=0.01894, over 24750.00 frames. ], tot_loss[loss=0.01936, audio_tagging_loss=0.01936, over 4948506.54 frames. ], batch size: 99, lr: 4.22e-02, grad_scale: 64.0 2023-12-21 12:08:44,145 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.558e+01 2.802e+01 3.131e+01 4.356e+01, threshold=5.603e+01, percent-clipped=0.0 2023-12-21 12:08:58,468 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.63 vs. limit=6.0 2023-12-21 12:09:05,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=27466.666666666668, ans=0.125 2023-12-21 12:09:15,200 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.97 vs. limit=15.0 2023-12-21 12:09:21,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.84 vs. limit=10.0 2023-12-21 12:09:25,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=27600.0, ans=0.004869565217391305 2023-12-21 12:09:34,832 INFO [train.py:886] (3/4) Epoch 1, batch 4150, loss[loss=0.02176, audio_tagging_loss=0.02176, over 24750.00 frames. ], tot_loss[loss=0.01931, audio_tagging_loss=0.01931, over 4946530.11 frames. ], batch size: 99, lr: 4.21e-02, grad_scale: 64.0 2023-12-21 12:09:36,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=27666.666666666668, ans=0.125 2023-12-21 12:09:38,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=27666.666666666668, ans=0.125 2023-12-21 12:09:40,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=27666.666666666668, ans=0.125 2023-12-21 12:09:43,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=27733.333333333332, ans=0.0 2023-12-21 12:09:54,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.08 vs. 
limit=15.0 2023-12-21 12:09:58,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=27800.0, ans=0.0 2023-12-21 12:10:01,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=27800.0, ans=0.125 2023-12-21 12:10:03,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=27800.0, ans=0.00482608695652174 2023-12-21 12:10:09,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=27866.666666666668, ans=0.0 2023-12-21 12:10:09,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=27866.666666666668, ans=0.0 2023-12-21 12:10:09,507 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.85 vs. limit=22.5 2023-12-21 12:10:15,111 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.54 vs. limit=6.0 2023-12-21 12:10:17,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=27933.333333333332, ans=0.1 2023-12-21 12:10:22,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=27933.333333333332, ans=0.125 2023-12-21 12:10:24,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.30 vs. limit=15.0 2023-12-21 12:10:27,464 INFO [train.py:886] (3/4) Epoch 1, batch 4200, loss[loss=0.01558, audio_tagging_loss=0.01558, over 25000.00 frames. ], tot_loss[loss=0.01906, audio_tagging_loss=0.01906, over 4947203.60 frames. ], batch size: 100, lr: 4.20e-02, grad_scale: 64.0 2023-12-21 12:10:30,044 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.568e+01 2.812e+01 3.182e+01 3.944e+01, threshold=5.625e+01, percent-clipped=0.0 2023-12-21 12:10:33,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=28000.0, ans=0.125 2023-12-21 12:11:09,221 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.816e-01 2023-12-21 12:11:10,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=28266.666666666668, ans=0.2 2023-12-21 12:11:17,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.75 vs. limit=15.0 2023-12-21 12:11:18,580 INFO [train.py:886] (3/4) Epoch 1, batch 4250, loss[loss=0.01862, audio_tagging_loss=0.01862, over 24750.00 frames. ], tot_loss[loss=0.01888, audio_tagging_loss=0.01888, over 4952689.59 frames. 
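[editor's note] In every optim.py WARNING in this section the printed threshold is exactly Clipping_scale times the median grad-norm quartile (just above: 5.625e+01 = 2.0 x 2.812e+01), i.e. the clipping threshold tracks the median of recently observed gradient norms. A sketch of that bookkeeping; the windowing and per-parameter details are invented for illustration, not copied from optim.py:

import torch

def clipping_report(grad_norms: torch.Tensor, clipping_scale: float = 2.0) -> torch.Tensor:
    # grad_norms: 1-D tensor of gradient norms gathered since the last report.
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]                 # 2.0 x median, as in the log
    clipped = 100.0 * (grad_norms > threshold).float().mean()
    print(f"Clipping_scale={clipping_scale}, grad-norm quartiles "
          f"{q[0]:.3e} {q[1]:.3e} {q[2]:.3e} {q[3]:.3e} {q[4]:.3e}, "
          f"threshold={threshold:.3e}, percent-clipped={clipped:.1f}")
    return threshold

# A batch whose gradient norm exceeds the threshold would be scaled by threshold / norm:
norms = 28.0 + 5.0 * torch.randn(500).abs()
clipping_report(norms)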
], batch size: 99, lr: 4.20e-02, grad_scale: 128.0 2023-12-21 12:11:57,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=28533.333333333332, ans=0.125 2023-12-21 12:12:03,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=28600.0, ans=22.5 2023-12-21 12:12:10,343 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.84 vs. limit=15.0 2023-12-21 12:12:11,740 INFO [train.py:886] (3/4) Epoch 1, batch 4300, loss[loss=0.019, audio_tagging_loss=0.019, over 24048.00 frames. ], tot_loss[loss=0.01886, audio_tagging_loss=0.01886, over 4955445.06 frames. ], batch size: 100, lr: 4.19e-02, grad_scale: 128.0 2023-12-21 12:12:13,642 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.582e+01 2.869e+01 3.269e+01 4.965e+01, threshold=5.738e+01, percent-clipped=0.0 2023-12-21 12:12:13,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=28666.666666666668, ans=0.125 2023-12-21 12:12:26,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=28733.333333333332, ans=0.125 2023-12-21 12:12:29,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=28733.333333333332, ans=0.035 2023-12-21 12:12:34,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=28800.0, ans=0.004608695652173913 2023-12-21 12:13:03,095 INFO [train.py:886] (3/4) Epoch 1, batch 4350, loss[loss=0.0177, audio_tagging_loss=0.0177, over 24750.00 frames. ], tot_loss[loss=0.01886, audio_tagging_loss=0.01886, over 4957805.61 frames. ], batch size: 99, lr: 4.19e-02, grad_scale: 128.0 2023-12-21 12:13:07,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=29000.0, ans=0.1 2023-12-21 12:13:08,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=29000.0, ans=0.125 2023-12-21 12:13:09,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=29000.0, ans=0.004565217391304348 2023-12-21 12:13:11,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=29000.0, ans=0.125 2023-12-21 12:13:22,738 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.88 vs. limit=22.5 2023-12-21 12:13:37,248 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.22 vs. limit=22.5 2023-12-21 12:13:56,074 INFO [train.py:886] (3/4) Epoch 1, batch 4400, loss[loss=0.01773, audio_tagging_loss=0.01773, over 24750.00 frames. ], tot_loss[loss=0.01907, audio_tagging_loss=0.01907, over 4952381.41 frames. 
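[editor's note] The scaling.py "Whitening" lines compare a per-module statistic against a limit (e.g. "metric=16.84 vs. limit=15.0" above): a measure of how far the activation covariance is from white. A sketch of one such statistic for the common num_groups=1 case, reconstructed from trace identities rather than taken verbatim from scaling.py; under this reconstruction the metric is 1.0 when all covariance eigenvalues are equal and grows when a few directions dominate:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations; num_groups=1 case.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]                  # (C, C) covariance
    d = cov.shape[0]
    # d * sum(eig^2) / sum(eig)^2, computed via traces: 1.0 for a perfectly
    # flat spectrum, larger when a few eigendirections carry most variance.
    return d * (cov @ cov).diagonal().sum() / cov.diagonal().sum() ** 2

x = torch.randn(1000, 384)        # near-white input: metric ~ 1
print(whitening_metric(x))
x[:, 0] *= 20.0                   # one dominant channel
print(whitening_metric(x))        # metric rises far above 1

When the metric exceeds the limit, the Whiten module applies a small gradient penalty nudging the features back toward a flat spectrum; the "metric=... vs. limit=..." lines are that check being logged.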
], batch size: 99, lr: 4.18e-02, grad_scale: 128.0 2023-12-21 12:13:57,947 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.596e+01 2.831e+01 3.124e+01 4.949e+01, threshold=5.662e+01, percent-clipped=0.0 2023-12-21 12:14:02,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=29333.333333333332, ans=0.125 2023-12-21 12:14:02,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=29333.333333333332, ans=0.0 2023-12-21 12:14:21,273 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=26.00 vs. limit=15.0 2023-12-21 12:14:29,967 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.06 vs. limit=15.0 2023-12-21 12:14:32,797 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.94 vs. limit=22.5 2023-12-21 12:14:33,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=29533.333333333332, ans=0.125 2023-12-21 12:14:48,784 INFO [train.py:886] (3/4) Epoch 1, batch 4450, loss[loss=0.0179, audio_tagging_loss=0.0179, over 25000.00 frames. ], tot_loss[loss=0.01911, audio_tagging_loss=0.01911, over 4947746.60 frames. ], batch size: 100, lr: 4.17e-02, grad_scale: 128.0 2023-12-21 12:15:20,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=29866.666666666668, ans=0.125 2023-12-21 12:15:25,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=29866.666666666668, ans=0.125 2023-12-21 12:15:34,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=29933.333333333332, ans=0.2 2023-12-21 12:15:40,549 INFO [train.py:886] (3/4) Epoch 1, batch 4500, loss[loss=0.01922, audio_tagging_loss=0.01922, over 25000.00 frames. ], tot_loss[loss=0.0189, audio_tagging_loss=0.0189, over 4948350.42 frames. ], batch size: 100, lr: 4.17e-02, grad_scale: 128.0 2023-12-21 12:15:43,805 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.606e+01 2.897e+01 3.074e+01 4.883e+01, threshold=5.793e+01, percent-clipped=0.0 2023-12-21 12:16:05,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.33 vs. limit=10.0 2023-12-21 12:16:08,897 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.76 vs. limit=22.5 2023-12-21 12:16:11,647 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=18.11 vs. limit=15.0 2023-12-21 12:16:21,534 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.11 vs. limit=10.0 2023-12-21 12:16:22,474 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.49 vs. 
limit=10.0 2023-12-21 12:16:32,188 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.26 vs. limit=22.5 2023-12-21 12:16:34,305 INFO [train.py:886] (3/4) Epoch 1, batch 4550, loss[loss=0.02271, audio_tagging_loss=0.02271, over 20579.00 frames. ], tot_loss[loss=0.01881, audio_tagging_loss=0.01881, over 4944731.88 frames. ], batch size: 107, lr: 4.16e-02, grad_scale: 128.0 2023-12-21 12:16:35,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=30333.333333333332, ans=0.0 2023-12-21 12:16:35,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=30333.333333333332, ans=0.07 2023-12-21 12:16:39,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=30333.333333333332, ans=0.125 2023-12-21 12:16:43,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=30400.0, ans=0.5 2023-12-21 12:16:55,126 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.43 vs. limit=10.0 2023-12-21 12:16:56,816 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.41 vs. limit=15.0 2023-12-21 12:17:08,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=30533.333333333332, ans=0.125 2023-12-21 12:17:11,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=30533.333333333332, ans=0.5 2023-12-21 12:17:12,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=30533.333333333332, ans=0.5 2023-12-21 12:17:14,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=30600.0, ans=0.125 2023-12-21 12:17:14,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.10 vs. limit=6.0 2023-12-21 12:17:23,029 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=16.50 vs. limit=15.0 2023-12-21 12:17:27,221 INFO [train.py:886] (3/4) Epoch 1, batch 4600, loss[loss=0.01969, audio_tagging_loss=0.01969, over 25000.00 frames. ], tot_loss[loss=0.01878, audio_tagging_loss=0.01878, over 4947771.21 frames. ], batch size: 100, lr: 4.15e-02, grad_scale: 128.0 2023-12-21 12:17:27,617 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.37 vs. 
limit=10.0 2023-12-21 12:17:29,153 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.032e+01 2.518e+01 2.768e+01 3.063e+01 4.476e+01, threshold=5.536e+01, percent-clipped=0.0 2023-12-21 12:17:32,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=30666.666666666668, ans=0.04949747468305833 2023-12-21 12:17:39,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=30733.333333333332, ans=0.00418840579710145 2023-12-21 12:17:40,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=30733.333333333332, ans=0.125 2023-12-21 12:18:03,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=30866.666666666668, ans=0.125 2023-12-21 12:18:17,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=31000.0, ans=0.0 2023-12-21 12:18:18,729 INFO [train.py:886] (3/4) Epoch 1, batch 4650, loss[loss=0.02085, audio_tagging_loss=0.02085, over 25000.00 frames. ], tot_loss[loss=0.01882, audio_tagging_loss=0.01882, over 4956722.92 frames. ], batch size: 100, lr: 4.15e-02, grad_scale: 128.0 2023-12-21 12:18:23,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=31000.0, ans=0.0 2023-12-21 12:18:43,275 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.84 vs. limit=10.0 2023-12-21 12:18:55,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=31200.0, ans=0.00408695652173913 2023-12-21 12:19:08,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=31266.666666666668, ans=0.1 2023-12-21 12:19:09,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=31266.666666666668, ans=0.0 2023-12-21 12:19:10,780 INFO [train.py:886] (3/4) Epoch 1, batch 4700, loss[loss=0.01839, audio_tagging_loss=0.01839, over 24750.00 frames. ], tot_loss[loss=0.01892, audio_tagging_loss=0.01892, over 4956804.87 frames. ], batch size: 99, lr: 4.14e-02, grad_scale: 128.0 2023-12-21 12:19:11,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.33 vs. limit=15.0 2023-12-21 12:19:12,553 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.160e+01 2.530e+01 2.695e+01 2.950e+01 3.950e+01, threshold=5.391e+01, percent-clipped=0.0 2023-12-21 12:19:17,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=31333.333333333332, ans=0.125 2023-12-21 12:19:39,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=31533.333333333332, ans=0.125 2023-12-21 12:19:48,703 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.92 vs. 
limit=15.0 2023-12-21 12:19:57,268 INFO [train.py:886] (3/4) Epoch 1, batch 4750, loss[loss=0.01705, audio_tagging_loss=0.01705, over 24750.00 frames. ], tot_loss[loss=0.01906, audio_tagging_loss=0.01906, over 4951377.06 frames. ], batch size: 99, lr: 4.14e-02, grad_scale: 128.0 2023-12-21 12:19:57,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=31666.666666666668, ans=0.0 2023-12-21 12:20:36,421 INFO [train.py:886] (3/4) Epoch 2, batch 0, loss[loss=0.04384, audio_tagging_loss=0.04384, over 24051.00 frames. ], tot_loss[loss=0.04384, audio_tagging_loss=0.04384, over 24051.00 frames. ], batch size: 100, lr: 4.05e-02, grad_scale: 128.0 2023-12-21 12:20:36,422 INFO [train.py:909] (3/4) Computing validation loss 2023-12-21 12:20:50,698 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.4154, 1.3830, 1.2209, 1.3844, 1.2279, 1.3834, 1.2898, 1.1980], device='cuda:3') 2023-12-21 12:20:59,080 INFO [train.py:917] (3/4) Epoch 2, validation: loss=0.0423, audio_tagging_loss=0.0423, over 3737520.00 frames. 2023-12-21 12:20:59,080 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-21 12:21:11,928 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=24.52 vs. limit=22.5 2023-12-21 12:21:12,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=31840.0, ans=15.0 2023-12-21 12:21:19,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=31906.666666666668, ans=0.07 2023-12-21 12:21:19,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.00 vs. limit=15.0 2023-12-21 12:21:23,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=31906.666666666668, ans=0.125 2023-12-21 12:21:27,234 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.10 vs. limit=15.0 2023-12-21 12:21:29,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=31973.333333333332, ans=0.1 2023-12-21 12:21:35,982 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.667e+01 2.944e+01 3.472e+01 1.120e+02, threshold=5.887e+01, percent-clipped=2.0 2023-12-21 12:21:46,949 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=26.49 vs. limit=22.5 2023-12-21 12:21:49,435 INFO [train.py:886] (3/4) Epoch 2, batch 50, loss[loss=0.02453, audio_tagging_loss=0.02453, over 25000.00 frames. ], tot_loss[loss=0.0298, audio_tagging_loss=0.0298, over 1121295.01 frames. ], batch size: 100, lr: 4.05e-02, grad_scale: 128.0 2023-12-21 12:21:53,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=32106.666666666668, ans=0.125 2023-12-21 12:21:54,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.64 vs. 
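[editor's note] Each "Computing validation loss" / "Epoch N, validation: loss=..." pair above is a full pass over the dev dataloader with gradients disabled, followed by a report of peak CUDA memory, after which training resumes. A rough sketch with a hypothetical model interface (the real train.py signature and loss bookkeeping differ):

import torch

@torch.no_grad()
def compute_validation_loss(model: torch.nn.Module, valid_loader, device: torch.device):
    model.eval()
    loss_sum, frames = 0.0, 0.0
    for batch in valid_loader:
        feats = batch["inputs"].to(device)        # fbank features
        labels = batch["labels"].to(device)       # multi-hot event targets
        loss, num_frames = model(feats, labels)   # hypothetical interface
        loss_sum += loss.item() * num_frames
        frames += num_frames
    model.train()
    print(f"validation: loss={loss_sum / frames:.4f}, over {frames:.2f} frames.")
    print(f"Maximum memory allocated so far is "
          f"{torch.cuda.max_memory_allocated(device) // 2**20}MB")

Calling this at the top of each epoch (and every valid_interval batches) would produce exactly the ordering seen here: a validation report sandwiched between two training-progress lines.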
limit=6.0 2023-12-21 12:22:02,131 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.38 vs. limit=15.0 2023-12-21 12:22:03,190 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=35.40 vs. limit=15.0 2023-12-21 12:22:08,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=32173.333333333332, ans=0.1 2023-12-21 12:22:10,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=32240.0, ans=0.0 2023-12-21 12:22:17,045 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0 2023-12-21 12:22:17,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=32240.0, ans=0.0 2023-12-21 12:22:20,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=32306.666666666668, ans=0.035 2023-12-21 12:22:41,789 INFO [train.py:886] (3/4) Epoch 2, batch 100, loss[loss=0.01911, audio_tagging_loss=0.01911, over 24878.00 frames. ], tot_loss[loss=0.02564, audio_tagging_loss=0.02564, over 1979130.55 frames. ], batch size: 100, lr: 4.04e-02, grad_scale: 128.0 2023-12-21 12:22:48,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=32440.0, ans=0.0 2023-12-21 12:22:50,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=32506.666666666668, ans=0.125 2023-12-21 12:22:50,934 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.95 vs. limit=15.0 2023-12-21 12:22:52,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=32506.666666666668, ans=0.125 2023-12-21 12:23:17,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=32640.0, ans=0.09899494936611666 2023-12-21 12:23:18,385 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.835e+01 3.088e+01 3.489e+01 4.316e+01, threshold=6.177e+01, percent-clipped=0.0 2023-12-21 12:23:31,840 INFO [train.py:886] (3/4) Epoch 2, batch 150, loss[loss=0.01937, audio_tagging_loss=0.01937, over 25000.00 frames. ], tot_loss[loss=0.02335, audio_tagging_loss=0.02335, over 2641163.68 frames. ], batch size: 100, lr: 4.04e-02, grad_scale: 128.0 2023-12-21 12:23:43,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=32840.0, ans=0.2 2023-12-21 12:23:49,803 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2023-12-21 12:23:54,251 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.864e+00 2023-12-21 12:23:59,456 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=20.99 vs. 
limit=15.0 2023-12-21 12:24:02,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=32973.333333333336, ans=0.125 2023-12-21 12:24:08,278 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.63 vs. limit=15.0 2023-12-21 12:24:09,312 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.68 vs. limit=15.0 2023-12-21 12:24:23,406 INFO [train.py:886] (3/4) Epoch 2, batch 200, loss[loss=0.01713, audio_tagging_loss=0.01713, over 25000.00 frames. ], tot_loss[loss=0.02183, audio_tagging_loss=0.02183, over 3158836.63 frames. ], batch size: 100, lr: 4.03e-02, grad_scale: 128.0 2023-12-21 12:24:24,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=33106.666666666664, ans=0.125 2023-12-21 12:24:24,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=12.0 2023-12-21 12:24:33,386 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=15.0 2023-12-21 12:24:37,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.87 vs. limit=15.0 2023-12-21 12:24:47,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.45 vs. limit=15.0 2023-12-21 12:24:59,198 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.541e+01 2.777e+01 3.081e+01 4.614e+01, threshold=5.554e+01, percent-clipped=0.0 2023-12-21 12:25:08,488 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.38 vs. limit=12.0 2023-12-21 12:25:12,690 INFO [train.py:886] (3/4) Epoch 2, batch 250, loss[loss=0.01931, audio_tagging_loss=0.01931, over 25000.00 frames. ], tot_loss[loss=0.02087, audio_tagging_loss=0.02087, over 3561429.65 frames. ], batch size: 100, lr: 4.02e-02, grad_scale: 128.0 2023-12-21 12:25:21,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=33440.0, ans=0.125 2023-12-21 12:25:26,330 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.18 vs. limit=22.5 2023-12-21 12:25:36,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=33573.333333333336, ans=0.125 2023-12-21 12:25:46,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=33640.0, ans=0.125 2023-12-21 12:26:05,318 INFO [train.py:886] (3/4) Epoch 2, batch 300, loss[loss=0.01675, audio_tagging_loss=0.01675, over 24750.00 frames. ], tot_loss[loss=0.02034, audio_tagging_loss=0.02034, over 3871352.14 frames. 
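[editor's note] The tot_loss values are not plain epoch averages. The "over N frames" counts in the Epoch 2 lines above grow by ever-smaller increments (1.12e6 at batch 50, 1.98e6 at 100, 2.64e6 at 150, 3.16e6 at 200, 3.56e6 at 250), which fits an exponentially decayed accumulator with per-batch decay 0.995: with roughly 25000 frames per batch, 5e6 * (1 - 0.995^n) gives 1.11e6, 1.97e6, 2.64e6, 3.17e6 at n = 50, 100, 150, 200, and saturates near the ~4.95e6 counts seen throughout Epoch 1. A sketch of such a tracker (names illustrative):

class SmoothedLoss:
    # Frame-weighted loss with exponential forgetting, so the reported
    # tot_loss tracks roughly the last 1/(1-decay) batches, not the whole epoch.
    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, per_frame_loss: float, num_frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + per_frame_loss * num_frames
        self.frames = self.decay * self.frames + num_frames

    @property
    def avg(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = SmoothedLoss()
for _ in range(50):
    tracker.update(per_frame_loss=0.0298, num_frames=25000.0)
print(tracker.frames)   # ~1.11e6, cf. "over 1121295.01 frames" at batch 50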
], batch size: 99, lr: 4.02e-02, grad_scale: 128.0 2023-12-21 12:26:09,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=33773.333333333336, ans=0.125 2023-12-21 12:26:14,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=33840.0, ans=0.0 2023-12-21 12:26:14,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0 2023-12-21 12:26:16,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=33840.0, ans=0.125 2023-12-21 12:26:16,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=15.0 2023-12-21 12:26:19,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=33840.0, ans=0.0 2023-12-21 12:26:41,963 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.636e+01 2.849e+01 3.270e+01 4.493e+01, threshold=5.697e+01, percent-clipped=0.0 2023-12-21 12:26:49,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2023-12-21 12:26:58,347 INFO [train.py:886] (3/4) Epoch 2, batch 350, loss[loss=0.01822, audio_tagging_loss=0.01822, over 24750.00 frames. ], tot_loss[loss=0.02002, audio_tagging_loss=0.02002, over 4108563.33 frames. ], batch size: 99, lr: 4.01e-02, grad_scale: 128.0 2023-12-21 12:27:06,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=34106.666666666664, ans=0.1 2023-12-21 12:27:18,349 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.98 vs. limit=6.0 2023-12-21 12:27:23,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=34240.0, ans=0.2 2023-12-21 12:27:26,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=26.48 vs. limit=15.0 2023-12-21 12:27:40,348 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.955e+01 2023-12-21 12:27:42,496 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.35 vs. limit=15.0 2023-12-21 12:27:48,700 INFO [train.py:886] (3/4) Epoch 2, batch 400, loss[loss=0.01447, audio_tagging_loss=0.01447, over 25000.00 frames. ], tot_loss[loss=0.01951, audio_tagging_loss=0.01951, over 4297698.04 frames. ], batch size: 100, lr: 4.00e-02, grad_scale: 128.0 2023-12-21 12:27:51,179 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.00 vs. limit=15.0 2023-12-21 12:28:18,049 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.30 vs. 
limit=15.0 2023-12-21 12:28:25,959 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=39.03 vs. limit=15.0 2023-12-21 12:28:27,070 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.066e+01 2.614e+01 2.832e+01 3.282e+01 4.627e+01, threshold=5.664e+01, percent-clipped=0.0 2023-12-21 12:28:31,990 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.39 vs. limit=15.0 2023-12-21 12:28:41,398 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.22 vs. limit=22.5 2023-12-21 12:28:42,749 INFO [train.py:886] (3/4) Epoch 2, batch 450, loss[loss=0.01768, audio_tagging_loss=0.01768, over 25000.00 frames. ], tot_loss[loss=0.01917, audio_tagging_loss=0.01917, over 4441810.38 frames. ], batch size: 100, lr: 4.00e-02, grad_scale: 128.0 2023-12-21 12:28:57,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=34840.0, ans=0.125 2023-12-21 12:29:15,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=34973.333333333336, ans=0.035 2023-12-21 12:29:15,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=34973.333333333336, ans=0.125 2023-12-21 12:29:35,163 INFO [train.py:886] (3/4) Epoch 2, batch 500, loss[loss=0.0182, audio_tagging_loss=0.0182, over 25000.00 frames. ], tot_loss[loss=0.01906, audio_tagging_loss=0.01906, over 4553419.02 frames. ], batch size: 100, lr: 3.99e-02, grad_scale: 128.0 2023-12-21 12:29:36,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=35106.666666666664, ans=0.1 2023-12-21 12:29:45,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=35173.333333333336, ans=0.0032231884057971017 2023-12-21 12:29:54,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=35173.333333333336, ans=15.0 2023-12-21 12:30:02,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=35240.0, ans=0.125 2023-12-21 12:30:13,254 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.483e+01 2.716e+01 2.937e+01 3.953e+01, threshold=5.433e+01, percent-clipped=0.0 2023-12-21 12:30:15,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=35306.666666666664, ans=0.1 2023-12-21 12:30:16,723 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.95 vs. limit=6.0 2023-12-21 12:30:25,246 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.40 vs. 
limit=15.0 2023-12-21 12:30:26,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=35440.0, ans=0.125 2023-12-21 12:30:27,308 INFO [train.py:886] (3/4) Epoch 2, batch 550, loss[loss=0.01731, audio_tagging_loss=0.01731, over 23938.00 frames. ], tot_loss[loss=0.01903, audio_tagging_loss=0.01903, over 4645700.88 frames. ], batch size: 100, lr: 3.99e-02, grad_scale: 128.0 2023-12-21 12:30:30,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=35440.0, ans=0.125 2023-12-21 12:30:32,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=35440.0, ans=0.0 2023-12-21 12:31:01,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=35640.0, ans=0.0 2023-12-21 12:31:17,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=35706.666666666664, ans=0.125 2023-12-21 12:31:18,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=35706.666666666664, ans=0.0 2023-12-21 12:31:20,661 INFO [train.py:886] (3/4) Epoch 2, batch 600, loss[loss=0.01957, audio_tagging_loss=0.01957, over 24750.00 frames. ], tot_loss[loss=0.01913, audio_tagging_loss=0.01913, over 4712723.07 frames. ], batch size: 99, lr: 3.98e-02, grad_scale: 128.0 2023-12-21 12:31:23,957 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.63 vs. limit=22.5 2023-12-21 12:31:30,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=35840.0, ans=0.125 2023-12-21 12:31:38,279 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=19.89 vs. limit=15.0 2023-12-21 12:31:58,312 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.564e+01 2.794e+01 3.187e+01 4.110e+01, threshold=5.587e+01, percent-clipped=0.0 2023-12-21 12:32:06,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=36040.0, ans=0.125 2023-12-21 12:32:12,573 INFO [train.py:886] (3/4) Epoch 2, batch 650, loss[loss=0.01745, audio_tagging_loss=0.01745, over 24750.00 frames. ], tot_loss[loss=0.01915, audio_tagging_loss=0.01915, over 4762020.19 frames. 
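[editor's note] Every "ScheduledFloat: name=..., batch_count=..., ans=..." line is a module constant (a dropout probability, skip rate, balancer bound, etc.) being evaluated on a batch-count-dependent schedule, which is why the same name yields different ans values as batch_count grows. A sketch of the idea as a piecewise-linear schedule; the breakpoints below are made up, and the real scaling.py class carries more machinery:

class ScheduledFloat:
    # A float that depends on the batch count: linear between the given
    # (batch_count, value) breakpoints, clamped outside them.
    def __init__(self, *points):
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
        return pts[-1][1]

# e.g. a dropout annealed from 0.3 to 0.1 over the first 20000 batch counts:
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(35440.0))   # -> 0.1, cf. the recurring "...dropout_p, ..., ans=0.1" lines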
], batch size: 99, lr: 3.97e-02, grad_scale: 128.0 2023-12-21 12:32:19,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=36106.666666666664, ans=0.0 2023-12-21 12:32:35,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=36240.0, ans=0.125 2023-12-21 12:32:35,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=36240.0, ans=0.1 2023-12-21 12:32:37,558 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.680e+00 2023-12-21 12:32:43,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.62 vs. limit=22.5 2023-12-21 12:32:52,466 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.62 vs. limit=22.5 2023-12-21 12:32:55,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.42 vs. limit=22.5 2023-12-21 12:32:58,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=36373.333333333336, ans=0.125 2023-12-21 12:33:06,072 INFO [train.py:886] (3/4) Epoch 2, batch 700, loss[loss=0.02028, audio_tagging_loss=0.02028, over 25000.00 frames. ], tot_loss[loss=0.01906, audio_tagging_loss=0.01906, over 4801249.16 frames. ], batch size: 100, lr: 3.97e-02, grad_scale: 128.0 2023-12-21 12:33:10,586 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.30 vs. limit=22.5 2023-12-21 12:33:13,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=36440.0, ans=0.0 2023-12-21 12:33:13,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=36440.0, ans=0.2 2023-12-21 12:33:18,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=36506.666666666664, ans=0.002933333333333334 2023-12-21 12:33:22,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=36506.666666666664, ans=0.002933333333333334 2023-12-21 12:33:23,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=36506.666666666664, ans=0.002933333333333334 2023-12-21 12:33:24,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=36506.666666666664, ans=0.0 2023-12-21 12:33:26,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=36573.333333333336, ans=0.2 2023-12-21 12:33:30,655 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=27.32 vs. limit=22.5 2023-12-21 12:33:30,799 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.51 vs. 
limit=15.0 2023-12-21 12:33:41,152 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.70 vs. limit=15.0 2023-12-21 12:33:43,497 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.539e+01 2.879e+01 3.158e+01 4.912e+01, threshold=5.759e+01, percent-clipped=0.0 2023-12-21 12:33:59,179 INFO [train.py:886] (3/4) Epoch 2, batch 750, loss[loss=0.01416, audio_tagging_loss=0.01416, over 24750.00 frames. ], tot_loss[loss=0.01883, audio_tagging_loss=0.01883, over 4826798.71 frames. ], batch size: 99, lr: 3.96e-02, grad_scale: 128.0 2023-12-21 12:34:04,590 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=18.71 vs. limit=15.0 2023-12-21 12:34:13,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=36840.0, ans=0.125 2023-12-21 12:34:13,698 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0 2023-12-21 12:34:22,533 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.71 vs. limit=22.5 2023-12-21 12:34:28,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.19 vs. limit=15.0 2023-12-21 12:34:32,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.05 vs. limit=15.0 2023-12-21 12:34:36,253 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=23.18 vs. limit=15.0 2023-12-21 12:34:40,377 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.31 vs. limit=15.0 2023-12-21 12:34:51,115 INFO [train.py:886] (3/4) Epoch 2, batch 800, loss[loss=0.01813, audio_tagging_loss=0.01813, over 25000.00 frames. ], tot_loss[loss=0.0188, audio_tagging_loss=0.0188, over 4855800.40 frames. ], batch size: 100, lr: 3.95e-02, grad_scale: 128.0 2023-12-21 12:34:52,587 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=18.37 vs. 
limit=15.0 2023-12-21 12:35:04,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=37173.333333333336, ans=0.125 2023-12-21 12:35:16,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=37240.0, ans=0.1 2023-12-21 12:35:22,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=37306.666666666664, ans=0.2 2023-12-21 12:35:29,691 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+01 2.588e+01 2.884e+01 3.147e+01 4.791e+01, threshold=5.768e+01, percent-clipped=0.0 2023-12-21 12:35:30,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=37306.666666666664, ans=0.0 2023-12-21 12:35:42,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=37373.333333333336, ans=0.1 2023-12-21 12:35:43,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=15.0 2023-12-21 12:35:44,793 INFO [train.py:886] (3/4) Epoch 2, batch 850, loss[loss=0.01698, audio_tagging_loss=0.01698, over 24750.00 frames. ], tot_loss[loss=0.0187, audio_tagging_loss=0.0187, over 4874496.63 frames. ], batch size: 99, lr: 3.95e-02, grad_scale: 128.0 2023-12-21 12:35:53,993 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.14 vs. limit=22.5 2023-12-21 12:36:00,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=37506.666666666664, ans=0.125 2023-12-21 12:36:06,324 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=12.0 2023-12-21 12:36:15,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=37640.0, ans=0.125 2023-12-21 12:36:37,460 INFO [train.py:886] (3/4) Epoch 2, batch 900, loss[loss=0.01603, audio_tagging_loss=0.01603, over 24750.00 frames. ], tot_loss[loss=0.01864, audio_tagging_loss=0.01864, over 4885844.11 frames. ], batch size: 99, lr: 3.94e-02, grad_scale: 128.0 2023-12-21 12:36:45,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=37773.333333333336, ans=0.0 2023-12-21 12:37:00,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=37906.666666666664, ans=0.1 2023-12-21 12:37:14,917 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.584e+01 2.867e+01 3.127e+01 3.908e+01, threshold=5.734e+01, percent-clipped=0.0 2023-12-21 12:37:27,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=38106.666666666664, ans=0.2 2023-12-21 12:37:28,452 INFO [train.py:886] (3/4) Epoch 2, batch 950, loss[loss=0.01911, audio_tagging_loss=0.01911, over 25000.00 frames. ], tot_loss[loss=0.01871, audio_tagging_loss=0.01871, over 4895003.58 frames. 
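[editor's note] The grad_scale field in the train.py progress lines steps from 64.0 through 128.0 to 256.0 over this section, the signature of dynamic loss scaling under fp16: the scale is halved when a batch produces inf/nan gradients and grows again after a run of clean steps. A self-contained sketch using PyTorch's stock GradScaler (the recipe's optimizer and growth cadence differ; requires a CUDA device):

import torch
import torch.nn.functional as F

model = torch.nn.Linear(80, 527).cuda()                # stand-in for the zipformer
optim = torch.optim.AdamW(model.parameters(), lr=4e-2)
scaler = torch.cuda.amp.GradScaler(init_scale=64.0)    # cf. "grad_scale: 64.0"

for _ in range(5):                                     # a few illustrative steps
    feats = torch.randn(8, 80, device="cuda")
    targets = torch.zeros(8, 527, device="cuda")       # multi-hot labels would go here
    optim.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = F.binary_cross_entropy_with_logits(model(feats), targets)
    scaler.scale(loss).backward()
    scaler.step(optim)    # the step is skipped if any gradient overflowed
    scaler.update()       # halves the scale on overflow; doubles it after
                          # enough clean steps (hence 64 -> 128 -> 256 here)
print(scaler.get_scale())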
], batch size: 100, lr: 3.94e-02, grad_scale: 128.0 2023-12-21 12:37:34,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=38106.666666666664, ans=0.125 2023-12-21 12:37:35,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=38106.666666666664, ans=0.0 2023-12-21 12:37:46,251 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.22 vs. limit=15.0 2023-12-21 12:37:49,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.51 vs. limit=5.0 2023-12-21 12:38:04,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=38306.666666666664, ans=0.125 2023-12-21 12:38:09,175 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.21 vs. limit=15.0 2023-12-21 12:38:10,859 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.35 vs. limit=15.0 2023-12-21 12:38:22,429 INFO [train.py:886] (3/4) Epoch 2, batch 1000, loss[loss=0.01541, audio_tagging_loss=0.01541, over 24750.00 frames. ], tot_loss[loss=0.01857, audio_tagging_loss=0.01857, over 4903580.88 frames. ], batch size: 99, lr: 3.93e-02, grad_scale: 128.0 2023-12-21 12:38:29,614 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.00 vs. limit=22.5 2023-12-21 12:38:47,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=38573.333333333336, ans=0.1 2023-12-21 12:38:54,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.39 vs. limit=22.5 2023-12-21 12:39:00,022 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.080e+01 2.513e+01 2.801e+01 3.177e+01 4.242e+01, threshold=5.602e+01, percent-clipped=0.0 2023-12-21 12:39:09,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=38706.666666666664, ans=0.1 2023-12-21 12:39:11,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=38706.666666666664, ans=0.1 2023-12-21 12:39:14,308 INFO [train.py:886] (3/4) Epoch 2, batch 1050, loss[loss=0.01738, audio_tagging_loss=0.01738, over 25000.00 frames. ], tot_loss[loss=0.01851, audio_tagging_loss=0.01851, over 4912884.12 frames. ], batch size: 100, lr: 3.92e-02, grad_scale: 128.0 2023-12-21 12:39:14,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=38773.333333333336, ans=0.0 2023-12-21 12:39:29,733 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=12.0 2023-12-21 12:39:43,086 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.47 vs. 
limit=12.0 2023-12-21 12:40:03,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=39040.0, ans=0.125 2023-12-21 12:40:05,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=39040.0, ans=0.125 2023-12-21 12:40:06,874 INFO [train.py:886] (3/4) Epoch 2, batch 1100, loss[loss=0.01963, audio_tagging_loss=0.01963, over 25000.00 frames. ], tot_loss[loss=0.01849, audio_tagging_loss=0.01849, over 4922562.95 frames. ], batch size: 100, lr: 3.92e-02, grad_scale: 128.0 2023-12-21 12:40:13,188 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.37 vs. limit=15.0 2023-12-21 12:40:17,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=39173.333333333336, ans=0.5 2023-12-21 12:40:32,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2023-12-21 12:40:33,408 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.10 vs. limit=10.0 2023-12-21 12:40:37,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=39306.666666666664, ans=0.2 2023-12-21 12:40:43,544 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.542e+01 2.826e+01 3.168e+01 4.060e+01, threshold=5.651e+01, percent-clipped=0.0 2023-12-21 12:40:48,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=39373.333333333336, ans=0.0 2023-12-21 12:40:59,809 INFO [train.py:886] (3/4) Epoch 2, batch 1150, loss[loss=0.01937, audio_tagging_loss=0.01937, over 25000.00 frames. ], tot_loss[loss=0.01857, audio_tagging_loss=0.01857, over 4930133.40 frames. ], batch size: 100, lr: 3.91e-02, grad_scale: 128.0 2023-12-21 12:41:02,254 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.65 vs. limit=10.0 2023-12-21 12:41:16,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=39506.666666666664, ans=0.125 2023-12-21 12:41:22,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=39573.333333333336, ans=0.0 2023-12-21 12:41:25,121 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.42 vs. limit=12.0 2023-12-21 12:41:36,101 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2023-12-21 12:41:37,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=39640.0, ans=0.1 2023-12-21 12:41:43,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.32 vs. 
limit=15.0 2023-12-21 12:41:50,037 INFO [train.py:886] (3/4) Epoch 2, batch 1200, loss[loss=0.01791, audio_tagging_loss=0.01791, over 25000.00 frames. ], tot_loss[loss=0.0186, audio_tagging_loss=0.0186, over 4942930.41 frames. ], batch size: 100, lr: 3.90e-02, grad_scale: 128.0 2023-12-21 12:41:50,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=39773.333333333336, ans=0.125 2023-12-21 12:42:07,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=39840.0, ans=0.125 2023-12-21 12:42:26,681 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+01 2.581e+01 2.851e+01 3.035e+01 4.083e+01, threshold=5.702e+01, percent-clipped=0.0 2023-12-21 12:42:42,771 INFO [train.py:886] (3/4) Epoch 2, batch 1250, loss[loss=0.02177, audio_tagging_loss=0.02177, over 24750.00 frames. ], tot_loss[loss=0.01868, audio_tagging_loss=0.01868, over 4935388.03 frames. ], batch size: 99, lr: 3.90e-02, grad_scale: 128.0 2023-12-21 12:43:01,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.19 vs. limit=15.0 2023-12-21 12:43:16,750 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.537e-01 2023-12-21 12:43:23,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=40373.333333333336, ans=10.0 2023-12-21 12:43:25,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=40373.333333333336, ans=0.125 2023-12-21 12:43:31,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=40373.333333333336, ans=0.0020927536231884064 2023-12-21 12:43:35,072 INFO [train.py:886] (3/4) Epoch 2, batch 1300, loss[loss=0.02006, audio_tagging_loss=0.02006, over 25000.00 frames. ], tot_loss[loss=0.01889, audio_tagging_loss=0.01889, over 4933740.96 frames. ], batch size: 100, lr: 3.89e-02, grad_scale: 128.0 2023-12-21 12:43:45,152 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.60 vs. 
limit=6.0 2023-12-21 12:43:45,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=40506.666666666664, ans=0.002063768115942029 2023-12-21 12:44:01,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=40573.333333333336, ans=0.1 2023-12-21 12:44:08,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=40640.0, ans=0.05 2023-12-21 12:44:08,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=40640.0, ans=0.2 2023-12-21 12:44:10,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=40640.0, ans=0.0 2023-12-21 12:44:11,692 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.195e+01 2.485e+01 2.836e+01 3.251e+01 4.235e+01, threshold=5.672e+01, percent-clipped=0.0 2023-12-21 12:44:18,030 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0 2023-12-21 12:44:25,324 INFO [train.py:886] (3/4) Epoch 2, batch 1350, loss[loss=0.01649, audio_tagging_loss=0.01649, over 24750.00 frames. ], tot_loss[loss=0.01884, audio_tagging_loss=0.01884, over 4932849.66 frames. ], batch size: 99, lr: 3.88e-02, grad_scale: 128.0 2023-12-21 12:44:29,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=40773.333333333336, ans=0.125 2023-12-21 12:44:32,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=40773.333333333336, ans=0.0 2023-12-21 12:44:45,234 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.18 vs. limit=10.0 2023-12-21 12:44:52,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.79 vs. limit=22.5 2023-12-21 12:45:02,276 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 12:45:13,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=41040.0, ans=0.125 2023-12-21 12:45:17,959 INFO [train.py:886] (3/4) Epoch 2, batch 1400, loss[loss=0.01918, audio_tagging_loss=0.01918, over 25000.00 frames. ], tot_loss[loss=0.01882, audio_tagging_loss=0.01882, over 4933755.69 frames. ], batch size: 100, lr: 3.88e-02, grad_scale: 128.0 2023-12-21 12:45:22,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=41106.666666666664, ans=0.2 2023-12-21 12:45:27,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=41173.333333333336, ans=0.0019188405797101443 2023-12-21 12:45:48,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=41306.666666666664, ans=0.125 2023-12-21 12:45:51,671 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=15.51 vs. 
limit=15.0 2023-12-21 12:45:54,840 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.022e+01 2.439e+01 2.675e+01 2.970e+01 3.748e+01, threshold=5.350e+01, percent-clipped=0.0 2023-12-21 12:45:57,314 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.02 vs. limit=22.5 2023-12-21 12:46:08,313 INFO [train.py:886] (3/4) Epoch 2, batch 1450, loss[loss=0.01769, audio_tagging_loss=0.01769, over 25000.00 frames. ], tot_loss[loss=0.01864, audio_tagging_loss=0.01864, over 4940891.91 frames. ], batch size: 100, lr: 3.87e-02, grad_scale: 128.0 2023-12-21 12:46:09,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=41440.0, ans=0.2 2023-12-21 12:46:15,118 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=43.75 vs. limit=15.0 2023-12-21 12:46:22,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=41506.666666666664, ans=0.2 2023-12-21 12:46:36,004 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.86 vs. limit=6.0 2023-12-21 12:46:37,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=41573.333333333336, ans=0.0 2023-12-21 12:46:38,139 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.98 vs. limit=15.0 2023-12-21 12:46:42,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=41640.0, ans=0.125 2023-12-21 12:46:51,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.87 vs. limit=10.0 2023-12-21 12:47:01,208 INFO [train.py:886] (3/4) Epoch 2, batch 1500, loss[loss=0.01933, audio_tagging_loss=0.01933, over 24750.00 frames. ], tot_loss[loss=0.01867, audio_tagging_loss=0.01867, over 4945453.30 frames. ], batch size: 99, lr: 3.87e-02, grad_scale: 256.0 2023-12-21 12:47:01,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=41773.333333333336, ans=0.0 2023-12-21 12:47:12,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=41840.0, ans=0.125 2023-12-21 12:47:37,916 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 2.550e+01 2.764e+01 3.124e+01 4.346e+01, threshold=5.529e+01, percent-clipped=0.0 2023-12-21 12:47:40,402 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.20 vs. 
limit=15.0 2023-12-21 12:47:41,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=42040.0, ans=0.1 2023-12-21 12:47:52,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=42106.666666666664, ans=0.125 2023-12-21 12:47:52,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=42106.666666666664, ans=0.125 2023-12-21 12:47:52,767 INFO [train.py:886] (3/4) Epoch 2, batch 1550, loss[loss=0.01581, audio_tagging_loss=0.01581, over 24750.00 frames. ], tot_loss[loss=0.01874, audio_tagging_loss=0.01874, over 4944940.14 frames. ], batch size: 99, lr: 3.86e-02, grad_scale: 256.0 2023-12-21 12:47:53,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=42106.666666666664, ans=0.0 2023-12-21 12:47:54,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=42106.666666666664, ans=0.2 2023-12-21 12:47:56,193 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.60 vs. limit=10.0 2023-12-21 12:47:57,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.37 vs. limit=15.0 2023-12-21 12:48:08,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=42173.333333333336, ans=0.0 2023-12-21 12:48:09,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=42173.333333333336, ans=10.0 2023-12-21 12:48:12,237 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.63 vs. limit=15.0 2023-12-21 12:48:13,244 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=20.22 vs. limit=15.0 2023-12-21 12:48:39,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=42373.333333333336, ans=0.0 2023-12-21 12:48:41,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=42373.333333333336, ans=0.0016579710144927536 2023-12-21 12:48:43,466 INFO [train.py:886] (3/4) Epoch 2, batch 1600, loss[loss=0.01854, audio_tagging_loss=0.01854, over 24038.00 frames. ], tot_loss[loss=0.01881, audio_tagging_loss=0.01881, over 4940948.75 frames. ], batch size: 100, lr: 3.85e-02, grad_scale: 256.0 2023-12-21 12:48:49,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=42440.0, ans=0.125 2023-12-21 12:48:59,841 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.85 vs. 
limit=15.0 2023-12-21 12:49:01,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=42506.666666666664, ans=0.125 2023-12-21 12:49:09,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=42573.333333333336, ans=0.0 2023-12-21 12:49:21,555 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.093e+01 2.625e+01 2.827e+01 3.147e+01 4.034e+01, threshold=5.654e+01, percent-clipped=0.0 2023-12-21 12:49:21,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=42640.0, ans=0.125 2023-12-21 12:49:25,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=42640.0, ans=0.001599999999999999 2023-12-21 12:49:36,917 INFO [train.py:886] (3/4) Epoch 2, batch 1650, loss[loss=0.02173, audio_tagging_loss=0.02173, over 22173.00 frames. ], tot_loss[loss=0.01882, audio_tagging_loss=0.01882, over 4937613.77 frames. ], batch size: 107, lr: 3.85e-02, grad_scale: 256.0 2023-12-21 12:49:47,085 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=18.09 vs. limit=15.0 2023-12-21 12:49:47,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=42840.0, ans=0.1 2023-12-21 12:49:55,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=42840.0, ans=0.0015565217391304339 2023-12-21 12:50:03,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=42906.666666666664, ans=0.125 2023-12-21 12:50:14,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=42973.333333333336, ans=0.0015275362318840583 2023-12-21 12:50:29,849 INFO [train.py:886] (3/4) Epoch 2, batch 1700, loss[loss=0.01851, audio_tagging_loss=0.01851, over 24750.00 frames. ], tot_loss[loss=0.01872, audio_tagging_loss=0.01872, over 4945123.66 frames. ], batch size: 99, lr: 3.84e-02, grad_scale: 256.0 2023-12-21 12:50:40,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=43173.333333333336, ans=0.05 2023-12-21 12:50:40,595 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.25 vs. limit=15.0 2023-12-21 12:50:42,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=43173.333333333336, ans=0.0 2023-12-21 12:50:43,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=43173.333333333336, ans=0.1 2023-12-21 12:50:43,568 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.02 vs. 
limit=12.0 2023-12-21 12:50:44,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=43173.333333333336, ans=0.125 2023-12-21 12:50:49,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0 2023-12-21 12:50:54,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=43240.0, ans=0.125 2023-12-21 12:51:07,470 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.509e+01 2.799e+01 3.084e+01 4.189e+01, threshold=5.598e+01, percent-clipped=0.0 2023-12-21 12:51:08,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=43306.666666666664, ans=0.2 2023-12-21 12:51:18,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=43373.333333333336, ans=0.001440579710144928 2023-12-21 12:51:21,601 INFO [train.py:886] (3/4) Epoch 2, batch 1750, loss[loss=0.02017, audio_tagging_loss=0.02017, over 24750.00 frames. ], tot_loss[loss=0.01856, audio_tagging_loss=0.01856, over 4947966.33 frames. ], batch size: 99, lr: 3.83e-02, grad_scale: 256.0 2023-12-21 12:51:27,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0 2023-12-21 12:51:28,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=43440.0, ans=0.125 2023-12-21 12:51:42,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.03 vs. limit=15.0 2023-12-21 12:52:14,317 INFO [train.py:886] (3/4) Epoch 2, batch 1800, loss[loss=0.01707, audio_tagging_loss=0.01707, over 25000.00 frames. ], tot_loss[loss=0.01855, audio_tagging_loss=0.01855, over 4954316.84 frames. ], batch size: 100, lr: 3.83e-02, grad_scale: 256.0 2023-12-21 12:52:23,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=43773.333333333336, ans=0.0013536231884057962 2023-12-21 12:52:24,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=43840.0, ans=0.125 2023-12-21 12:52:40,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.99 vs. limit=22.5 2023-12-21 12:52:51,539 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.458e+01 2.715e+01 2.989e+01 4.266e+01, threshold=5.430e+01, percent-clipped=0.0 2023-12-21 12:52:52,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=43973.333333333336, ans=0.0 2023-12-21 12:52:57,515 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0 2023-12-21 12:53:05,776 INFO [train.py:886] (3/4) Epoch 2, batch 1850, loss[loss=0.0164, audio_tagging_loss=0.0164, over 24750.00 frames. ], tot_loss[loss=0.01849, audio_tagging_loss=0.01849, over 4959163.52 frames. 
], batch size: 99, lr: 3.82e-02, grad_scale: 256.0 2023-12-21 12:53:14,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=44106.666666666664, ans=0.025 2023-12-21 12:53:16,113 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.55 vs. limit=15.0 2023-12-21 12:53:28,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=16.70 vs. limit=15.0 2023-12-21 12:53:29,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=44240.0, ans=0.035 2023-12-21 12:53:35,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=44240.0, ans=0.125 2023-12-21 12:53:49,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=44373.333333333336, ans=0.5 2023-12-21 12:53:51,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=44373.333333333336, ans=0.125 2023-12-21 12:53:59,537 INFO [train.py:886] (3/4) Epoch 2, batch 1900, loss[loss=0.01899, audio_tagging_loss=0.01899, over 21866.00 frames. ], tot_loss[loss=0.01867, audio_tagging_loss=0.01867, over 4947976.85 frames. ], batch size: 107, lr: 3.81e-02, grad_scale: 256.0 2023-12-21 12:54:01,051 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0 2023-12-21 12:54:16,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=44506.666666666664, ans=0.125 2023-12-21 12:54:16,884 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.12 vs. limit=22.5 2023-12-21 12:54:17,157 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.78 vs. limit=10.0 2023-12-21 12:54:20,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=44573.333333333336, ans=0.1 2023-12-21 12:54:28,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=44573.333333333336, ans=0.5 2023-12-21 12:54:29,959 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.09 vs. limit=12.0 2023-12-21 12:54:36,232 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.600e+01 2.818e+01 3.089e+01 5.483e+01, threshold=5.636e+01, percent-clipped=1.0 2023-12-21 12:54:38,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=44640.0, ans=0.2 2023-12-21 12:54:48,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.19 vs. 
limit=15.0 2023-12-21 12:54:51,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=44773.333333333336, ans=0.2 2023-12-21 12:54:52,170 INFO [train.py:886] (3/4) Epoch 2, batch 1950, loss[loss=0.01766, audio_tagging_loss=0.01766, over 24750.00 frames. ], tot_loss[loss=0.01856, audio_tagging_loss=0.01856, over 4937762.55 frames. ], batch size: 99, lr: 3.81e-02, grad_scale: 256.0 2023-12-21 12:55:01,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.51 vs. limit=22.5 2023-12-21 12:55:07,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=44840.0, ans=0.1 2023-12-21 12:55:08,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=44840.0, ans=0.125 2023-12-21 12:55:19,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=44906.666666666664, ans=0.125 2023-12-21 12:55:20,045 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=12.0 2023-12-21 12:55:44,287 INFO [train.py:886] (3/4) Epoch 2, batch 2000, loss[loss=0.02038, audio_tagging_loss=0.02038, over 24750.00 frames. ], tot_loss[loss=0.01844, audio_tagging_loss=0.01844, over 4936434.15 frames. ], batch size: 99, lr: 3.80e-02, grad_scale: 256.0 2023-12-21 12:56:20,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=45306.666666666664, ans=0.1 2023-12-21 12:56:23,060 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.490e+01 2.748e+01 3.106e+01 5.965e+01, threshold=5.495e+01, percent-clipped=1.0 2023-12-21 12:56:25,517 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.97 vs. limit=22.5 2023-12-21 12:56:38,160 INFO [train.py:886] (3/4) Epoch 2, batch 2050, loss[loss=0.01828, audio_tagging_loss=0.01828, over 24750.00 frames. ], tot_loss[loss=0.01835, audio_tagging_loss=0.01835, over 4941778.52 frames. 
], batch size: 99, lr: 3.80e-02, grad_scale: 256.0 2023-12-21 12:56:46,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=45440.0, ans=15.0 2023-12-21 12:57:00,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=45573.333333333336, ans=0.125 2023-12-21 12:57:04,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=45573.333333333336, ans=0.125 2023-12-21 12:57:10,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=45640.0, ans=0.125 2023-12-21 12:57:15,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=45640.0, ans=0.125 2023-12-21 12:57:18,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=45640.0, ans=0.0 2023-12-21 12:57:19,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=45706.666666666664, ans=0.1 2023-12-21 12:57:21,006 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.60 vs. limit=15.0 2023-12-21 12:57:31,687 INFO [train.py:886] (3/4) Epoch 2, batch 2100, loss[loss=0.01699, audio_tagging_loss=0.01699, over 25000.00 frames. ], tot_loss[loss=0.01834, audio_tagging_loss=0.01834, over 4947430.15 frames. ], batch size: 100, lr: 3.79e-02, grad_scale: 256.0 2023-12-21 12:57:57,072 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.88 vs. limit=15.0 2023-12-21 12:58:00,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.72 vs. limit=15.0 2023-12-21 12:58:10,346 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.523e+01 2.813e+01 3.062e+01 4.027e+01, threshold=5.625e+01, percent-clipped=0.0 2023-12-21 12:58:13,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=21.23 vs. limit=22.5 2023-12-21 12:58:23,689 INFO [train.py:886] (3/4) Epoch 2, batch 2150, loss[loss=0.017, audio_tagging_loss=0.017, over 24750.00 frames. ], tot_loss[loss=0.01835, audio_tagging_loss=0.01835, over 4955787.46 frames. ], batch size: 99, lr: 3.78e-02, grad_scale: 256.0 2023-12-21 12:58:53,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=46240.0, ans=0.125 2023-12-21 12:59:02,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=46306.666666666664, ans=0.0 2023-12-21 12:59:16,521 INFO [train.py:886] (3/4) Epoch 2, batch 2200, loss[loss=0.01966, audio_tagging_loss=0.01966, over 24750.00 frames. ], tot_loss[loss=0.01849, audio_tagging_loss=0.01849, over 4955832.26 frames. 
], batch size: 99, lr: 3.78e-02, grad_scale: 256.0 2023-12-21 12:59:26,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=46506.666666666664, ans=0.0007594202898550725 2023-12-21 12:59:31,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=46506.666666666664, ans=0.125 2023-12-21 12:59:32,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=46506.666666666664, ans=0.125 2023-12-21 12:59:42,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=46573.333333333336, ans=0.0 2023-12-21 12:59:54,647 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.553e+01 2.739e+01 3.029e+01 4.205e+01, threshold=5.478e+01, percent-clipped=0.0 2023-12-21 13:00:05,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=46706.666666666664, ans=0.2 2023-12-21 13:00:06,466 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.92 vs. limit=15.0 2023-12-21 13:00:09,460 INFO [train.py:886] (3/4) Epoch 2, batch 2250, loss[loss=0.01713, audio_tagging_loss=0.01713, over 25000.00 frames. ], tot_loss[loss=0.01851, audio_tagging_loss=0.01851, over 4950589.04 frames. ], batch size: 100, lr: 3.77e-02, grad_scale: 256.0 2023-12-21 13:00:15,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=46773.333333333336, ans=0.125 2023-12-21 13:00:23,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=46840.0, ans=0.0 2023-12-21 13:00:32,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=46906.666666666664, ans=0.125 2023-12-21 13:00:34,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=46906.666666666664, ans=0.125 2023-12-21 13:00:50,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=46973.333333333336, ans=0.125 2023-12-21 13:00:50,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.03 vs. limit=22.5 2023-12-21 13:00:56,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=47040.0, ans=0.0 2023-12-21 13:00:57,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=47040.0, ans=0.125 2023-12-21 13:01:01,794 INFO [train.py:886] (3/4) Epoch 2, batch 2300, loss[loss=0.01705, audio_tagging_loss=0.01705, over 25000.00 frames. ], tot_loss[loss=0.01847, audio_tagging_loss=0.01847, over 4944799.12 frames. 
], batch size: 100, lr: 3.76e-02, grad_scale: 256.0 2023-12-21 13:01:03,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=47106.666666666664, ans=0.0006289855072463772 2023-12-21 13:01:09,982 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.77 vs. limit=12.0 2023-12-21 13:01:17,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=47173.333333333336, ans=0.125 2023-12-21 13:01:34,681 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.77 vs. limit=22.5 2023-12-21 13:01:38,828 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 2.499e+01 2.770e+01 3.074e+01 4.050e+01, threshold=5.539e+01, percent-clipped=0.0 2023-12-21 13:01:46,877 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.42 vs. limit=22.5 2023-12-21 13:01:54,318 INFO [train.py:886] (3/4) Epoch 2, batch 2350, loss[loss=0.01953, audio_tagging_loss=0.01953, over 24750.00 frames. ], tot_loss[loss=0.01841, audio_tagging_loss=0.01841, over 4950738.53 frames. ], batch size: 99, lr: 3.76e-02, grad_scale: 256.0 2023-12-21 13:01:57,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=47440.0, ans=0.125 2023-12-21 13:01:59,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=47440.0, ans=0.09899494936611666 2023-12-21 13:02:04,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=47506.666666666664, ans=0.1 2023-12-21 13:02:13,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=47573.333333333336, ans=0.125 2023-12-21 13:02:45,214 INFO [train.py:886] (3/4) Epoch 2, batch 2400, loss[loss=0.01909, audio_tagging_loss=0.01909, over 25000.00 frames. ], tot_loss[loss=0.01822, audio_tagging_loss=0.01822, over 4951470.65 frames. ], batch size: 100, lr: 3.75e-02, grad_scale: 256.0 2023-12-21 13:02:46,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.44 vs. limit=15.0 2023-12-21 13:02:47,559 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.09 vs. limit=10.0 2023-12-21 13:02:56,811 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=24.06 vs. limit=22.5 2023-12-21 13:03:00,275 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.02 vs. limit=15.0 2023-12-21 13:03:06,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=47906.666666666664, ans=0.2 2023-12-21 13:03:13,936 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.73 vs. 
limit=6.0 2023-12-21 13:03:16,727 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.26 vs. limit=15.0 2023-12-21 13:03:18,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=47973.333333333336, ans=0.125 2023-12-21 13:03:18,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=47973.333333333336, ans=0.0 2023-12-21 13:03:21,847 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.86 vs. limit=15.0 2023-12-21 13:03:22,328 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.458e+01 2.728e+01 3.033e+01 4.100e+01, threshold=5.456e+01, percent-clipped=0.0 2023-12-21 13:03:24,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=47973.333333333336, ans=0.125 2023-12-21 13:03:29,026 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.79 vs. limit=15.0 2023-12-21 13:03:30,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=48040.0, ans=0.1 2023-12-21 13:03:37,945 INFO [train.py:886] (3/4) Epoch 2, batch 2450, loss[loss=0.01511, audio_tagging_loss=0.01511, over 25000.00 frames. ], tot_loss[loss=0.01827, audio_tagging_loss=0.01827, over 4951529.68 frames. ], batch size: 100, lr: 3.75e-02, grad_scale: 256.0 2023-12-21 13:03:38,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=48106.666666666664, ans=0.125 2023-12-21 13:03:42,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=48106.666666666664, ans=0.125 2023-12-21 13:03:46,314 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.07 vs. limit=22.5 2023-12-21 13:03:52,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=48173.333333333336, ans=0.125 2023-12-21 13:04:16,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=48306.666666666664, ans=0.1 2023-12-21 13:04:20,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=48373.333333333336, ans=0.1 2023-12-21 13:04:26,559 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.04 vs. limit=10.0 2023-12-21 13:04:30,283 INFO [train.py:886] (3/4) Epoch 2, batch 2500, loss[loss=0.01814, audio_tagging_loss=0.01814, over 24750.00 frames. ], tot_loss[loss=0.01846, audio_tagging_loss=0.01846, over 4942570.50 frames. 
], batch size: 99, lr: 3.74e-02, grad_scale: 256.0 2023-12-21 13:04:45,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=48506.666666666664, ans=0.95 2023-12-21 13:04:45,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=48506.666666666664, ans=0.1 2023-12-21 13:05:08,423 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.182e+01 2.552e+01 2.788e+01 3.039e+01 3.953e+01, threshold=5.575e+01, percent-clipped=0.0 2023-12-21 13:05:17,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=48706.666666666664, ans=0.09899494936611666 2023-12-21 13:05:21,978 INFO [train.py:886] (3/4) Epoch 2, batch 2550, loss[loss=0.02318, audio_tagging_loss=0.02318, over 24941.00 frames. ], tot_loss[loss=0.01851, audio_tagging_loss=0.01851, over 4938637.70 frames. ], batch size: 100, lr: 3.73e-02, grad_scale: 256.0 2023-12-21 13:05:23,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=48773.333333333336, ans=0.125 2023-12-21 13:05:27,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.87 vs. limit=15.0 2023-12-21 13:05:27,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.47 vs. limit=22.5 2023-12-21 13:05:32,035 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.40 vs. limit=22.5 2023-12-21 13:05:34,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=48840.0, ans=0.125 2023-12-21 13:05:42,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.93 vs. limit=12.0 2023-12-21 13:05:47,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=48906.666666666664, ans=0.00023768115942028947 2023-12-21 13:06:03,026 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.77 vs. limit=22.5 2023-12-21 13:06:16,097 INFO [train.py:886] (3/4) Epoch 2, batch 2600, loss[loss=0.01746, audio_tagging_loss=0.01746, over 24750.00 frames. ], tot_loss[loss=0.01843, audio_tagging_loss=0.01843, over 4940791.46 frames. 
], batch size: 99, lr: 3.73e-02, grad_scale: 256.0 2023-12-21 13:06:31,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=49173.333333333336, ans=0.2 2023-12-21 13:06:44,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=49240.0, ans=0.0 2023-12-21 13:06:51,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=49306.666666666664, ans=0.00015072463768115926 2023-12-21 13:06:53,173 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.505e+01 2.770e+01 3.074e+01 4.443e+01, threshold=5.539e+01, percent-clipped=0.0 2023-12-21 13:06:55,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=49306.666666666664, ans=0.125 2023-12-21 13:07:07,307 INFO [train.py:886] (3/4) Epoch 2, batch 2650, loss[loss=0.01629, audio_tagging_loss=0.01629, over 25000.00 frames. ], tot_loss[loss=0.01838, audio_tagging_loss=0.01838, over 4943381.47 frames. ], batch size: 100, lr: 3.72e-02, grad_scale: 256.0 2023-12-21 13:07:07,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=49440.0, ans=0.1 2023-12-21 13:07:15,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=49440.0, ans=0.00012173913043478195 2023-12-21 13:07:26,416 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 13:07:26,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.65 vs. limit=15.0 2023-12-21 13:07:42,840 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.49 vs. limit=15.0 2023-12-21 13:07:44,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=49640.0, ans=0.125 2023-12-21 13:07:48,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=49640.0, ans=0.0 2023-12-21 13:08:00,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=49773.333333333336, ans=0.125 2023-12-21 13:08:00,852 INFO [train.py:886] (3/4) Epoch 2, batch 2700, loss[loss=0.016, audio_tagging_loss=0.016, over 25000.00 frames. ], tot_loss[loss=0.01837, audio_tagging_loss=0.01837, over 4946579.17 frames. ], batch size: 100, lr: 3.71e-02, grad_scale: 256.0 2023-12-21 13:08:07,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=49773.333333333336, ans=0.125 2023-12-21 13:08:20,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.41 vs. 
limit=15.0 2023-12-21 13:08:20,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=49840.0, ans=15.0 2023-12-21 13:08:38,470 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.555e+01 2.825e+01 3.141e+01 4.056e+01, threshold=5.650e+01, percent-clipped=0.0 2023-12-21 13:08:53,327 INFO [train.py:886] (3/4) Epoch 2, batch 2750, loss[loss=0.01613, audio_tagging_loss=0.01613, over 24750.00 frames. ], tot_loss[loss=0.01828, audio_tagging_loss=0.01828, over 4940870.77 frames. ], batch size: 99, lr: 3.71e-02, grad_scale: 256.0 2023-12-21 13:09:06,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=50173.333333333336, ans=0.1 2023-12-21 13:09:08,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=50173.333333333336, ans=0.125 2023-12-21 13:09:14,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=50240.0, ans=0.1 2023-12-21 13:09:25,408 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.71 vs. limit=15.0 2023-12-21 13:09:38,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=50373.333333333336, ans=0.1 2023-12-21 13:09:45,136 INFO [train.py:886] (3/4) Epoch 2, batch 2800, loss[loss=0.02424, audio_tagging_loss=0.02424, over 24750.00 frames. ], tot_loss[loss=0.01833, audio_tagging_loss=0.01833, over 4941036.74 frames. ], batch size: 99, lr: 3.70e-02, grad_scale: 256.0 2023-12-21 13:09:49,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=50440.0, ans=0.09899494936611666 2023-12-21 13:09:50,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=50440.0, ans=0.125 2023-12-21 13:09:57,380 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=12.0 2023-12-21 13:10:04,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=50506.666666666664, ans=0.0 2023-12-21 13:10:13,291 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.70 vs. limit=22.5 2023-12-21 13:10:23,042 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.547e+01 2.749e+01 3.005e+01 4.633e+01, threshold=5.497e+01, percent-clipped=0.0 2023-12-21 13:10:26,047 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=27.42 vs. 
limit=22.5 2023-12-21 13:10:28,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=50706.666666666664, ans=0.0 2023-12-21 13:10:32,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=50706.666666666664, ans=0.125 2023-12-21 13:10:34,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=50706.666666666664, ans=0.0 2023-12-21 13:10:35,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=50706.666666666664, ans=0.125 2023-12-21 13:10:38,448 INFO [train.py:886] (3/4) Epoch 2, batch 2850, loss[loss=0.01668, audio_tagging_loss=0.01668, over 24750.00 frames. ], tot_loss[loss=0.01838, audio_tagging_loss=0.01838, over 4940543.07 frames. ], batch size: 99, lr: 3.70e-02, grad_scale: 256.0 2023-12-21 13:10:40,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=50773.333333333336, ans=0.125 2023-12-21 13:10:44,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.24 vs. limit=15.0 2023-12-21 13:10:46,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=50773.333333333336, ans=0.1 2023-12-21 13:10:55,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.90 vs. limit=15.0 2023-12-21 13:10:57,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=50840.0, ans=0.125 2023-12-21 13:11:08,811 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.54 vs. limit=22.5 2023-12-21 13:11:30,668 INFO [train.py:886] (3/4) Epoch 2, batch 2900, loss[loss=0.02031, audio_tagging_loss=0.02031, over 25000.00 frames. ], tot_loss[loss=0.0183, audio_tagging_loss=0.0183, over 4938691.87 frames. ], batch size: 100, lr: 3.69e-02, grad_scale: 256.0 2023-12-21 13:11:35,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=51106.666666666664, ans=0.2 2023-12-21 13:12:08,843 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.551e+01 2.842e+01 3.147e+01 4.281e+01, threshold=5.684e+01, percent-clipped=0.0 2023-12-21 13:12:22,986 INFO [train.py:886] (3/4) Epoch 2, batch 2950, loss[loss=0.01946, audio_tagging_loss=0.01946, over 24750.00 frames. ], tot_loss[loss=0.01819, audio_tagging_loss=0.01819, over 4945645.68 frames. ], batch size: 99, lr: 3.68e-02, grad_scale: 256.0 2023-12-21 13:12:34,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=51506.666666666664, ans=0.125 2023-12-21 13:12:40,817 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=18.23 vs. 
limit=15.0 2023-12-21 13:12:48,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=51573.333333333336, ans=0.125 2023-12-21 13:12:49,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.18 vs. limit=5.0 2023-12-21 13:12:53,088 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.44 vs. limit=15.0 2023-12-21 13:12:55,144 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.64 vs. limit=15.0 2023-12-21 13:13:15,801 INFO [train.py:886] (3/4) Epoch 2, batch 3000, loss[loss=0.01802, audio_tagging_loss=0.01802, over 25000.00 frames. ], tot_loss[loss=0.01809, audio_tagging_loss=0.01809, over 4947078.42 frames. ], batch size: 100, lr: 3.68e-02, grad_scale: 256.0 2023-12-21 13:13:15,801 INFO [train.py:909] (3/4) Computing validation loss 2023-12-21 13:13:38,878 INFO [train.py:917] (3/4) Epoch 2, validation: loss=0.04373, audio_tagging_loss=0.04373, over 3737520.00 frames. 2023-12-21 13:13:38,879 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-21 13:13:49,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=51840.0, ans=0.2 2023-12-21 13:14:01,005 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.85 vs. limit=15.0 2023-12-21 13:14:05,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.36 vs. limit=22.5 2023-12-21 13:14:16,990 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.105e+01 2.493e+01 2.750e+01 3.115e+01 4.237e+01, threshold=5.501e+01, percent-clipped=0.0 2023-12-21 13:14:22,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=52040.0, ans=0.0 2023-12-21 13:14:29,027 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.57 vs. limit=22.5 2023-12-21 13:14:31,123 INFO [train.py:886] (3/4) Epoch 2, batch 3050, loss[loss=0.01592, audio_tagging_loss=0.01592, over 24750.00 frames. ], tot_loss[loss=0.01807, audio_tagging_loss=0.01807, over 4949843.58 frames. ], batch size: 99, lr: 3.67e-02, grad_scale: 256.0 2023-12-21 13:14:32,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=52106.666666666664, ans=0.125 2023-12-21 13:14:32,558 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.10 vs. 
limit=15.0 2023-12-21 13:14:51,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=52240.0, ans=0.2 2023-12-21 13:14:53,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=52240.0, ans=0.0 2023-12-21 13:15:09,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=52306.666666666664, ans=0.0 2023-12-21 13:15:24,022 INFO [train.py:886] (3/4) Epoch 2, batch 3100, loss[loss=0.0178, audio_tagging_loss=0.0178, over 24750.00 frames. ], tot_loss[loss=0.01816, audio_tagging_loss=0.01816, over 4957890.32 frames. ], batch size: 99, lr: 3.67e-02, grad_scale: 256.0 2023-12-21 13:15:29,488 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=15.0 2023-12-21 13:15:31,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=52440.0, ans=0.125 2023-12-21 13:15:53,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=52573.333333333336, ans=0.0 2023-12-21 13:16:01,673 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.021e+01 2.616e+01 2.830e+01 3.122e+01 4.076e+01, threshold=5.659e+01, percent-clipped=0.0 2023-12-21 13:16:15,772 INFO [train.py:886] (3/4) Epoch 2, batch 3150, loss[loss=0.0171, audio_tagging_loss=0.0171, over 24750.00 frames. ], tot_loss[loss=0.01825, audio_tagging_loss=0.01825, over 4949018.72 frames. ], batch size: 99, lr: 3.66e-02, grad_scale: 256.0 2023-12-21 13:16:33,804 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.26 vs. limit=22.5 2023-12-21 13:16:54,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=21.78 vs. limit=15.0 2023-12-21 13:16:59,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=53040.0, ans=0.95 2023-12-21 13:17:01,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=53040.0, ans=0.125 2023-12-21 13:17:08,690 INFO [train.py:886] (3/4) Epoch 2, batch 3200, loss[loss=0.01907, audio_tagging_loss=0.01907, over 25000.00 frames. ], tot_loss[loss=0.01838, audio_tagging_loss=0.01838, over 4944082.16 frames. ], batch size: 100, lr: 3.65e-02, grad_scale: 256.0 2023-12-21 13:17:16,737 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.37 vs. limit=22.5 2023-12-21 13:17:31,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=53240.0, ans=0.95 2023-12-21 13:17:33,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.47 vs. 
limit=15.0 2023-12-21 13:17:40,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=53306.666666666664, ans=0.2 2023-12-21 13:17:43,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=53306.666666666664, ans=0.1 2023-12-21 13:17:48,108 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.552e+01 2.741e+01 3.134e+01 4.308e+01, threshold=5.481e+01, percent-clipped=0.0 2023-12-21 13:17:56,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=53373.333333333336, ans=10.0 2023-12-21 13:17:57,711 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.93 vs. limit=22.5 2023-12-21 13:17:58,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=53373.333333333336, ans=0.1 2023-12-21 13:18:00,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=53373.333333333336, ans=0.125 2023-12-21 13:18:04,663 INFO [train.py:886] (3/4) Epoch 2, batch 3250, loss[loss=0.01675, audio_tagging_loss=0.01675, over 25000.00 frames. ], tot_loss[loss=0.01823, audio_tagging_loss=0.01823, over 4940946.00 frames. ], batch size: 100, lr: 3.65e-02, grad_scale: 256.0 2023-12-21 13:18:43,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=53640.0, ans=0.125 2023-12-21 13:18:55,719 INFO [train.py:886] (3/4) Epoch 2, batch 3300, loss[loss=0.02224, audio_tagging_loss=0.02224, over 24921.00 frames. ], tot_loss[loss=0.01816, audio_tagging_loss=0.01816, over 4943353.74 frames. ], batch size: 100, lr: 3.64e-02, grad_scale: 256.0 2023-12-21 13:19:01,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=53773.333333333336, ans=0.125 2023-12-21 13:19:01,473 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.90 vs. limit=6.0 2023-12-21 13:19:02,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=53773.333333333336, ans=0.125 2023-12-21 13:19:03,337 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.04 vs. limit=22.5 2023-12-21 13:19:14,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=53840.0, ans=0.125 2023-12-21 13:19:14,847 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.92 vs. limit=22.5 2023-12-21 13:19:23,504 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.74 vs. 
limit=15.0 2023-12-21 13:19:24,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=53906.666666666664, ans=0.125 2023-12-21 13:19:27,668 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.35 vs. limit=15.0 2023-12-21 13:19:30,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=53973.333333333336, ans=0.1 2023-12-21 13:19:31,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=53973.333333333336, ans=0.1 2023-12-21 13:19:32,236 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0 2023-12-21 13:19:35,085 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.487e+01 2.709e+01 2.952e+01 3.963e+01, threshold=5.419e+01, percent-clipped=0.0 2023-12-21 13:19:44,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=54040.0, ans=0.0 2023-12-21 13:19:49,862 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0 2023-12-21 13:19:50,316 INFO [train.py:886] (3/4) Epoch 2, batch 3350, loss[loss=0.01898, audio_tagging_loss=0.01898, over 25000.00 frames. ], tot_loss[loss=0.0181, audio_tagging_loss=0.0181, over 4948002.09 frames. ], batch size: 100, lr: 3.64e-02, grad_scale: 256.0 2023-12-21 13:19:56,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=54106.666666666664, ans=0.125 2023-12-21 13:20:18,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=54240.0, ans=0.125 2023-12-21 13:20:33,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=54373.333333333336, ans=0.1 2023-12-21 13:20:37,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=54373.333333333336, ans=0.125 2023-12-21 13:20:41,765 INFO [train.py:886] (3/4) Epoch 2, batch 3400, loss[loss=0.02179, audio_tagging_loss=0.02179, over 24750.00 frames. ], tot_loss[loss=0.01819, audio_tagging_loss=0.01819, over 4940220.79 frames. ], batch size: 99, lr: 3.63e-02, grad_scale: 256.0 2023-12-21 13:20:56,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=54506.666666666664, ans=0.125 2023-12-21 13:21:12,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=54640.0, ans=0.125 2023-12-21 13:21:20,900 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.559e+01 2.794e+01 3.054e+01 3.708e+01, threshold=5.589e+01, percent-clipped=0.0 2023-12-21 13:21:22,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=54640.0, ans=0.0 2023-12-21 13:21:33,030 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.43 vs. 
limit=15.0 2023-12-21 13:21:33,045 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.03 vs. limit=15.0 2023-12-21 13:21:33,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=54773.333333333336, ans=0.0 2023-12-21 13:21:34,418 INFO [train.py:886] (3/4) Epoch 2, batch 3450, loss[loss=0.02203, audio_tagging_loss=0.02203, over 24750.00 frames. ], tot_loss[loss=0.01839, audio_tagging_loss=0.01839, over 4942206.30 frames. ], batch size: 99, lr: 3.62e-02, grad_scale: 256.0 2023-12-21 13:21:40,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=54773.333333333336, ans=0.0 2023-12-21 13:21:40,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=54773.333333333336, ans=10.0 2023-12-21 13:21:52,411 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.12 vs. limit=10.0 2023-12-21 13:21:56,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=54906.666666666664, ans=0.125 2023-12-21 13:22:05,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=54973.333333333336, ans=10.0 2023-12-21 13:22:13,507 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.94 vs. limit=6.0 2023-12-21 13:22:28,231 INFO [train.py:886] (3/4) Epoch 2, batch 3500, loss[loss=0.01705, audio_tagging_loss=0.01705, over 24750.00 frames. ], tot_loss[loss=0.01847, audio_tagging_loss=0.01847, over 4937353.96 frames. ], batch size: 99, lr: 3.62e-02, grad_scale: 512.0 2023-12-21 13:22:30,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=55106.666666666664, ans=0.1 2023-12-21 13:22:52,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=55240.0, ans=0.0 2023-12-21 13:23:05,437 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 2.571e+01 2.817e+01 3.195e+01 5.368e+01, threshold=5.635e+01, percent-clipped=0.0 2023-12-21 13:23:10,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=55373.333333333336, ans=0.0 2023-12-21 13:23:10,892 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.99 vs. limit=22.5 2023-12-21 13:23:19,479 INFO [train.py:886] (3/4) Epoch 2, batch 3550, loss[loss=0.01803, audio_tagging_loss=0.01803, over 24750.00 frames. ], tot_loss[loss=0.0184, audio_tagging_loss=0.0184, over 4940230.99 frames. ], batch size: 99, lr: 3.61e-02, grad_scale: 512.0 2023-12-21 13:23:20,040 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.97 vs. limit=15.0 2023-12-21 13:23:38,546 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=17.26 vs. 
limit=15.0 2023-12-21 13:23:46,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.12 vs. limit=12.0 2023-12-21 13:23:46,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=55573.333333333336, ans=0.125 2023-12-21 13:24:02,656 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.75 vs. limit=15.0 2023-12-21 13:24:11,732 INFO [train.py:886] (3/4) Epoch 2, batch 3600, loss[loss=0.01736, audio_tagging_loss=0.01736, over 25000.00 frames. ], tot_loss[loss=0.01814, audio_tagging_loss=0.01814, over 4950185.71 frames. ], batch size: 100, lr: 3.61e-02, grad_scale: 512.0 2023-12-21 13:24:14,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=55773.333333333336, ans=0.0 2023-12-21 13:24:20,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=55840.0, ans=0.1 2023-12-21 13:24:21,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=55840.0, ans=0.1 2023-12-21 13:24:40,929 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.93 vs. limit=15.0 2023-12-21 13:24:43,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=55973.333333333336, ans=0.0 2023-12-21 13:24:50,577 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.558e+01 2.810e+01 3.070e+01 4.011e+01, threshold=5.620e+01, percent-clipped=0.0 2023-12-21 13:24:50,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=55973.333333333336, ans=0.125 2023-12-21 13:24:52,765 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.753e+01 2023-12-21 13:24:58,169 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=8.633e+01 2023-12-21 13:25:02,215 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.68 vs. limit=15.0 2023-12-21 13:25:04,354 INFO [train.py:886] (3/4) Epoch 2, batch 3650, loss[loss=0.01943, audio_tagging_loss=0.01943, over 25000.00 frames. ], tot_loss[loss=0.01799, audio_tagging_loss=0.01799, over 4949545.93 frames. ], batch size: 100, lr: 3.60e-02, grad_scale: 256.0 2023-12-21 13:25:36,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=56306.666666666664, ans=0.1 2023-12-21 13:25:56,787 INFO [train.py:886] (3/4) Epoch 2, batch 3700, loss[loss=0.01876, audio_tagging_loss=0.01876, over 25000.00 frames. ], tot_loss[loss=0.01797, audio_tagging_loss=0.01797, over 4952648.40 frames. 
], batch size: 100, lr: 3.59e-02, grad_scale: 256.0 2023-12-21 13:26:13,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=56506.666666666664, ans=0.1 2023-12-21 13:26:30,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=56640.0, ans=0.125 2023-12-21 13:26:32,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.54 vs. limit=10.0 2023-12-21 13:26:35,078 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.535e+01 2.837e+01 3.085e+01 3.878e+01, threshold=5.675e+01, percent-clipped=0.0 2023-12-21 13:26:46,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=56706.666666666664, ans=0.125 2023-12-21 13:26:50,306 INFO [train.py:886] (3/4) Epoch 2, batch 3750, loss[loss=0.01703, audio_tagging_loss=0.01703, over 24750.00 frames. ], tot_loss[loss=0.0181, audio_tagging_loss=0.0181, over 4951638.76 frames. ], batch size: 99, lr: 3.59e-02, grad_scale: 256.0 2023-12-21 13:27:17,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=56906.666666666664, ans=0.1 2023-12-21 13:27:26,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=56973.333333333336, ans=0.125 2023-12-21 13:27:39,881 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.72 vs. limit=22.5 2023-12-21 13:27:41,333 INFO [train.py:886] (3/4) Epoch 2, batch 3800, loss[loss=0.01755, audio_tagging_loss=0.01755, over 24750.00 frames. ], tot_loss[loss=0.01815, audio_tagging_loss=0.01815, over 4951178.81 frames. ], batch size: 99, lr: 3.58e-02, grad_scale: 256.0 2023-12-21 13:27:52,632 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.99 vs. limit=6.0 2023-12-21 13:27:56,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=57173.333333333336, ans=0.125 2023-12-21 13:28:00,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=57173.333333333336, ans=0.125 2023-12-21 13:28:02,332 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.82 vs. limit=6.0 2023-12-21 13:28:03,840 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.933e+01 2023-12-21 13:28:13,472 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.56 vs. 
limit=15.0 2023-12-21 13:28:20,921 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.558e+01 2.812e+01 3.070e+01 5.505e+01, threshold=5.625e+01, percent-clipped=0.0 2023-12-21 13:28:21,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=57306.666666666664, ans=0.0 2023-12-21 13:28:21,458 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.09 vs. limit=6.0 2023-12-21 13:28:32,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.80 vs. limit=10.0 2023-12-21 13:28:32,757 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.42 vs. limit=22.5 2023-12-21 13:28:34,234 INFO [train.py:886] (3/4) Epoch 2, batch 3850, loss[loss=0.0175, audio_tagging_loss=0.0175, over 24750.00 frames. ], tot_loss[loss=0.01812, audio_tagging_loss=0.01812, over 4948982.60 frames. ], batch size: 99, lr: 3.58e-02, grad_scale: 256.0 2023-12-21 13:29:08,415 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.20 vs. limit=22.5 2023-12-21 13:29:10,255 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0 2023-12-21 13:29:11,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=57640.0, ans=0.07 2023-12-21 13:29:27,432 INFO [train.py:886] (3/4) Epoch 2, batch 3900, loss[loss=0.02005, audio_tagging_loss=0.02005, over 24750.00 frames. ], tot_loss[loss=0.01805, audio_tagging_loss=0.01805, over 4949170.06 frames. ], batch size: 99, lr: 3.57e-02, grad_scale: 256.0 2023-12-21 13:29:34,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=57773.333333333336, ans=0.0 2023-12-21 13:29:37,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=57840.0, ans=0.125 2023-12-21 13:29:46,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.81 vs. limit=22.5 2023-12-21 13:30:01,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=57973.333333333336, ans=0.125 2023-12-21 13:30:04,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=57973.333333333336, ans=0.125 2023-12-21 13:30:05,871 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.478e+01 2.658e+01 2.976e+01 3.993e+01, threshold=5.317e+01, percent-clipped=0.0 2023-12-21 13:30:10,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.22 vs. limit=15.0 2023-12-21 13:30:19,061 INFO [train.py:886] (3/4) Epoch 2, batch 3950, loss[loss=0.01561, audio_tagging_loss=0.01561, over 25000.00 frames. ], tot_loss[loss=0.01809, audio_tagging_loss=0.01809, over 4955547.61 frames. 
], batch size: 100, lr: 3.56e-02, grad_scale: 256.0 2023-12-21 13:30:30,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.86 vs. limit=15.0 2023-12-21 13:30:35,216 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.03 vs. limit=12.0 2023-12-21 13:30:38,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.76 vs. limit=22.5 2023-12-21 13:30:53,189 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=15.0 2023-12-21 13:30:57,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=58306.666666666664, ans=0.1 2023-12-21 13:31:03,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=58373.333333333336, ans=0.125 2023-12-21 13:31:12,217 INFO [train.py:886] (3/4) Epoch 2, batch 4000, loss[loss=0.01629, audio_tagging_loss=0.01629, over 25000.00 frames. ], tot_loss[loss=0.01813, audio_tagging_loss=0.01813, over 4963752.75 frames. ], batch size: 100, lr: 3.56e-02, grad_scale: 256.0 2023-12-21 13:31:19,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=58440.0, ans=0.125 2023-12-21 13:31:30,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=58506.666666666664, ans=0.125 2023-12-21 13:31:50,754 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.173e+01 2.643e+01 2.871e+01 3.262e+01 4.395e+01, threshold=5.743e+01, percent-clipped=0.0 2023-12-21 13:31:57,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=58706.666666666664, ans=0.125 2023-12-21 13:32:04,040 INFO [train.py:886] (3/4) Epoch 2, batch 4050, loss[loss=0.01981, audio_tagging_loss=0.01981, over 24750.00 frames. ], tot_loss[loss=0.01829, audio_tagging_loss=0.01829, over 4957833.16 frames. ], batch size: 99, lr: 3.55e-02, grad_scale: 256.0 2023-12-21 13:32:06,410 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.09 vs. limit=10.0 2023-12-21 13:32:20,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=58840.0, ans=0.1 2023-12-21 13:32:22,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=58840.0, ans=0.125 2023-12-21 13:32:27,559 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=15.0 2023-12-21 13:32:33,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=58906.666666666664, ans=0.125 2023-12-21 13:32:38,722 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.90 vs. 
limit=15.0 2023-12-21 13:32:48,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=59040.0, ans=0.0 2023-12-21 13:32:56,541 INFO [train.py:886] (3/4) Epoch 2, batch 4100, loss[loss=0.01815, audio_tagging_loss=0.01815, over 24750.00 frames. ], tot_loss[loss=0.01836, audio_tagging_loss=0.01836, over 4952770.77 frames. ], batch size: 99, lr: 3.55e-02, grad_scale: 256.0 2023-12-21 13:32:56,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=59106.666666666664, ans=0.0 2023-12-21 13:32:56,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=59106.666666666664, ans=0.0 2023-12-21 13:33:06,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=59173.333333333336, ans=0.1 2023-12-21 13:33:13,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=59173.333333333336, ans=0.0 2023-12-21 13:33:17,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=59240.0, ans=0.125 2023-12-21 13:33:34,106 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.565e+01 2.821e+01 3.074e+01 4.312e+01, threshold=5.642e+01, percent-clipped=0.0 2023-12-21 13:33:38,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=59373.333333333336, ans=0.0 2023-12-21 13:33:48,663 INFO [train.py:886] (3/4) Epoch 2, batch 4150, loss[loss=0.01753, audio_tagging_loss=0.01753, over 25000.00 frames. ], tot_loss[loss=0.01828, audio_tagging_loss=0.01828, over 4948637.11 frames. ], batch size: 100, lr: 3.54e-02, grad_scale: 256.0 2023-12-21 13:34:01,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=12.0 2023-12-21 13:34:02,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=59506.666666666664, ans=0.0 2023-12-21 13:34:03,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=59506.666666666664, ans=0.125 2023-12-21 13:34:11,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=59573.333333333336, ans=0.125 2023-12-21 13:34:12,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=59573.333333333336, ans=0.125 2023-12-21 13:34:15,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=59573.333333333336, ans=0.125 2023-12-21 13:34:27,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=59640.0, ans=0.0 2023-12-21 13:34:40,100 INFO [train.py:886] (3/4) Epoch 2, batch 4200, loss[loss=0.01963, audio_tagging_loss=0.01963, over 25000.00 frames. ], tot_loss[loss=0.0182, audio_tagging_loss=0.0182, over 4947499.93 frames. 
], batch size: 100, lr: 3.53e-02, grad_scale: 256.0 2023-12-21 13:34:40,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=59773.333333333336, ans=0.1 2023-12-21 13:34:57,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=59840.0, ans=0.125 2023-12-21 13:35:19,405 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.091e+01 2.515e+01 2.712e+01 3.027e+01 3.804e+01, threshold=5.424e+01, percent-clipped=0.0 2023-12-21 13:35:26,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=60040.0, ans=0.125 2023-12-21 13:35:31,820 INFO [train.py:886] (3/4) Epoch 2, batch 4250, loss[loss=0.01964, audio_tagging_loss=0.01964, over 25000.00 frames. ], tot_loss[loss=0.01816, audio_tagging_loss=0.01816, over 4947123.13 frames. ], batch size: 100, lr: 3.53e-02, grad_scale: 256.0 2023-12-21 13:35:41,978 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.13 vs. limit=22.5 2023-12-21 13:35:49,333 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.90 vs. limit=12.0 2023-12-21 13:35:53,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=60240.0, ans=0.1 2023-12-21 13:35:59,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=60240.0, ans=0.125 2023-12-21 13:36:05,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=60306.666666666664, ans=0.025 2023-12-21 13:36:24,758 INFO [train.py:886] (3/4) Epoch 2, batch 4300, loss[loss=0.01806, audio_tagging_loss=0.01806, over 25000.00 frames. ], tot_loss[loss=0.01809, audio_tagging_loss=0.01809, over 4954319.69 frames. ], batch size: 100, lr: 3.52e-02, grad_scale: 256.0 2023-12-21 13:36:24,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=60440.0, ans=0.125 2023-12-21 13:36:26,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=60440.0, ans=0.125 2023-12-21 13:36:37,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=60506.666666666664, ans=0.125 2023-12-21 13:36:52,310 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.550e+01 2023-12-21 13:36:54,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=60640.0, ans=0.0 2023-12-21 13:36:55,631 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.02 vs. 
limit=22.5 2023-12-21 13:37:03,036 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+01 2.538e+01 2.771e+01 3.049e+01 3.843e+01, threshold=5.542e+01, percent-clipped=0.0 2023-12-21 13:37:03,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=60640.0, ans=0.2 2023-12-21 13:37:10,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=60706.666666666664, ans=0.04949747468305833 2023-12-21 13:37:12,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=60706.666666666664, ans=0.025 2023-12-21 13:37:15,379 INFO [train.py:886] (3/4) Epoch 2, batch 4350, loss[loss=0.01567, audio_tagging_loss=0.01567, over 25000.00 frames. ], tot_loss[loss=0.0182, audio_tagging_loss=0.0182, over 4959802.46 frames. ], batch size: 100, lr: 3.52e-02, grad_scale: 256.0 2023-12-21 13:37:37,427 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.88 vs. limit=10.0 2023-12-21 13:37:45,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=60973.333333333336, ans=0.0 2023-12-21 13:37:45,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=60973.333333333336, ans=0.1 2023-12-21 13:37:57,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2023-12-21 13:38:06,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=61040.0, ans=0.0 2023-12-21 13:38:08,514 INFO [train.py:886] (3/4) Epoch 2, batch 4400, loss[loss=0.02039, audio_tagging_loss=0.02039, over 24750.00 frames. ], tot_loss[loss=0.01842, audio_tagging_loss=0.01842, over 4952606.84 frames. ], batch size: 99, lr: 3.51e-02, grad_scale: 256.0 2023-12-21 13:38:18,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=61173.333333333336, ans=0.125 2023-12-21 13:38:21,390 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=15.0 2023-12-21 13:38:46,110 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+01 2.594e+01 2.828e+01 3.102e+01 3.980e+01, threshold=5.657e+01, percent-clipped=0.0 2023-12-21 13:38:47,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=61306.666666666664, ans=0.2 2023-12-21 13:38:48,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=61373.333333333336, ans=0.2 2023-12-21 13:38:55,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=61373.333333333336, ans=0.125 2023-12-21 13:38:59,965 INFO [train.py:886] (3/4) Epoch 2, batch 4450, loss[loss=0.01616, audio_tagging_loss=0.01616, over 23956.00 frames. ], tot_loss[loss=0.01839, audio_tagging_loss=0.01839, over 4947911.83 frames. 
], batch size: 100, lr: 3.51e-02, grad_scale: 256.0 2023-12-21 13:39:18,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=61506.666666666664, ans=0.125 2023-12-21 13:39:19,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=61573.333333333336, ans=0.0 2023-12-21 13:39:23,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=61573.333333333336, ans=0.125 2023-12-21 13:39:23,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=61573.333333333336, ans=0.125 2023-12-21 13:39:24,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=61573.333333333336, ans=0.1 2023-12-21 13:39:31,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=61640.0, ans=0.125 2023-12-21 13:39:38,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=61640.0, ans=0.2 2023-12-21 13:39:43,969 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.54 vs. limit=15.0 2023-12-21 13:39:51,922 INFO [train.py:886] (3/4) Epoch 2, batch 4500, loss[loss=0.01736, audio_tagging_loss=0.01736, over 25000.00 frames. ], tot_loss[loss=0.01828, audio_tagging_loss=0.01828, over 4948285.72 frames. ], batch size: 100, lr: 3.50e-02, grad_scale: 256.0 2023-12-21 13:40:06,776 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=6.859e+00 2023-12-21 13:40:10,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=61840.0, ans=10.0 2023-12-21 13:40:10,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=61840.0, ans=0.125 2023-12-21 13:40:22,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=61973.333333333336, ans=0.5 2023-12-21 13:40:29,936 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.527e+01 2.862e+01 3.154e+01 4.163e+01, threshold=5.724e+01, percent-clipped=0.0 2023-12-21 13:40:44,702 INFO [train.py:886] (3/4) Epoch 2, batch 4550, loss[loss=0.01593, audio_tagging_loss=0.01593, over 24750.00 frames. ], tot_loss[loss=0.01829, audio_tagging_loss=0.01829, over 4951872.82 frames. ], batch size: 99, lr: 3.49e-02, grad_scale: 256.0 2023-12-21 13:41:16,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=62306.666666666664, ans=0.125 2023-12-21 13:41:22,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=62306.666666666664, ans=0.125 2023-12-21 13:41:33,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=62373.333333333336, ans=0.125 2023-12-21 13:41:35,494 INFO [train.py:886] (3/4) Epoch 2, batch 4600, loss[loss=0.0171, audio_tagging_loss=0.0171, over 25000.00 frames. 
], tot_loss[loss=0.01813, audio_tagging_loss=0.01813, over 4951384.85 frames. ], batch size: 100, lr: 3.49e-02, grad_scale: 256.0 2023-12-21 13:41:40,626 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.41 vs. limit=15.0 2023-12-21 13:42:05,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=62573.333333333336, ans=0.0 2023-12-21 13:42:15,553 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 2.477e+01 2.661e+01 2.958e+01 4.591e+01, threshold=5.321e+01, percent-clipped=0.0 2023-12-21 13:42:19,466 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.957e+01 2023-12-21 13:42:25,279 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.48 vs. limit=15.0 2023-12-21 13:42:29,640 INFO [train.py:886] (3/4) Epoch 2, batch 4650, loss[loss=0.01594, audio_tagging_loss=0.01594, over 25000.00 frames. ], tot_loss[loss=0.01823, audio_tagging_loss=0.01823, over 4958427.45 frames. ], batch size: 100, lr: 3.48e-02, grad_scale: 256.0 2023-12-21 13:42:30,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=62773.333333333336, ans=0.125 2023-12-21 13:42:33,024 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.53 vs. limit=15.0 2023-12-21 13:42:35,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=62773.333333333336, ans=0.1 2023-12-21 13:42:49,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=62906.666666666664, ans=0.05 2023-12-21 13:42:55,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=62906.666666666664, ans=0.125 2023-12-21 13:43:06,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.43 vs. limit=22.5 2023-12-21 13:43:19,726 INFO [train.py:886] (3/4) Epoch 2, batch 4700, loss[loss=0.01756, audio_tagging_loss=0.01756, over 24750.00 frames. ], tot_loss[loss=0.01829, audio_tagging_loss=0.01829, over 4950366.59 frames. ], batch size: 99, lr: 3.48e-02, grad_scale: 256.0 2023-12-21 13:43:20,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.51 vs. limit=15.0 2023-12-21 13:43:35,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. 
limit=6.0 2023-12-21 13:43:42,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=63240.0, ans=0.0 2023-12-21 13:43:52,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=63306.666666666664, ans=0.0 2023-12-21 13:43:55,353 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.185e+01 2.602e+01 2.941e+01 3.244e+01 4.815e+01, threshold=5.882e+01, percent-clipped=0.0 2023-12-21 13:43:58,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=63373.333333333336, ans=0.1 2023-12-21 13:44:03,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=63373.333333333336, ans=0.1 2023-12-21 13:44:06,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=63440.0, ans=0.1 2023-12-21 13:44:07,450 INFO [train.py:886] (3/4) Epoch 2, batch 4750, loss[loss=0.018, audio_tagging_loss=0.018, over 24750.00 frames. ], tot_loss[loss=0.01832, audio_tagging_loss=0.01832, over 4943976.44 frames. ], batch size: 99, lr: 3.47e-02, grad_scale: 256.0 2023-12-21 13:44:11,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.78 vs. limit=15.0 2023-12-21 13:44:38,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=63546.666666666664, ans=0.0 2023-12-21 13:44:45,612 INFO [train.py:886] (3/4) Epoch 3, batch 0, loss[loss=0.04337, audio_tagging_loss=0.04337, over 25000.00 frames. ], tot_loss[loss=0.04337, audio_tagging_loss=0.04337, over 25000.00 frames. ], batch size: 100, lr: 3.30e-02, grad_scale: 256.0 2023-12-21 13:44:45,613 INFO [train.py:909] (3/4) Computing validation loss 2023-12-21 13:45:06,834 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.8034, 1.6934, 3.0035, 2.3139], device='cuda:3') 2023-12-21 13:45:08,230 INFO [train.py:917] (3/4) Epoch 3, validation: loss=0.04026, audio_tagging_loss=0.04026, over 3737520.00 frames. 2023-12-21 13:45:08,231 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-21 13:45:09,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=63546.666666666664, ans=0.2 2023-12-21 13:45:24,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=63613.333333333336, ans=0.0 2023-12-21 13:45:35,384 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=11.37 vs. limit=10.0 2023-12-21 13:45:40,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=63746.666666666664, ans=0.125 2023-12-21 13:45:44,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=63746.666666666664, ans=0.125 2023-12-21 13:45:44,946 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.42 vs. 
limit=22.5 2023-12-21 13:45:48,806 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.32 vs. limit=12.0 2023-12-21 13:46:01,485 INFO [train.py:886] (3/4) Epoch 3, batch 50, loss[loss=0.02291, audio_tagging_loss=0.02291, over 25000.00 frames. ], tot_loss[loss=0.02935, audio_tagging_loss=0.02935, over 1118713.27 frames. ], batch size: 100, lr: 3.29e-02, grad_scale: 64.0 2023-12-21 13:46:05,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=63880.0, ans=0.125 2023-12-21 13:46:08,538 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.73 vs. limit=6.0 2023-12-21 13:46:14,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=63946.666666666664, ans=0.125 2023-12-21 13:46:15,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=63946.666666666664, ans=0.125 2023-12-21 13:46:16,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=63946.666666666664, ans=0.04949747468305833 2023-12-21 13:46:17,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.72 vs. limit=15.0 2023-12-21 13:46:23,536 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+01 2.959e+01 3.288e+01 3.830e+01 1.189e+02, threshold=6.575e+01, percent-clipped=4.0 2023-12-21 13:46:51,660 INFO [train.py:886] (3/4) Epoch 3, batch 100, loss[loss=0.02264, audio_tagging_loss=0.02264, over 25000.00 frames. ], tot_loss[loss=0.02503, audio_tagging_loss=0.02503, over 1970051.44 frames. ], batch size: 100, lr: 3.29e-02, grad_scale: 64.0 2023-12-21 13:46:53,063 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.22 vs. limit=15.0 2023-12-21 13:46:54,090 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.98 vs. limit=6.0 2023-12-21 13:47:01,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=64280.0, ans=0.07 2023-12-21 13:47:07,046 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.59 vs. limit=22.5 2023-12-21 13:47:19,225 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.96 vs. limit=15.0 2023-12-21 13:47:44,255 INFO [train.py:886] (3/4) Epoch 3, batch 150, loss[loss=0.01682, audio_tagging_loss=0.01682, over 25000.00 frames. ], tot_loss[loss=0.02285, audio_tagging_loss=0.02285, over 2635520.57 frames. ], batch size: 100, lr: 3.28e-02, grad_scale: 64.0 2023-12-21 13:47:49,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=64546.666666666664, ans=0.1 2023-12-21 13:47:49,760 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.03 vs. 
limit=15.0 2023-12-21 13:48:00,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=64613.333333333336, ans=0.2 2023-12-21 13:48:07,594 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.621e+01 2.918e+01 3.112e+01 3.943e+01, threshold=5.836e+01, percent-clipped=0.0 2023-12-21 13:48:08,060 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.56 vs. limit=15.0 2023-12-21 13:48:12,601 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.71 vs. limit=10.0 2023-12-21 13:48:28,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=64813.333333333336, ans=0.125 2023-12-21 13:48:35,606 INFO [train.py:886] (3/4) Epoch 3, batch 200, loss[loss=0.01894, audio_tagging_loss=0.01894, over 25000.00 frames. ], tot_loss[loss=0.02141, audio_tagging_loss=0.02141, over 3152284.60 frames. ], batch size: 100, lr: 3.28e-02, grad_scale: 64.0 2023-12-21 13:48:38,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=64880.0, ans=0.2 2023-12-21 13:49:13,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=65080.0, ans=0.0 2023-12-21 13:49:28,030 INFO [train.py:886] (3/4) Epoch 3, batch 250, loss[loss=0.01943, audio_tagging_loss=0.01943, over 25000.00 frames. ], tot_loss[loss=0.02026, audio_tagging_loss=0.02026, over 3556638.31 frames. ], batch size: 100, lr: 3.27e-02, grad_scale: 64.0 2023-12-21 13:49:30,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=65213.333333333336, ans=0.2 2023-12-21 13:49:35,922 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=19.24 vs. 
limit=15.0 2023-12-21 13:49:48,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=65346.666666666664, ans=0.2 2023-12-21 13:49:50,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=65346.666666666664, ans=0.125 2023-12-21 13:49:51,902 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.523e+01 2.809e+01 3.152e+01 4.163e+01, threshold=5.618e+01, percent-clipped=0.0 2023-12-21 13:49:54,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=65346.666666666664, ans=0.125 2023-12-21 13:50:06,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=65413.333333333336, ans=0.04949747468305833 2023-12-21 13:50:07,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=65480.0, ans=0.125 2023-12-21 13:50:11,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=65480.0, ans=0.125 2023-12-21 13:50:13,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65480.0, ans=0.1 2023-12-21 13:50:20,241 INFO [train.py:886] (3/4) Epoch 3, batch 300, loss[loss=0.01864, audio_tagging_loss=0.01864, over 24750.00 frames. ], tot_loss[loss=0.01976, audio_tagging_loss=0.01976, over 3863150.33 frames. ], batch size: 99, lr: 3.27e-02, grad_scale: 64.0 2023-12-21 13:50:36,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=65613.33333333333, ans=0.125 2023-12-21 13:50:58,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=65746.66666666667, ans=0.2 2023-12-21 13:50:58,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65746.66666666667, ans=0.1 2023-12-21 13:51:10,627 INFO [train.py:886] (3/4) Epoch 3, batch 350, loss[loss=0.01623, audio_tagging_loss=0.01623, over 24750.00 frames. ], tot_loss[loss=0.0195, audio_tagging_loss=0.0195, over 4097653.64 frames. ], batch size: 99, lr: 3.26e-02, grad_scale: 64.0 2023-12-21 13:51:19,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=65880.0, ans=0.015 2023-12-21 13:51:19,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=65880.0, ans=0.125 2023-12-21 13:51:24,653 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.71 vs. limit=15.0 2023-12-21 13:51:35,294 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.203e+01 2.550e+01 2.783e+01 3.112e+01 3.866e+01, threshold=5.566e+01, percent-clipped=0.0 2023-12-21 13:51:35,589 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.346e-01 2023-12-21 13:51:37,617 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.31 vs. 
limit=10.0 2023-12-21 13:51:43,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=66080.0, ans=0.0 2023-12-21 13:51:49,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=66080.0, ans=0.04949747468305833 2023-12-21 13:51:56,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=66146.66666666667, ans=0.0 2023-12-21 13:52:02,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=66213.33333333333, ans=0.1 2023-12-21 13:52:03,163 INFO [train.py:886] (3/4) Epoch 3, batch 400, loss[loss=0.01858, audio_tagging_loss=0.01858, over 22827.00 frames. ], tot_loss[loss=0.01906, audio_tagging_loss=0.01906, over 4283667.67 frames. ], batch size: 107, lr: 3.25e-02, grad_scale: 64.0 2023-12-21 13:52:16,265 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0 2023-12-21 13:52:25,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=66346.66666666667, ans=0.125 2023-12-21 13:52:33,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=66413.33333333333, ans=0.035 2023-12-21 13:52:33,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=66413.33333333333, ans=0.04949747468305833 2023-12-21 13:52:39,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=66413.33333333333, ans=0.2 2023-12-21 13:52:53,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=66546.66666666667, ans=0.125 2023-12-21 13:52:54,314 INFO [train.py:886] (3/4) Epoch 3, batch 450, loss[loss=0.01459, audio_tagging_loss=0.01459, over 25000.00 frames. ], tot_loss[loss=0.01868, audio_tagging_loss=0.01868, over 4432140.39 frames. ], batch size: 100, lr: 3.25e-02, grad_scale: 64.0 2023-12-21 13:53:04,950 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.55 vs. limit=15.0 2023-12-21 13:53:15,465 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.91 vs. limit=15.0 2023-12-21 13:53:17,867 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.418e+01 2.694e+01 2.965e+01 4.467e+01, threshold=5.389e+01, percent-clipped=0.0 2023-12-21 13:53:31,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=66746.66666666667, ans=0.5 2023-12-21 13:53:46,578 INFO [train.py:886] (3/4) Epoch 3, batch 500, loss[loss=0.01693, audio_tagging_loss=0.01693, over 24750.00 frames. ], tot_loss[loss=0.01841, audio_tagging_loss=0.01841, over 4547394.85 frames. ], batch size: 99, lr: 3.24e-02, grad_scale: 64.0 2023-12-21 13:53:57,890 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.55 vs. 
limit=12.0 2023-12-21 13:54:10,392 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.20 vs. limit=10.0 2023-12-21 13:54:38,663 INFO [train.py:886] (3/4) Epoch 3, batch 550, loss[loss=0.01884, audio_tagging_loss=0.01884, over 25000.00 frames. ], tot_loss[loss=0.0184, audio_tagging_loss=0.0184, over 4640817.89 frames. ], batch size: 100, lr: 3.24e-02, grad_scale: 64.0 2023-12-21 13:54:45,528 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.847e+00 2023-12-21 13:54:51,891 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 13:54:53,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=67280.0, ans=0.2 2023-12-21 13:54:55,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=67280.0, ans=0.09899494936611666 2023-12-21 13:55:01,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=67346.66666666667, ans=0.0 2023-12-21 13:55:02,555 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.576e+01 2.807e+01 3.063e+01 3.994e+01, threshold=5.614e+01, percent-clipped=0.0 2023-12-21 13:55:03,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.36 vs. limit=15.0 2023-12-21 13:55:29,814 INFO [train.py:886] (3/4) Epoch 3, batch 600, loss[loss=0.02025, audio_tagging_loss=0.02025, over 24750.00 frames. ], tot_loss[loss=0.01838, audio_tagging_loss=0.01838, over 4706635.26 frames. ], batch size: 99, lr: 3.23e-02, grad_scale: 64.0 2023-12-21 13:55:33,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=67546.66666666667, ans=0.0 2023-12-21 13:55:33,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=67546.66666666667, ans=0.2 2023-12-21 13:55:40,906 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.08 vs. limit=22.5 2023-12-21 13:55:43,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=67613.33333333333, ans=0.125 2023-12-21 13:56:01,812 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.61 vs. limit=15.0 2023-12-21 13:56:15,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=67813.33333333333, ans=0.125 2023-12-21 13:56:22,130 INFO [train.py:886] (3/4) Epoch 3, batch 650, loss[loss=0.02051, audio_tagging_loss=0.02051, over 24750.00 frames. ], tot_loss[loss=0.01838, audio_tagging_loss=0.01838, over 4752808.35 frames. 
2023-12-21 13:56:27,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=67880.0, ans=0.0
2023-12-21 13:56:27,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=67880.0, ans=0.1
2023-12-21 13:56:28,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=67880.0, ans=0.0
2023-12-21 13:56:37,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=67946.66666666667, ans=0.125
2023-12-21 13:56:39,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=67946.66666666667, ans=0.125
2023-12-21 13:56:43,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=68013.33333333333, ans=0.0
2023-12-21 13:56:44,459 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=44.20 vs. limit=15.0
2023-12-21 13:56:46,346 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.96 vs. limit=22.5
2023-12-21 13:56:46,695 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.530e+01 2.784e+01 3.010e+01 3.967e+01, threshold=5.567e+01, percent-clipped=0.0
2023-12-21 13:56:47,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=68013.33333333333, ans=0.0
2023-12-21 13:56:48,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=68013.33333333333, ans=0.125
2023-12-21 13:56:48,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=68013.33333333333, ans=0.125
2023-12-21 13:56:50,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.64 vs. limit=15.0
2023-12-21 13:57:04,849 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=12.0
2023-12-21 13:57:14,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=68213.33333333333, ans=0.1
2023-12-21 13:57:15,054 INFO [train.py:886] (3/4) Epoch 3, batch 700, loss[loss=0.01948, audio_tagging_loss=0.01948, over 24750.00 frames. ], tot_loss[loss=0.01821, audio_tagging_loss=0.01821, over 4786932.06 frames. ], batch size: 99, lr: 3.22e-02, grad_scale: 64.0
2023-12-21 13:57:17,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=68213.33333333333, ans=0.125
2023-12-21 13:57:24,230 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.86 vs. limit=15.0
2023-12-21 13:57:33,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=68280.0, ans=0.0
2023-12-21 13:57:37,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.60 vs. limit=22.5
2023-12-21 13:57:48,930 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=15.0
2023-12-21 13:57:49,112 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.82 vs. limit=6.0
2023-12-21 13:57:49,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=68413.33333333333, ans=0.125
2023-12-21 13:57:53,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.71 vs. limit=12.0
2023-12-21 13:58:04,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=68480.0, ans=0.2
2023-12-21 13:58:06,475 INFO [train.py:886] (3/4) Epoch 3, batch 750, loss[loss=0.01685, audio_tagging_loss=0.01685, over 24750.00 frames. ], tot_loss[loss=0.01798, audio_tagging_loss=0.01798, over 4821207.44 frames. ], batch size: 99, lr: 3.22e-02, grad_scale: 64.0
2023-12-21 13:58:07,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=68546.66666666667, ans=0.125
2023-12-21 13:58:14,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=68546.66666666667, ans=0.125
2023-12-21 13:58:16,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=68613.33333333333, ans=0.1
2023-12-21 13:58:30,490 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+01 2.631e+01 2.824e+01 3.219e+01 3.992e+01, threshold=5.647e+01, percent-clipped=0.0
2023-12-21 13:58:33,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=68680.0, ans=0.125
2023-12-21 13:58:36,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=68746.66666666667, ans=0.07
2023-12-21 13:58:51,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=68813.33333333333, ans=0.1
2023-12-21 13:58:59,106 INFO [train.py:886] (3/4) Epoch 3, batch 800, loss[loss=0.01775, audio_tagging_loss=0.01775, over 25000.00 frames. ], tot_loss[loss=0.01793, audio_tagging_loss=0.01793, over 4851032.28 frames. ], batch size: 100, lr: 3.21e-02, grad_scale: 64.0
2023-12-21 13:59:08,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=68946.66666666667, ans=0.0
2023-12-21 13:59:40,421 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.17 vs. limit=12.0
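The ScheduledFloat lines from scaling.py print the current value (`ans`) of a hyperparameter as a function of `batch_count`; the same name yields different `ans` values as training advances. A sketch of a piecewise-linear schedule of this kind, assuming that behavior (icefall's actual ScheduledFloat in scaling.py carries more machinery):

```python
class PiecewiseLinearSchedule:
    """A float that depends on batch_count, interpolating linearly between
    (batch_count, value) breakpoints and clamping outside them.
    An illustrative stand-in for the ScheduledFloat values in the log."""

    def __init__(self, *points):
        self.points = sorted(points)  # e.g. (0, 0.3), (20000, 0.0)
        self.batch_count = 0.0

    def value(self) -> float:
        pts = self.points
        if self.batch_count <= pts[0][0]:
            return float(pts[0][1])
        if self.batch_count >= pts[-1][0]:
            return float(pts[-1][1])
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if self.batch_count <= x1:
                t = (self.batch_count - x0) / (x1 - x0)
                return float(y0 + t * (y1 - y0))

# e.g. a skip-rate annealed from 0.3 at batch 0 to 0.0 by batch 20000
# (hypothetical breakpoints; the log only shows the interpolated 'ans'):
rate = PiecewiseLinearSchedule((0, 0.3), (20000, 0.0))
rate.batch_count = 67880.0
print(rate.value())  # -> 0.0, past the last breakpoint
```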
2023-12-21 13:59:44,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=69146.66666666667, ans=0.0
2023-12-21 13:59:50,504 INFO [train.py:886] (3/4) Epoch 3, batch 850, loss[loss=0.01817, audio_tagging_loss=0.01817, over 25000.00 frames. ], tot_loss[loss=0.01793, audio_tagging_loss=0.01793, over 4870849.10 frames. ], batch size: 100, lr: 3.21e-02, grad_scale: 64.0
2023-12-21 14:00:00,909 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.00 vs. limit=6.0
2023-12-21 14:00:14,048 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.575e+01 2.735e+01 3.023e+01 4.765e+01, threshold=5.470e+01, percent-clipped=0.0
2023-12-21 14:00:17,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=69346.66666666667, ans=0.125
2023-12-21 14:00:20,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=69413.33333333333, ans=0.125
2023-12-21 14:00:32,657 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0
2023-12-21 14:00:38,293 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.67 vs. limit=15.0
2023-12-21 14:00:40,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=69480.0, ans=0.125
2023-12-21 14:00:42,363 INFO [train.py:886] (3/4) Epoch 3, batch 900, loss[loss=0.01737, audio_tagging_loss=0.01737, over 24750.00 frames. ], tot_loss[loss=0.01799, audio_tagging_loss=0.01799, over 4882509.58 frames. ], batch size: 99, lr: 3.20e-02, grad_scale: 64.0
2023-12-21 14:00:43,023 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.93 vs. limit=22.5
2023-12-21 14:00:43,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=69546.66666666667, ans=0.125
2023-12-21 14:00:52,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=69613.33333333333, ans=0.125
2023-12-21 14:01:02,544 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.29 vs. limit=10.0
2023-12-21 14:01:04,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=69680.0, ans=0.125
2023-12-21 14:01:08,276 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.95 vs. limit=22.5
2023-12-21 14:01:19,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=69746.66666666667, ans=0.125
2023-12-21 14:01:21,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=69746.66666666667, ans=0.125
2023-12-21 14:01:32,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=69813.33333333333, ans=0.1
2023-12-21 14:01:35,265 INFO [train.py:886] (3/4) Epoch 3, batch 950, loss[loss=0.02008, audio_tagging_loss=0.02008, over 24750.00 frames. ], tot_loss[loss=0.01809, audio_tagging_loss=0.01809, over 4895138.76 frames. ], batch size: 99, lr: 3.20e-02, grad_scale: 64.0
2023-12-21 14:01:44,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=69946.66666666667, ans=0.125
2023-12-21 14:01:58,496 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.594e+01 2.861e+01 3.061e+01 4.080e+01, threshold=5.722e+01, percent-clipped=0.0
2023-12-21 14:02:14,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=70080.0, ans=0.0
2023-12-21 14:02:25,377 INFO [train.py:886] (3/4) Epoch 3, batch 1000, loss[loss=0.0163, audio_tagging_loss=0.0163, over 25000.00 frames. ], tot_loss[loss=0.01808, audio_tagging_loss=0.01808, over 4909279.23 frames. ], batch size: 100, lr: 3.19e-02, grad_scale: 64.0
2023-12-21 14:02:56,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.36 vs. limit=15.0
2023-12-21 14:02:57,428 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.59 vs. limit=6.0
2023-12-21 14:03:12,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=70480.0, ans=0.0
2023-12-21 14:03:13,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=70480.0, ans=0.0
2023-12-21 14:03:16,855 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.828e+00
2023-12-21 14:03:16,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=70480.0, ans=0.1
2023-12-21 14:03:18,556 INFO [train.py:886] (3/4) Epoch 3, batch 1050, loss[loss=0.01549, audio_tagging_loss=0.01549, over 25000.00 frames. ], tot_loss[loss=0.01793, audio_tagging_loss=0.01793, over 4915467.37 frames. ], batch size: 100, lr: 3.19e-02, grad_scale: 64.0
2023-12-21 14:03:23,904 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.81 vs. limit=15.0
2023-12-21 14:03:43,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=70680.0, ans=0.125
2023-12-21 14:03:43,852 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.177e+01 2.533e+01 2.739e+01 3.016e+01 3.717e+01, threshold=5.478e+01, percent-clipped=0.0
2023-12-21 14:03:44,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=70680.0, ans=0.125
2023-12-21 14:03:46,457 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.52 vs. limit=15.0
2023-12-21 14:03:47,227 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.02 vs. limit=15.0
2023-12-21 14:04:05,595 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.84 vs. limit=15.0
2023-12-21 14:04:11,144 INFO [train.py:886] (3/4) Epoch 3, batch 1100, loss[loss=0.02067, audio_tagging_loss=0.02067, over 25000.00 frames. ], tot_loss[loss=0.01777, audio_tagging_loss=0.01777, over 4927899.28 frames. ], batch size: 100, lr: 3.18e-02, grad_scale: 64.0
2023-12-21 14:04:15,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=70880.0, ans=0.0
2023-12-21 14:04:32,261 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.39 vs. limit=15.0
2023-12-21 14:04:39,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=71013.33333333333, ans=0.125
2023-12-21 14:04:52,668 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.72 vs. limit=15.0
2023-12-21 14:05:01,941 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.385e-01
2023-12-21 14:05:02,650 INFO [train.py:886] (3/4) Epoch 3, batch 1150, loss[loss=0.01577, audio_tagging_loss=0.01577, over 25000.00 frames. ], tot_loss[loss=0.01771, audio_tagging_loss=0.01771, over 4934050.01 frames. ], batch size: 100, lr: 3.18e-02, grad_scale: 64.0
2023-12-21 14:05:05,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=71213.33333333333, ans=0.125
2023-12-21 14:05:13,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=71280.0, ans=0.0
2023-12-21 14:05:15,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=71280.0, ans=0.125
2023-12-21 14:05:26,621 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.60 vs. limit=6.0
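The scaling.py:1022 Whitening lines report a per-module metric against a limit; the metric measures how far the covariance of that module's activations is from being "white" (a multiple of the identity), and the module applies its whitening penalty when the metric drifts above the limit. A hedged sketch of one such uniformity metric, which equals 1.0 for perfectly white features; this is the general shape of such a measure, not necessarily scaling.py's exact formula:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """Covariance-uniformity measure for activations x of shape
    (num_frames, num_channels): 1.0 iff cov(x) is a multiple of the
    identity, larger as the eigenvalues spread out. An assumed form."""
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    dim = cov.shape[0]
    return (dim * (cov ** 2).sum() / cov.trace() ** 2).item()

# well-whitened features score near 1.0; strongly correlated channels
# push the metric up, as in the 'metric=... vs. limit=...' lines above
```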
2023-12-21 14:05:27,719 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.087e+01 2.554e+01 2.817e+01 3.036e+01 4.286e+01, threshold=5.635e+01, percent-clipped=0.0
2023-12-21 14:05:38,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=71413.33333333333, ans=0.125
2023-12-21 14:05:39,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=71413.33333333333, ans=0.1
2023-12-21 14:05:56,103 INFO [train.py:886] (3/4) Epoch 3, batch 1200, loss[loss=0.01776, audio_tagging_loss=0.01776, over 25000.00 frames. ], tot_loss[loss=0.01781, audio_tagging_loss=0.01781, over 4937390.82 frames. ], batch size: 100, lr: 3.17e-02, grad_scale: 64.0
2023-12-21 14:06:02,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=71546.66666666667, ans=0.0
2023-12-21 14:06:08,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=71613.33333333333, ans=0.0
2023-12-21 14:06:23,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=71680.0, ans=0.2
2023-12-21 14:06:42,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.75 vs. limit=22.5
2023-12-21 14:06:46,189 INFO [train.py:886] (3/4) Epoch 3, batch 1250, loss[loss=0.01793, audio_tagging_loss=0.01793, over 24750.00 frames. ], tot_loss[loss=0.01798, audio_tagging_loss=0.01798, over 4935556.58 frames. ], batch size: 99, lr: 3.17e-02, grad_scale: 64.0
2023-12-21 14:06:56,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=71880.0, ans=0.0
2023-12-21 14:07:05,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=71946.66666666667, ans=0.0
2023-12-21 14:07:10,844 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.031e+01 2.406e+01 2.702e+01 2.933e+01 3.398e+01, threshold=5.404e+01, percent-clipped=0.0
2023-12-21 14:07:33,808 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.74 vs. limit=15.0
2023-12-21 14:07:38,885 INFO [train.py:886] (3/4) Epoch 3, batch 1300, loss[loss=0.01769, audio_tagging_loss=0.01769, over 24750.00 frames. ], tot_loss[loss=0.01806, audio_tagging_loss=0.01806, over 4939231.33 frames. ], batch size: 99, lr: 3.16e-02, grad_scale: 64.0
2023-12-21 14:07:39,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=72213.33333333333, ans=0.125
2023-12-21 14:07:53,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=72280.0, ans=0.1
2023-12-21 14:08:02,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=72346.66666666667, ans=0.0
2023-12-21 14:08:02,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=72346.66666666667, ans=0.0
2023-12-21 14:08:20,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=72480.0, ans=0.0
2023-12-21 14:08:31,415 INFO [train.py:886] (3/4) Epoch 3, batch 1350, loss[loss=0.01637, audio_tagging_loss=0.01637, over 24750.00 frames. ], tot_loss[loss=0.01796, audio_tagging_loss=0.01796, over 4941303.11 frames. ], batch size: 99, lr: 3.16e-02, grad_scale: 64.0
2023-12-21 14:08:41,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=72613.33333333333, ans=0.125
2023-12-21 14:08:43,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=72613.33333333333, ans=0.0
2023-12-21 14:08:53,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=72680.0, ans=0.125
2023-12-21 14:08:54,016 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.196e+01 2.555e+01 2.757e+01 3.030e+01 4.227e+01, threshold=5.514e+01, percent-clipped=0.0
2023-12-21 14:08:56,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=72680.0, ans=0.0
2023-12-21 14:09:07,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=72746.66666666667, ans=0.1
2023-12-21 14:09:09,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=72746.66666666667, ans=0.125
2023-12-21 14:09:09,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=72746.66666666667, ans=0.0
2023-12-21 14:09:18,401 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.49 vs. limit=22.5
2023-12-21 14:09:21,845 INFO [train.py:886] (3/4) Epoch 3, batch 1400, loss[loss=0.01348, audio_tagging_loss=0.01348, over 25000.00 frames. ], tot_loss[loss=0.01785, audio_tagging_loss=0.01785, over 4941779.46 frames. ], batch size: 100, lr: 3.15e-02, grad_scale: 64.0
2023-12-21 14:09:23,948 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.826e+00
2023-12-21 14:09:32,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=72946.66666666667, ans=0.125
2023-12-21 14:09:37,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=72946.66666666667, ans=0.0
2023-12-21 14:09:55,990 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.02 vs. limit=15.0
2023-12-21 14:10:06,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=73146.66666666667, ans=0.0
2023-12-21 14:10:08,485 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0
2023-12-21 14:10:09,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=73146.66666666667, ans=0.125
2023-12-21 14:10:09,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=73146.66666666667, ans=0.1
2023-12-21 14:10:13,692 INFO [train.py:886] (3/4) Epoch 3, batch 1450, loss[loss=0.01607, audio_tagging_loss=0.01607, over 25000.00 frames. ], tot_loss[loss=0.01775, audio_tagging_loss=0.01775, over 4940524.55 frames. ], batch size: 100, lr: 3.15e-02, grad_scale: 64.0
2023-12-21 14:10:15,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=73213.33333333333, ans=0.125
2023-12-21 14:10:16,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=73213.33333333333, ans=0.0
2023-12-21 14:10:17,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.37 vs. limit=10.0
2023-12-21 14:10:26,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=73280.0, ans=0.125
2023-12-21 14:10:30,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=73280.0, ans=0.125
2023-12-21 14:10:35,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=73346.66666666667, ans=0.09899494936611666
2023-12-21 14:10:37,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=73346.66666666667, ans=0.125
2023-12-21 14:10:38,203 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.554e+01 2.770e+01 3.046e+01 3.677e+01, threshold=5.540e+01, percent-clipped=0.0
2023-12-21 14:10:41,551 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.10 vs. limit=15.0
2023-12-21 14:11:05,654 INFO [train.py:886] (3/4) Epoch 3, batch 1500, loss[loss=0.01763, audio_tagging_loss=0.01763, over 25000.00 frames. ], tot_loss[loss=0.01782, audio_tagging_loss=0.01782, over 4945881.15 frames. ], batch size: 100, lr: 3.14e-02, grad_scale: 64.0
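In the train.py:886 lines, loss[...] is the current batch and tot_loss[...] is a running average over recent batches, weighted by frame count; the "over N frames." total climbs early in the epoch and then plateaus near 4.9M frames, which is what a decayed frame-weighted sum does (for ~25000 frames per batch, a plateau near 5e6 corresponds to a decay factor of about 1 - 25000/5e6 = 0.995). A sketch of that bookkeeping under the assumption of simple exponential forgetting; the decay value is illustrative, not read from train.py:

```python
class RunningLoss:
    """Frame-weighted running average with exponential forgetting,
    mimicking the tot_loss / 'over N frames' columns in spirit."""

    def __init__(self, decay: float = 0.995):  # illustrative decay
        self.decay = decay
        self.weighted_loss = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, num_frames: float) -> None:
        self.weighted_loss = self.decay * self.weighted_loss + batch_loss * num_frames
        self.frames = self.decay * self.frames + num_frames

    def value(self) -> float:
        return self.weighted_loss / max(self.frames, 1.0)

# the frame total plateaus near num_frames_per_batch / (1 - decay):
# 25000 / 0.005 = 5.0e6, close to the ~4.94e6 totals logged above
```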
], batch size: 100, lr: 3.14e-02, grad_scale: 64.0 2023-12-21 14:11:34,662 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.28 vs. limit=15.0 2023-12-21 14:11:44,448 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.16 vs. limit=15.0 2023-12-21 14:11:58,006 INFO [train.py:886] (3/4) Epoch 3, batch 1550, loss[loss=0.02055, audio_tagging_loss=0.02055, over 24951.00 frames. ], tot_loss[loss=0.01789, audio_tagging_loss=0.01789, over 4948126.04 frames. ], batch size: 100, lr: 3.14e-02, grad_scale: 64.0 2023-12-21 14:12:05,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=73880.0, ans=0.07 2023-12-21 14:12:12,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=73946.66666666667, ans=0.125 2023-12-21 14:12:14,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=73946.66666666667, ans=0.0 2023-12-21 14:12:14,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=73946.66666666667, ans=0.1 2023-12-21 14:12:22,094 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.602e+01 2.823e+01 3.207e+01 3.964e+01, threshold=5.646e+01, percent-clipped=0.0 2023-12-21 14:12:49,722 INFO [train.py:886] (3/4) Epoch 3, batch 1600, loss[loss=0.01639, audio_tagging_loss=0.01639, over 25000.00 frames. ], tot_loss[loss=0.01784, audio_tagging_loss=0.01784, over 4943966.93 frames. ], batch size: 100, lr: 3.13e-02, grad_scale: 64.0 2023-12-21 14:12:51,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.69 vs. limit=12.0 2023-12-21 14:13:08,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=74346.66666666667, ans=0.0 2023-12-21 14:13:30,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=74480.0, ans=0.125 2023-12-21 14:13:33,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=74480.0, ans=0.125 2023-12-21 14:13:39,706 INFO [train.py:886] (3/4) Epoch 3, batch 1650, loss[loss=0.01701, audio_tagging_loss=0.01701, over 24750.00 frames. ], tot_loss[loss=0.01773, audio_tagging_loss=0.01773, over 4947692.11 frames. ], batch size: 99, lr: 3.13e-02, grad_scale: 64.0 2023-12-21 14:13:51,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.52 vs. limit=6.0 2023-12-21 14:13:53,016 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.74 vs. 
limit=22.5 2023-12-21 14:13:56,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=74613.33333333333, ans=0.1 2023-12-21 14:14:03,662 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.550e+01 2.728e+01 3.001e+01 3.650e+01, threshold=5.456e+01, percent-clipped=0.0 2023-12-21 14:14:05,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=74680.0, ans=0.0 2023-12-21 14:14:10,668 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.882e-02 2023-12-21 14:14:23,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=74813.33333333333, ans=0.2 2023-12-21 14:14:30,596 INFO [train.py:886] (3/4) Epoch 3, batch 1700, loss[loss=0.01357, audio_tagging_loss=0.01357, over 25000.00 frames. ], tot_loss[loss=0.01767, audio_tagging_loss=0.01767, over 4952279.29 frames. ], batch size: 100, lr: 3.12e-02, grad_scale: 64.0 2023-12-21 14:14:33,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=74880.0, ans=0.09899494936611666 2023-12-21 14:14:35,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=74880.0, ans=0.0 2023-12-21 14:14:42,262 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.88 vs. limit=22.5 2023-12-21 14:14:46,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=74946.66666666667, ans=0.1 2023-12-21 14:14:50,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=75013.33333333333, ans=0.0 2023-12-21 14:14:57,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=75013.33333333333, ans=0.125 2023-12-21 14:15:00,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=75080.0, ans=0.125 2023-12-21 14:15:05,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=75080.0, ans=0.015 2023-12-21 14:15:21,533 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.86 vs. limit=15.0 2023-12-21 14:15:22,036 INFO [train.py:886] (3/4) Epoch 3, batch 1750, loss[loss=0.01928, audio_tagging_loss=0.01928, over 25000.00 frames. ], tot_loss[loss=0.01762, audio_tagging_loss=0.01762, over 4953728.05 frames. ], batch size: 100, lr: 3.12e-02, grad_scale: 64.0 2023-12-21 14:15:26,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.44 vs. limit=22.5 2023-12-21 14:15:32,008 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.15 vs. 
limit=15.0 2023-12-21 14:15:35,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=75280.0, ans=0.05 2023-12-21 14:15:44,388 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.559e+01 2.774e+01 3.008e+01 3.944e+01, threshold=5.548e+01, percent-clipped=0.0 2023-12-21 14:15:45,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=75346.66666666667, ans=0.2 2023-12-21 14:15:49,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=75346.66666666667, ans=0.125 2023-12-21 14:16:05,110 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=24.39 vs. limit=22.5 2023-12-21 14:16:12,115 INFO [train.py:886] (3/4) Epoch 3, batch 1800, loss[loss=0.01605, audio_tagging_loss=0.01605, over 25000.00 frames. ], tot_loss[loss=0.01771, audio_tagging_loss=0.01771, over 4954767.98 frames. ], batch size: 100, lr: 3.11e-02, grad_scale: 64.0 2023-12-21 14:16:24,922 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=17.30 vs. limit=15.0 2023-12-21 14:16:26,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=75613.33333333333, ans=0.125 2023-12-21 14:16:29,847 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.92 vs. limit=15.0 2023-12-21 14:16:30,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=75613.33333333333, ans=0.0 2023-12-21 14:16:32,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=75680.0, ans=0.0 2023-12-21 14:16:47,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=75746.66666666667, ans=0.125 2023-12-21 14:17:04,490 INFO [train.py:886] (3/4) Epoch 3, batch 1850, loss[loss=0.02315, audio_tagging_loss=0.02315, over 25000.00 frames. ], tot_loss[loss=0.01779, audio_tagging_loss=0.01779, over 4956797.25 frames. ], batch size: 100, lr: 3.11e-02, grad_scale: 64.0 2023-12-21 14:17:13,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=75946.66666666667, ans=0.0 2023-12-21 14:17:28,686 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.193e+01 2.592e+01 2.758e+01 3.000e+01 3.750e+01, threshold=5.517e+01, percent-clipped=0.0 2023-12-21 14:17:46,496 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2023-12-21 14:17:51,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=76146.66666666667, ans=0.0 2023-12-21 14:17:55,780 INFO [train.py:886] (3/4) Epoch 3, batch 1900, loss[loss=0.01825, audio_tagging_loss=0.01825, over 21813.00 frames. ], tot_loss[loss=0.01796, audio_tagging_loss=0.01796, over 4947387.19 frames. 
], batch size: 107, lr: 3.11e-02, grad_scale: 64.0 2023-12-21 14:17:55,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=76213.33333333333, ans=0.125 2023-12-21 14:18:17,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=76346.66666666667, ans=0.125 2023-12-21 14:18:33,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=12.0 2023-12-21 14:18:46,714 INFO [train.py:886] (3/4) Epoch 3, batch 1950, loss[loss=0.01538, audio_tagging_loss=0.01538, over 24750.00 frames. ], tot_loss[loss=0.01786, audio_tagging_loss=0.01786, over 4944074.95 frames. ], batch size: 99, lr: 3.10e-02, grad_scale: 64.0 2023-12-21 14:18:49,179 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.00 vs. limit=15.0 2023-12-21 14:18:50,109 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.81 vs. limit=22.5 2023-12-21 14:18:56,361 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.50 vs. limit=15.0 2023-12-21 14:18:58,130 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=21.33 vs. limit=15.0 2023-12-21 14:19:08,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=76680.0, ans=0.0 2023-12-21 14:19:10,601 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 2.559e+01 2.784e+01 3.135e+01 5.134e+01, threshold=5.569e+01, percent-clipped=0.0 2023-12-21 14:19:24,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=76746.66666666667, ans=0.1 2023-12-21 14:19:32,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=76813.33333333333, ans=0.125 2023-12-21 14:19:37,599 INFO [train.py:886] (3/4) Epoch 3, batch 2000, loss[loss=0.01697, audio_tagging_loss=0.01697, over 25000.00 frames. ], tot_loss[loss=0.01774, audio_tagging_loss=0.01774, over 4943545.70 frames. ], batch size: 100, lr: 3.10e-02, grad_scale: 64.0 2023-12-21 14:19:37,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=76880.0, ans=0.0 2023-12-21 14:19:44,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=76880.0, ans=0.0 2023-12-21 14:19:50,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=76946.66666666667, ans=0.2 2023-12-21 14:20:27,160 INFO [train.py:886] (3/4) Epoch 3, batch 2050, loss[loss=0.01682, audio_tagging_loss=0.01682, over 21889.00 frames. ], tot_loss[loss=0.01768, audio_tagging_loss=0.01768, over 4947471.76 frames. ], batch size: 107, lr: 3.09e-02, grad_scale: 128.0 2023-12-21 14:20:42,248 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.91 vs. 
limit=10.0 2023-12-21 14:20:51,401 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.488e+01 2.688e+01 3.036e+01 3.855e+01, threshold=5.376e+01, percent-clipped=0.0 2023-12-21 14:20:52,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=77346.66666666667, ans=0.2 2023-12-21 14:20:53,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=77346.66666666667, ans=0.1 2023-12-21 14:20:54,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=77346.66666666667, ans=0.1 2023-12-21 14:20:58,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=77413.33333333333, ans=0.0 2023-12-21 14:20:59,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.73 vs. limit=22.5 2023-12-21 14:21:19,148 INFO [train.py:886] (3/4) Epoch 3, batch 2100, loss[loss=0.0193, audio_tagging_loss=0.0193, over 24750.00 frames. ], tot_loss[loss=0.01771, audio_tagging_loss=0.01771, over 4954803.31 frames. ], batch size: 99, lr: 3.09e-02, grad_scale: 128.0 2023-12-21 14:21:25,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=77546.66666666667, ans=0.125 2023-12-21 14:21:29,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=77613.33333333333, ans=0.125 2023-12-21 14:21:55,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=77746.66666666667, ans=0.5 2023-12-21 14:22:07,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=77880.0, ans=0.2 2023-12-21 14:22:08,609 INFO [train.py:886] (3/4) Epoch 3, batch 2150, loss[loss=0.01758, audio_tagging_loss=0.01758, over 25000.00 frames. ], tot_loss[loss=0.01773, audio_tagging_loss=0.01773, over 4959907.90 frames. ], batch size: 100, lr: 3.08e-02, grad_scale: 128.0 2023-12-21 14:22:16,022 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.21 vs. limit=22.5 2023-12-21 14:22:24,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=77946.66666666667, ans=0.5 2023-12-21 14:22:32,516 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.494e+01 2.667e+01 2.941e+01 3.537e+01, threshold=5.333e+01, percent-clipped=0.0 2023-12-21 14:22:32,927 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=25.59 vs. limit=22.5 2023-12-21 14:22:47,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=78080.0, ans=0.125 2023-12-21 14:22:58,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=78146.66666666667, ans=0.125 2023-12-21 14:23:01,003 INFO [train.py:886] (3/4) Epoch 3, batch 2200, loss[loss=0.01612, audio_tagging_loss=0.01612, over 24750.00 frames. 
], tot_loss[loss=0.01795, audio_tagging_loss=0.01795, over 4957332.29 frames. ], batch size: 99, lr: 3.08e-02, grad_scale: 128.0 2023-12-21 14:23:07,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=78213.33333333333, ans=0.2 2023-12-21 14:23:46,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=78480.0, ans=0.0 2023-12-21 14:23:53,457 INFO [train.py:886] (3/4) Epoch 3, batch 2250, loss[loss=0.01516, audio_tagging_loss=0.01516, over 24750.00 frames. ], tot_loss[loss=0.01796, audio_tagging_loss=0.01796, over 4948805.55 frames. ], batch size: 99, lr: 3.07e-02, grad_scale: 128.0 2023-12-21 14:23:53,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=78546.66666666667, ans=0.125 2023-12-21 14:23:57,523 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.85 vs. limit=22.5 2023-12-21 14:24:06,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=78613.33333333333, ans=0.125 2023-12-21 14:24:13,896 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=15.0 2023-12-21 14:24:16,835 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.068e+01 2.575e+01 2.774e+01 3.056e+01 3.945e+01, threshold=5.548e+01, percent-clipped=0.0 2023-12-21 14:24:17,299 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2023-12-21 14:24:18,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=78680.0, ans=0.1 2023-12-21 14:24:23,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=78746.66666666667, ans=0.125 2023-12-21 14:24:24,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=78746.66666666667, ans=0.1 2023-12-21 14:24:28,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=78746.66666666667, ans=0.125 2023-12-21 14:24:35,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.05 vs. limit=15.0 2023-12-21 14:24:43,245 INFO [train.py:886] (3/4) Epoch 3, batch 2300, loss[loss=0.01697, audio_tagging_loss=0.01697, over 24750.00 frames. ], tot_loss[loss=0.01788, audio_tagging_loss=0.01788, over 4949268.81 frames. 
], batch size: 99, lr: 3.07e-02, grad_scale: 128.0 2023-12-21 14:24:59,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=78946.66666666667, ans=0.1 2023-12-21 14:25:00,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=78946.66666666667, ans=0.015 2023-12-21 14:25:04,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=79013.33333333333, ans=0.125 2023-12-21 14:25:23,848 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0 2023-12-21 14:25:33,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=79146.66666666667, ans=0.0 2023-12-21 14:25:34,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=79213.33333333333, ans=0.125 2023-12-21 14:25:35,585 INFO [train.py:886] (3/4) Epoch 3, batch 2350, loss[loss=0.01655, audio_tagging_loss=0.01655, over 22204.00 frames. ], tot_loss[loss=0.01772, audio_tagging_loss=0.01772, over 4952845.79 frames. ], batch size: 107, lr: 3.06e-02, grad_scale: 128.0 2023-12-21 14:25:41,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=79213.33333333333, ans=0.0 2023-12-21 14:25:41,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.80 vs. limit=10.0 2023-12-21 14:25:47,236 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.41 vs. limit=15.0 2023-12-21 14:25:50,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=79280.0, ans=0.125 2023-12-21 14:25:53,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=79280.0, ans=0.0 2023-12-21 14:25:59,421 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.142e+01 2.547e+01 2.838e+01 3.114e+01 3.985e+01, threshold=5.677e+01, percent-clipped=0.0 2023-12-21 14:26:12,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=79413.33333333333, ans=0.125 2023-12-21 14:26:26,849 INFO [train.py:886] (3/4) Epoch 3, batch 2400, loss[loss=0.01737, audio_tagging_loss=0.01737, over 25000.00 frames. ], tot_loss[loss=0.01756, audio_tagging_loss=0.01756, over 4958777.63 frames. ], batch size: 100, lr: 3.06e-02, grad_scale: 128.0 2023-12-21 14:26:28,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=79546.66666666667, ans=0.125 2023-12-21 14:26:30,163 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=29.49 vs. 
limit=15.0 2023-12-21 14:26:30,810 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.161e+00 2023-12-21 14:26:33,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=79546.66666666667, ans=0.125 2023-12-21 14:26:38,317 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.42 vs. limit=22.5 2023-12-21 14:26:40,490 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.02 vs. limit=22.5 2023-12-21 14:27:06,697 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.06 vs. limit=10.0 2023-12-21 14:27:15,042 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.20 vs. limit=12.0 2023-12-21 14:27:17,473 INFO [train.py:886] (3/4) Epoch 3, batch 2450, loss[loss=0.01685, audio_tagging_loss=0.01685, over 25000.00 frames. ], tot_loss[loss=0.01761, audio_tagging_loss=0.01761, over 4959937.17 frames. ], batch size: 100, lr: 3.05e-02, grad_scale: 128.0 2023-12-21 14:27:18,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=79880.0, ans=0.125 2023-12-21 14:27:29,745 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.24 vs. limit=15.0 2023-12-21 14:27:43,406 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.171e+01 2.542e+01 2.747e+01 3.025e+01 4.680e+01, threshold=5.493e+01, percent-clipped=0.0 2023-12-21 14:28:11,032 INFO [train.py:886] (3/4) Epoch 3, batch 2500, loss[loss=0.01931, audio_tagging_loss=0.01931, over 24750.00 frames. ], tot_loss[loss=0.01785, audio_tagging_loss=0.01785, over 4952909.05 frames. ], batch size: 99, lr: 3.05e-02, grad_scale: 128.0 2023-12-21 14:28:13,443 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.61 vs. limit=12.0 2023-12-21 14:28:29,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.13 vs. limit=22.5 2023-12-21 14:29:01,850 INFO [train.py:886] (3/4) Epoch 3, batch 2550, loss[loss=0.01841, audio_tagging_loss=0.01841, over 24750.00 frames. ], tot_loss[loss=0.01797, audio_tagging_loss=0.01797, over 4948971.47 frames. ], batch size: 99, lr: 3.05e-02, grad_scale: 128.0 2023-12-21 14:29:02,509 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.58 vs. limit=6.0 2023-12-21 14:29:13,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.38 vs. limit=15.0 2023-12-21 14:29:21,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=15.0 2023-12-21 14:29:22,613 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.64 vs. 
limit=15.0 2023-12-21 14:29:25,734 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.659e+01 2.860e+01 3.083e+01 5.383e+01, threshold=5.720e+01, percent-clipped=0.0 2023-12-21 14:29:28,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=80680.0, ans=0.0 2023-12-21 14:29:36,288 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0 2023-12-21 14:29:44,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=80813.33333333333, ans=0.09899494936611666 2023-12-21 14:29:54,314 INFO [train.py:886] (3/4) Epoch 3, batch 2600, loss[loss=0.01546, audio_tagging_loss=0.01546, over 25000.00 frames. ], tot_loss[loss=0.01782, audio_tagging_loss=0.01782, over 4946284.78 frames. ], batch size: 100, lr: 3.04e-02, grad_scale: 128.0 2023-12-21 14:29:55,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=80880.0, ans=0.5 2023-12-21 14:30:06,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=80946.66666666667, ans=0.125 2023-12-21 14:30:16,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=81013.33333333333, ans=0.125 2023-12-21 14:30:23,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=81013.33333333333, ans=0.125 2023-12-21 14:30:30,052 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.94 vs. limit=22.5 2023-12-21 14:30:35,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=81146.66666666667, ans=0.0 2023-12-21 14:30:35,830 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.76 vs. limit=22.5 2023-12-21 14:30:38,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=81146.66666666667, ans=0.5 2023-12-21 14:30:46,923 INFO [train.py:886] (3/4) Epoch 3, batch 2650, loss[loss=0.01932, audio_tagging_loss=0.01932, over 25000.00 frames. ], tot_loss[loss=0.0177, audio_tagging_loss=0.0177, over 4946876.45 frames. ], batch size: 100, lr: 3.04e-02, grad_scale: 128.0 2023-12-21 14:30:55,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.97 vs. limit=12.0 2023-12-21 14:31:10,393 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.677e+01 2.899e+01 3.171e+01 4.559e+01, threshold=5.799e+01, percent-clipped=0.0 2023-12-21 14:31:26,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=81413.33333333333, ans=0.125 2023-12-21 14:31:36,722 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0 2023-12-21 14:31:38,078 INFO [train.py:886] (3/4) Epoch 3, batch 2700, loss[loss=0.01779, audio_tagging_loss=0.01779, over 25000.00 frames. 
], tot_loss[loss=0.01766, audio_tagging_loss=0.01766, over 4946824.42 frames. ], batch size: 100, lr: 3.03e-02, grad_scale: 128.0 2023-12-21 14:31:40,548 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.99 vs. limit=22.5 2023-12-21 14:31:58,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=81680.0, ans=0.0 2023-12-21 14:32:04,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=81680.0, ans=0.2 2023-12-21 14:32:09,964 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.36 vs. limit=15.0 2023-12-21 14:32:12,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=81746.66666666667, ans=0.1 2023-12-21 14:32:29,875 INFO [train.py:886] (3/4) Epoch 3, batch 2750, loss[loss=0.02006, audio_tagging_loss=0.02006, over 24750.00 frames. ], tot_loss[loss=0.01774, audio_tagging_loss=0.01774, over 4951698.14 frames. ], batch size: 99, lr: 3.03e-02, grad_scale: 128.0 2023-12-21 14:32:45,328 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.38 vs. limit=15.0 2023-12-21 14:32:53,651 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.181e+01 2.528e+01 2.695e+01 2.960e+01 4.010e+01, threshold=5.390e+01, percent-clipped=0.0 2023-12-21 14:32:53,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=82013.33333333333, ans=0.0 2023-12-21 14:33:01,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=82080.0, ans=0.0 2023-12-21 14:33:20,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=82213.33333333333, ans=0.125 2023-12-21 14:33:20,888 INFO [train.py:886] (3/4) Epoch 3, batch 2800, loss[loss=0.01984, audio_tagging_loss=0.01984, over 24750.00 frames. ], tot_loss[loss=0.01773, audio_tagging_loss=0.01773, over 4954332.63 frames. ], batch size: 99, lr: 3.02e-02, grad_scale: 128.0 2023-12-21 14:33:36,403 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.05 vs. limit=15.0 2023-12-21 14:33:43,922 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.49 vs. limit=15.0 2023-12-21 14:33:44,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=82346.66666666667, ans=0.0 2023-12-21 14:33:47,669 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.26 vs. limit=10.0 2023-12-21 14:33:57,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.22 vs. limit=15.0 2023-12-21 14:34:00,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.27 vs. 
limit=22.5 2023-12-21 14:34:10,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.62 vs. limit=6.0 2023-12-21 14:34:12,662 INFO [train.py:886] (3/4) Epoch 3, batch 2850, loss[loss=0.01646, audio_tagging_loss=0.01646, over 24750.00 frames. ], tot_loss[loss=0.01774, audio_tagging_loss=0.01774, over 4949506.15 frames. ], batch size: 99, lr: 3.02e-02, grad_scale: 128.0 2023-12-21 14:34:15,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=82546.66666666667, ans=0.125 2023-12-21 14:34:27,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=82613.33333333333, ans=0.0 2023-12-21 14:34:36,512 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.481e+01 2.766e+01 3.054e+01 4.031e+01, threshold=5.532e+01, percent-clipped=0.0 2023-12-21 14:34:37,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.76 vs. limit=15.0 2023-12-21 14:35:04,175 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.824e+00 2023-12-21 14:35:04,457 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.58 vs. limit=12.0 2023-12-21 14:35:04,870 INFO [train.py:886] (3/4) Epoch 3, batch 2900, loss[loss=0.01809, audio_tagging_loss=0.01809, over 24750.00 frames. ], tot_loss[loss=0.01777, audio_tagging_loss=0.01777, over 4950627.91 frames. ], batch size: 99, lr: 3.01e-02, grad_scale: 128.0 2023-12-21 14:35:07,337 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.65 vs. limit=15.0 2023-12-21 14:35:11,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=82880.0, ans=0.125 2023-12-21 14:35:17,377 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.66 vs. limit=15.0 2023-12-21 14:35:17,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=82946.66666666667, ans=0.07 2023-12-21 14:35:18,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=82946.66666666667, ans=0.125 2023-12-21 14:35:27,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=83013.33333333333, ans=0.125 2023-12-21 14:35:36,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.07 vs. limit=15.0 2023-12-21 14:35:45,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=83146.66666666667, ans=0.0 2023-12-21 14:35:47,741 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.48 vs. 
2023-12-21 14:35:51,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=83146.66666666667, ans=0.125
2023-12-21 14:35:56,374 INFO [train.py:886] (3/4) Epoch 3, batch 2950, loss[loss=0.0179, audio_tagging_loss=0.0179, over 25000.00 frames. ], tot_loss[loss=0.01754, audio_tagging_loss=0.01754, over 4946890.45 frames. ], batch size: 100, lr: 3.01e-02, grad_scale: 128.0
2023-12-21 14:36:08,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=83280.0, ans=0.0
2023-12-21 14:36:08,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=83280.0, ans=0.125
2023-12-21 14:36:19,504 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.548e+01 2.774e+01 2.996e+01 3.948e+01, threshold=5.549e+01, percent-clipped=0.0
2023-12-21 14:36:26,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=83413.33333333333, ans=0.125
2023-12-21 14:36:29,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=83413.33333333333, ans=0.0
2023-12-21 14:36:29,847 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=12.0
2023-12-21 14:36:39,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=83480.0, ans=0.1
2023-12-21 14:36:39,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.63 vs. limit=22.5
2023-12-21 14:36:47,852 INFO [train.py:886] (3/4) Epoch 3, batch 3000, loss[loss=0.01944, audio_tagging_loss=0.01944, over 25000.00 frames. ], tot_loss[loss=0.01744, audio_tagging_loss=0.01744, over 4946275.64 frames. ], batch size: 100, lr: 3.00e-02, grad_scale: 128.0
2023-12-21 14:36:47,853 INFO [train.py:909] (3/4) Computing validation loss
2023-12-21 14:37:08,995 INFO [train.py:917] (3/4) Epoch 3, validation: loss=0.04203, audio_tagging_loss=0.04203, over 3737520.00 frames.
2023-12-21 14:37:08,995 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-21 14:37:14,898 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.252e-01
2023-12-21 14:37:17,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=83613.33333333333, ans=0.0
2023-12-21 14:37:22,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=83613.33333333333, ans=0.1
2023-12-21 14:37:40,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.63 vs. limit=22.5
2023-12-21 14:37:54,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=83813.33333333333, ans=0.1
2023-12-21 14:37:59,629 INFO [train.py:886] (3/4) Epoch 3, batch 3050, loss[loss=0.01895, audio_tagging_loss=0.01895, over 25000.00 frames. ], tot_loss[loss=0.01748, audio_tagging_loss=0.01748, over 4948452.98 frames. ], batch size: 100, lr: 3.00e-02, grad_scale: 128.0
2023-12-21 14:38:07,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=83880.0, ans=0.125
2023-12-21 14:38:10,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=83946.66666666667, ans=0.0
2023-12-21 14:38:11,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=83946.66666666667, ans=0.125
2023-12-21 14:38:11,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=83946.66666666667, ans=0.125
2023-12-21 14:38:13,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=83946.66666666667, ans=0.0
2023-12-21 14:38:16,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=83946.66666666667, ans=0.125
2023-12-21 14:38:23,516 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.446e+01 2.688e+01 2.913e+01 3.941e+01, threshold=5.377e+01, percent-clipped=0.0
2023-12-21 14:38:30,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=84080.0, ans=0.1
2023-12-21 14:38:34,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=84080.0, ans=0.125
2023-12-21 14:38:48,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=84146.66666666667, ans=0.125
2023-12-21 14:38:51,848 INFO [train.py:886] (3/4) Epoch 3, batch 3100, loss[loss=0.01725, audio_tagging_loss=0.01725, over 24750.00 frames. ], tot_loss[loss=0.01759, audio_tagging_loss=0.01759, over 4952792.84 frames. ], batch size: 99, lr: 3.00e-02, grad_scale: 128.0
2023-12-21 14:38:53,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=84213.33333333333, ans=0.0
2023-12-21 14:39:04,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=84280.0, ans=10.0
2023-12-21 14:39:15,015 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.93 vs. limit=15.0
2023-12-21 14:39:21,535 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=24.03 vs. limit=22.5
2023-12-21 14:39:31,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=84480.0, ans=0.125
2023-12-21 14:39:44,552 INFO [train.py:886] (3/4) Epoch 3, batch 3150, loss[loss=0.01746, audio_tagging_loss=0.01746, over 21720.00 frames. ], tot_loss[loss=0.01778, audio_tagging_loss=0.01778, over 4945651.59 frames. ], batch size: 107, lr: 2.99e-02, grad_scale: 128.0
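In every train.py entry here loss equals audio_tagging_loss, i.e. tagging is the only term in this run's objective. For multi-label tagging over a few hundred event classes, the standard objective is binary cross-entropy on per-class logits; the snippet below is a sketch under that assumption (the shapes and the function name are illustrative, not taken from train.py):

```python
import torch
import torch.nn.functional as F

def audio_tagging_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # logits: (batch, num_events) raw scores; labels: multi-hot {0,1} targets.
    # Averaging over both batch and classes is what keeps the printed values
    # small: on any given clip, almost all event classes are negatives.
    return F.binary_cross_entropy_with_logits(logits, labels, reduction="mean")

logits = torch.randn(100, 527)                  # one 100-cut batch
labels = (torch.rand(100, 527) < 0.01).float()  # sparse multi-hot targets
print(audio_tagging_loss(logits, labels))       # large at random init, ~1e-2 once trained
```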
2023-12-21 14:40:01,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=84613.33333333333, ans=0.2
2023-12-21 14:40:06,858 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.245e+01 2.586e+01 2.797e+01 3.017e+01 3.878e+01, threshold=5.594e+01, percent-clipped=0.0
2023-12-21 14:40:19,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=84746.66666666667, ans=0.125
2023-12-21 14:40:23,896 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.88 vs. limit=15.0
2023-12-21 14:40:26,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=84813.33333333333, ans=0.0
2023-12-21 14:40:34,475 INFO [train.py:886] (3/4) Epoch 3, batch 3200, loss[loss=0.01776, audio_tagging_loss=0.01776, over 24750.00 frames. ], tot_loss[loss=0.01782, audio_tagging_loss=0.01782, over 4943112.91 frames. ], batch size: 99, lr: 2.99e-02, grad_scale: 128.0
2023-12-21 14:40:52,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=84946.66666666667, ans=0.125
2023-12-21 14:40:57,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=85013.33333333333, ans=0.1
2023-12-21 14:41:21,274 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.78 vs. limit=6.0
2023-12-21 14:41:25,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=85146.66666666667, ans=0.0
2023-12-21 14:41:27,344 INFO [train.py:886] (3/4) Epoch 3, batch 3250, loss[loss=0.02013, audio_tagging_loss=0.02013, over 25000.00 frames. ], tot_loss[loss=0.01772, audio_tagging_loss=0.01772, over 4949559.80 frames. ], batch size: 100, lr: 2.98e-02, grad_scale: 128.0
2023-12-21 14:41:42,876 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.54 vs. limit=15.0
2023-12-21 14:41:44,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=85280.0, ans=0.0
2023-12-21 14:41:48,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=85346.66666666667, ans=0.125
2023-12-21 14:41:51,356 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.547e+01 2.711e+01 3.063e+01 4.521e+01, threshold=5.423e+01, percent-clipped=0.0
2023-12-21 14:41:54,716 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.18 vs. limit=15.0
2023-12-21 14:42:08,902 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.72 vs. limit=10.0
2023-12-21 14:42:10,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=85480.0, ans=0.125
2023-12-21 14:42:17,713 INFO [train.py:886] (3/4) Epoch 3, batch 3300, loss[loss=0.01476, audio_tagging_loss=0.01476, over 24750.00 frames. ], tot_loss[loss=0.01765, audio_tagging_loss=0.01765, over 4950155.85 frames. ], batch size: 99, lr: 2.98e-02, grad_scale: 128.0
2023-12-21 14:42:22,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=85546.66666666667, ans=0.1
2023-12-21 14:42:28,902 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.78 vs. limit=15.0
2023-12-21 14:42:35,499 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.529e+00
2023-12-21 14:42:37,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=85613.33333333333, ans=0.0
2023-12-21 14:42:44,224 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=4.299e+00
2023-12-21 14:42:47,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=85680.0, ans=0.125
2023-12-21 14:43:10,642 INFO [train.py:886] (3/4) Epoch 3, batch 3350, loss[loss=0.01948, audio_tagging_loss=0.01948, over 21259.00 frames. ], tot_loss[loss=0.01756, audio_tagging_loss=0.01756, over 4954257.59 frames. ], batch size: 107, lr: 2.97e-02, grad_scale: 128.0
2023-12-21 14:43:14,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=85880.0, ans=0.0
2023-12-21 14:43:22,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=85946.66666666667, ans=0.1
2023-12-21 14:43:26,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=85946.66666666667, ans=0.0
2023-12-21 14:43:34,512 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.545e+01 2.748e+01 2.993e+01 3.826e+01, threshold=5.495e+01, percent-clipped=0.0
2023-12-21 14:43:43,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=86080.0, ans=0.0
2023-12-21 14:43:50,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.84 vs. limit=15.0
2023-12-21 14:43:52,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=86146.66666666667, ans=0.0
2023-12-21 14:43:54,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=86146.66666666667, ans=0.2
2023-12-21 14:44:02,986 INFO [train.py:886] (3/4) Epoch 3, batch 3400, loss[loss=0.01849, audio_tagging_loss=0.01849, over 25000.00 frames. ], tot_loss[loss=0.01768, audio_tagging_loss=0.01768, over 4955032.67 frames. ], batch size: 100, lr: 2.97e-02, grad_scale: 128.0
2023-12-21 14:44:07,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=86213.33333333333, ans=0.1
2023-12-21 14:44:08,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=86213.33333333333, ans=0.125
2023-12-21 14:44:17,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.53 vs. limit=22.5
2023-12-21 14:44:26,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=86346.66666666667, ans=0.0
2023-12-21 14:44:53,125 INFO [train.py:886] (3/4) Epoch 3, batch 3450, loss[loss=0.02053, audio_tagging_loss=0.02053, over 24750.00 frames. ], tot_loss[loss=0.01785, audio_tagging_loss=0.01785, over 4952256.61 frames. ], batch size: 99, lr: 2.97e-02, grad_scale: 128.0
2023-12-21 14:45:16,716 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0
2023-12-21 14:45:18,158 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.264e+01 2.604e+01 2.823e+01 3.074e+01 4.024e+01, threshold=5.647e+01, percent-clipped=0.0
2023-12-21 14:45:19,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=86680.0, ans=0.125
2023-12-21 14:45:46,444 INFO [train.py:886] (3/4) Epoch 3, batch 3500, loss[loss=0.01563, audio_tagging_loss=0.01563, over 25000.00 frames. ], tot_loss[loss=0.01785, audio_tagging_loss=0.01785, over 4939283.52 frames. ], batch size: 100, lr: 2.96e-02, grad_scale: 128.0
2023-12-21 14:45:57,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=86946.66666666667, ans=0.2
2023-12-21 14:46:16,303 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.65 vs. limit=6.0
2023-12-21 14:46:21,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=87080.0, ans=0.05
2023-12-21 14:46:29,677 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.46 vs. limit=15.0
2023-12-21 14:46:31,291 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.45 vs. limit=22.5
2023-12-21 14:46:38,190 INFO [train.py:886] (3/4) Epoch 3, batch 3550, loss[loss=0.01721, audio_tagging_loss=0.01721, over 24750.00 frames. ], tot_loss[loss=0.01773, audio_tagging_loss=0.01773, over 4937640.31 frames. ], batch size: 99, lr: 2.96e-02, grad_scale: 128.0
2023-12-21 14:46:39,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=87213.33333333333, ans=0.1
2023-12-21 14:46:57,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=87346.66666666667, ans=0.07
2023-12-21 14:47:01,418 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.155e+01 2.500e+01 2.743e+01 3.014e+01 4.154e+01, threshold=5.485e+01, percent-clipped=0.0
2023-12-21 14:47:07,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=87346.66666666667, ans=0.0
2023-12-21 14:47:29,938 INFO [train.py:886] (3/4) Epoch 3, batch 3600, loss[loss=0.01603, audio_tagging_loss=0.01603, over 25000.00 frames. ], tot_loss[loss=0.0176, audio_tagging_loss=0.0176, over 4944782.82 frames. ], batch size: 100, lr: 2.95e-02, grad_scale: 128.0
2023-12-21 14:47:30,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=87546.66666666667, ans=0.125
2023-12-21 14:47:42,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=87613.33333333333, ans=0.125
2023-12-21 14:47:52,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=87680.0, ans=0.1
2023-12-21 14:48:01,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=87746.66666666667, ans=0.0
2023-12-21 14:48:01,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=87746.66666666667, ans=0.0
2023-12-21 14:48:10,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=87813.33333333333, ans=0.125
2023-12-21 14:48:20,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=87813.33333333333, ans=0.0
2023-12-21 14:48:22,025 INFO [train.py:886] (3/4) Epoch 3, batch 3650, loss[loss=0.01753, audio_tagging_loss=0.01753, over 25000.00 frames. ], tot_loss[loss=0.01763, audio_tagging_loss=0.01763, over 4949874.73 frames. ], batch size: 100, lr: 2.95e-02, grad_scale: 128.0
2023-12-21 14:48:32,680 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.93 vs. limit=6.0
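Each optim.py WARNING prints five numbers that read naturally as the min/25%/50%/75%/max of recently observed gradient norms, followed by the clipping threshold and the share of recent batches that were clipped. In every entry here the threshold is exactly Clipping_scale (2.0) times the printed median, e.g. 2.0 x 2.743e+01 ≈ 5.485e+01 in the warning just above. Below is a sketch of bookkeeping consistent with those numbers (the window size and quantile handling are guesses, not the real optimizer's code):

```python
from collections import deque

import torch

class NormClipper:
    """Sketch: clip to clipping_scale * median of a window of recent norms."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 100):
        self.scale = clipping_scale
        self.norms: deque = deque(maxlen=window)

    def __call__(self, grads: list) -> None:
        norm = torch.linalg.vector_norm(
            torch.stack([torch.linalg.vector_norm(g) for g in grads])
        ).item()
        self.norms.append(norm)
        hist = sorted(self.norms)
        q = [hist[int(p * (len(hist) - 1))] for p in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = self.scale * q[2]  # 2.0 * median, as in the log entries
        clipped = sum(n > threshold for n in hist) / len(hist)
        if norm > threshold:
            for g in grads:
                g.mul_(threshold / norm)
        print(f"grad-norm quartiles {q}, threshold={threshold:.3e}, "
              f"percent-clipped={100.0 * clipped:.1f}")
```

percent-clipped stays at 0.0 whenever the max of the window sits below the threshold, as in all the epoch-3 warnings here; the spike to 1.153e+02 right after the epoch 4 restart later in this log (percent-clipped=3.0) is the situation this mechanism guards against.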
2023-12-21 14:48:33,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=87946.66666666667, ans=0.125
2023-12-21 14:48:45,864 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.110e+01 2.549e+01 2.738e+01 3.044e+01 4.294e+01, threshold=5.477e+01, percent-clipped=0.0
2023-12-21 14:48:46,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=88013.33333333333, ans=0.0
2023-12-21 14:48:46,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=88013.33333333333, ans=0.125
2023-12-21 14:48:57,364 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.677e-03
2023-12-21 14:49:03,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=88146.66666666667, ans=0.2
2023-12-21 14:49:05,496 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-21 14:49:13,700 INFO [train.py:886] (3/4) Epoch 3, batch 3700, loss[loss=0.01744, audio_tagging_loss=0.01744, over 25000.00 frames. ], tot_loss[loss=0.0176, audio_tagging_loss=0.0176, over 4952179.46 frames. ], batch size: 100, lr: 2.94e-02, grad_scale: 128.0
2023-12-21 14:49:15,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=88213.33333333333, ans=0.125
2023-12-21 14:49:28,514 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.27 vs. limit=15.0
2023-12-21 14:49:32,342 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.12 vs. limit=15.0
2023-12-21 14:49:45,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=88413.33333333333, ans=0.125
2023-12-21 14:49:58,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=88480.0, ans=0.125
2023-12-21 14:50:03,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=88480.0, ans=0.0
2023-12-21 14:50:05,630 INFO [train.py:886] (3/4) Epoch 3, batch 3750, loss[loss=0.01716, audio_tagging_loss=0.01716, over 24750.00 frames. ], tot_loss[loss=0.01765, audio_tagging_loss=0.01765, over 4947329.56 frames. ], batch size: 99, lr: 2.94e-02, grad_scale: 128.0
2023-12-21 14:50:05,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=88546.66666666667, ans=0.125
2023-12-21 14:50:10,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=88546.66666666667, ans=0.125
2023-12-21 14:50:13,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=88546.66666666667, ans=0.2
2023-12-21 14:50:15,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=88613.33333333333, ans=0.0
2023-12-21 14:50:16,086 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0
2023-12-21 14:50:19,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=88613.33333333333, ans=0.1
2023-12-21 14:50:20,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=88613.33333333333, ans=0.125
2023-12-21 14:50:21,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=88613.33333333333, ans=0.1
2023-12-21 14:50:21,738 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.86 vs. limit=15.0
2023-12-21 14:50:26,000 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.98 vs. limit=6.0
2023-12-21 14:50:29,284 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.571e+01 2.734e+01 2.974e+01 3.491e+01, threshold=5.468e+01, percent-clipped=0.0
2023-12-21 14:50:37,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=88746.66666666667, ans=0.0
2023-12-21 14:50:39,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=88746.66666666667, ans=0.125
2023-12-21 14:50:57,902 INFO [train.py:886] (3/4) Epoch 3, batch 3800, loss[loss=0.01631, audio_tagging_loss=0.01631, over 25000.00 frames. ], tot_loss[loss=0.01768, audio_tagging_loss=0.01768, over 4943636.36 frames. ], batch size: 100, lr: 2.94e-02, grad_scale: 128.0
2023-12-21 14:51:08,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=88946.66666666667, ans=0.125
2023-12-21 14:51:17,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=89013.33333333333, ans=0.125
2023-12-21 14:51:23,337 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.47 vs. limit=15.0
2023-12-21 14:51:30,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=89080.0, ans=0.0
2023-12-21 14:51:46,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=89146.66666666667, ans=0.125
2023-12-21 14:51:49,279 INFO [train.py:886] (3/4) Epoch 3, batch 3850, loss[loss=0.02163, audio_tagging_loss=0.02163, over 24750.00 frames. ], tot_loss[loss=0.01749, audio_tagging_loss=0.01749, over 4941295.04 frames. ], batch size: 99, lr: 2.93e-02, grad_scale: 128.0
2023-12-21 14:51:53,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=89213.33333333333, ans=0.09899494936611666
2023-12-21 14:52:05,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=89280.0, ans=0.125
2023-12-21 14:52:06,142 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=9.340e-01
2023-12-21 14:52:09,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0
2023-12-21 14:52:12,665 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.537e+01 2.733e+01 2.945e+01 4.366e+01, threshold=5.467e+01, percent-clipped=0.0
2023-12-21 14:52:24,210 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.83 vs. limit=22.5
2023-12-21 14:52:26,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=89413.33333333333, ans=0.05
2023-12-21 14:52:28,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=89413.33333333333, ans=0.2
2023-12-21 14:52:40,927 INFO [train.py:886] (3/4) Epoch 3, batch 3900, loss[loss=0.01765, audio_tagging_loss=0.01765, over 25000.00 frames. ], tot_loss[loss=0.01743, audio_tagging_loss=0.01743, over 4949477.70 frames. ], batch size: 100, lr: 2.93e-02, grad_scale: 128.0
2023-12-21 14:52:43,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=12.0
2023-12-21 14:52:52,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=89613.33333333333, ans=0.125
2023-12-21 14:52:54,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=89613.33333333333, ans=10.0
2023-12-21 14:52:56,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=89613.33333333333, ans=0.0
2023-12-21 14:53:14,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=89746.66666666667, ans=0.07
2023-12-21 14:53:25,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=89813.33333333333, ans=0.1
2023-12-21 14:53:33,204 INFO [train.py:886] (3/4) Epoch 3, batch 3950, loss[loss=0.02039, audio_tagging_loss=0.02039, over 25000.00 frames. ], tot_loss[loss=0.01751, audio_tagging_loss=0.01751, over 4946360.01 frames. ], batch size: 100, lr: 2.92e-02, grad_scale: 128.0
2023-12-21 14:53:34,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=89880.0, ans=0.2
2023-12-21 14:53:39,152 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.97 vs. limit=15.0
2023-12-21 14:53:49,660 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.62 vs. limit=15.0
2023-12-21 14:53:50,587 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.01 vs. limit=15.0
2023-12-21 14:53:57,087 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.521e+01 2.746e+01 2.975e+01 3.800e+01, threshold=5.493e+01, percent-clipped=0.0
2023-12-21 14:54:24,064 INFO [train.py:886] (3/4) Epoch 3, batch 4000, loss[loss=0.01531, audio_tagging_loss=0.01531, over 25000.00 frames. ], tot_loss[loss=0.01743, audio_tagging_loss=0.01743, over 4949715.08 frames. ], batch size: 100, lr: 2.92e-02, grad_scale: 128.0
2023-12-21 14:54:29,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=90213.33333333333, ans=0.125
2023-12-21 14:54:30,633 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.537e+01
2023-12-21 14:54:53,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=90413.33333333333, ans=0.125
2023-12-21 14:55:07,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=90480.0, ans=0.125
2023-12-21 14:55:16,295 INFO [train.py:886] (3/4) Epoch 3, batch 4050, loss[loss=0.02119, audio_tagging_loss=0.02119, over 24750.00 frames. ], tot_loss[loss=0.01762, audio_tagging_loss=0.01762, over 4952066.82 frames. ], batch size: 99, lr: 2.92e-02, grad_scale: 256.0
2023-12-21 14:55:33,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=90613.33333333333, ans=0.2
2023-12-21 14:55:33,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=90613.33333333333, ans=0.125
2023-12-21 14:55:37,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=90680.0, ans=0.125
2023-12-21 14:55:38,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=90680.0, ans=0.0
2023-12-21 14:55:39,790 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.147e+01 2.624e+01 2.857e+01 3.103e+01 4.116e+01, threshold=5.714e+01, percent-clipped=0.0
2023-12-21 14:55:41,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=90680.0, ans=0.2
2023-12-21 14:55:48,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=90746.66666666667, ans=0.0
2023-12-21 14:55:51,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90746.66666666667, ans=0.1
2023-12-21 14:55:54,106 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.36 vs. limit=15.0
2023-12-21 14:56:08,205 INFO [train.py:886] (3/4) Epoch 3, batch 4100, loss[loss=0.01722, audio_tagging_loss=0.01722, over 24750.00 frames. ], tot_loss[loss=0.01767, audio_tagging_loss=0.01767, over 4947359.76 frames. ], batch size: 99, lr: 2.91e-02, grad_scale: 256.0
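The scaling.py:213 lines each sample a ScheduledFloat: a named scalar hyperparameter (dropout probabilities, skip rates, balancer limits, bypass scales) whose value, printed as ans, is a function of the global batch_count, so that regularization can be strong early in training and relax later. A piecewise-linear sketch of the idea (the class below is illustrative, and the breakpoints in this run are not shown in the log):

```python
class ScheduledFloat:
    """Sketch: value is piecewise-linear in batch_count between (x, y)
    points, clamped to the end values outside the given range."""

    def __init__(self, *points: tuple):
        self.points = sorted(points)
        self.batch_count = 0.0

    def value(self) -> float:
        x = self.batch_count
        pts = self.points
        if x <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return pts[-1][1]

# Hypothetical schedule: a skip rate that anneals from 0.2 to 0.0 by batch 20000.
conv_skip_rate = ScheduledFloat((0.0, 0.2), (20000.0, 0.0))
conv_skip_rate.batch_count = 90680.0
print(conv_skip_rate.value())  # 0.0 -- fully annealed, like the ans=0.0 entries above
```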
2023-12-21 14:56:10,441 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.70 vs. limit=15.0
2023-12-21 14:56:14,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90880.0, ans=0.1
2023-12-21 14:56:14,912 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.21 vs. limit=15.0
2023-12-21 14:56:17,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90880.0, ans=0.1
2023-12-21 14:56:21,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=90946.66666666667, ans=0.1
2023-12-21 14:56:25,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=90946.66666666667, ans=0.125
2023-12-21 14:56:39,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=91080.0, ans=0.1
2023-12-21 14:56:42,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.06 vs. limit=22.5
2023-12-21 14:56:59,600 INFO [train.py:886] (3/4) Epoch 3, batch 4150, loss[loss=0.01857, audio_tagging_loss=0.01857, over 24750.00 frames. ], tot_loss[loss=0.01768, audio_tagging_loss=0.01768, over 4948347.53 frames. ], batch size: 99, lr: 2.91e-02, grad_scale: 256.0
2023-12-21 14:57:00,040 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.21 vs. limit=15.0
2023-12-21 14:57:24,206 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.103e+01 2.605e+01 2.901e+01 3.180e+01 4.178e+01, threshold=5.801e+01, percent-clipped=0.0
2023-12-21 14:57:40,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=91480.0, ans=0.1
2023-12-21 14:57:52,738 INFO [train.py:886] (3/4) Epoch 3, batch 4200, loss[loss=0.01774, audio_tagging_loss=0.01774, over 25000.00 frames. ], tot_loss[loss=0.01753, audio_tagging_loss=0.01753, over 4951939.99 frames. ], batch size: 100, lr: 2.90e-02, grad_scale: 256.0
2023-12-21 14:57:53,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=91546.66666666667, ans=0.125
2023-12-21 14:58:02,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=91613.33333333333, ans=0.1
2023-12-21 14:58:08,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=91613.33333333333, ans=0.0
2023-12-21 14:58:14,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.95 vs. limit=22.5
2023-12-21 14:58:25,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.01 vs. limit=15.0
2023-12-21 14:58:25,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0
2023-12-21 14:58:29,512 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-21 14:58:31,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=91746.66666666667, ans=0.2
2023-12-21 14:58:34,557 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.29 vs. limit=22.5
2023-12-21 14:58:36,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=91813.33333333333, ans=0.07
2023-12-21 14:58:42,614 INFO [train.py:886] (3/4) Epoch 3, batch 4250, loss[loss=0.01851, audio_tagging_loss=0.01851, over 25000.00 frames. ], tot_loss[loss=0.01741, audio_tagging_loss=0.01741, over 4955604.84 frames. ], batch size: 100, lr: 2.90e-02, grad_scale: 256.0
2023-12-21 14:58:49,608 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.39 vs. limit=22.5
2023-12-21 14:59:07,475 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.544e+01 2.694e+01 3.014e+01 4.277e+01, threshold=5.388e+01, percent-clipped=0.0
2023-12-21 14:59:12,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=92013.33333333333, ans=0.0
2023-12-21 14:59:19,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=92080.0, ans=0.2
2023-12-21 14:59:25,480 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.08 vs. limit=15.0
2023-12-21 14:59:35,998 INFO [train.py:886] (3/4) Epoch 3, batch 4300, loss[loss=0.01659, audio_tagging_loss=0.01659, over 24750.00 frames. ], tot_loss[loss=0.01739, audio_tagging_loss=0.01739, over 4954727.23 frames. ], batch size: 99, lr: 2.89e-02, grad_scale: 128.0
2023-12-21 14:59:41,236 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.17 vs. limit=15.0
2023-12-21 14:59:46,190 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=24.46 vs. limit=22.5
2023-12-21 14:59:52,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=92280.0, ans=0.0
2023-12-21 14:59:56,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=92346.66666666667, ans=0.125
2023-12-21 15:00:26,775 INFO [train.py:886] (3/4) Epoch 3, batch 4350, loss[loss=0.01956, audio_tagging_loss=0.01956, over 24750.00 frames. ], tot_loss[loss=0.0175, audio_tagging_loss=0.0175, over 4958882.36 frames. ], batch size: 99, lr: 2.89e-02, grad_scale: 128.0
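grad_scale in the train.py entries is the fp16 loss-scaling factor, and its movement is visible in this stretch: it doubles from 128.0 to 256.0 around batch 4050 and is back at 128.0 by batch 4300, the classic signature of dynamic loss scaling (grow after a run of finite-gradient steps, halve on overflow). A sketch of that mechanism using PyTorch's stock GradScaler (the constants below are illustrative, not this run's settings):

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=128.0,      # matches the grad_scale seen through most of epoch 3
    growth_factor=2.0,     # 128 -> 256, as around batch 4050
    backoff_factor=0.5,    # 256 -> 128 again after an inf/nan step
    growth_interval=1000,  # grow only after this many clean steps
)

def fp16_step(model, optimizer, features, labels, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(features), labels)
    scaler.scale(loss).backward()  # backward pass runs on loss * grad_scale
    scaler.step(optimizer)         # unscales grads; skips the step on overflow
    scaler.update()                # grows or backs off grad_scale
    return loss.detach()
```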
2023-12-21 15:00:27,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=92546.66666666667, ans=0.125
2023-12-21 15:00:30,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=92546.66666666667, ans=0.0
2023-12-21 15:00:34,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=92546.66666666667, ans=0.125
2023-12-21 15:00:36,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=92613.33333333333, ans=0.125
2023-12-21 15:00:43,889 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.86 vs. limit=15.0
2023-12-21 15:00:50,895 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.529e+01 2.731e+01 2.913e+01 4.342e+01, threshold=5.462e+01, percent-clipped=0.0
2023-12-21 15:01:06,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=92746.66666666667, ans=0.1
2023-12-21 15:01:17,666 INFO [train.py:886] (3/4) Epoch 3, batch 4400, loss[loss=0.01863, audio_tagging_loss=0.01863, over 24750.00 frames. ], tot_loss[loss=0.01775, audio_tagging_loss=0.01775, over 4959263.01 frames. ], batch size: 99, lr: 2.89e-02, grad_scale: 128.0
2023-12-21 15:01:29,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=92946.66666666667, ans=0.125
2023-12-21 15:01:58,776 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.88 vs. limit=22.5
2023-12-21 15:02:10,666 INFO [train.py:886] (3/4) Epoch 3, batch 4450, loss[loss=0.01697, audio_tagging_loss=0.01697, over 25000.00 frames. ], tot_loss[loss=0.01766, audio_tagging_loss=0.01766, over 4955447.59 frames. ], batch size: 100, lr: 2.88e-02, grad_scale: 128.0
2023-12-21 15:02:13,970 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.91 vs. limit=6.0
2023-12-21 15:02:20,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=93280.0, ans=0.0
2023-12-21 15:02:26,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=93280.0, ans=0.125
2023-12-21 15:02:26,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=93280.0, ans=0.1
2023-12-21 15:02:34,335 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.637e+01 2.853e+01 3.131e+01 4.120e+01, threshold=5.707e+01, percent-clipped=0.0
2023-12-21 15:02:44,512 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.26 vs. limit=15.0
2023-12-21 15:03:02,128 INFO [train.py:886] (3/4) Epoch 3, batch 4500, loss[loss=0.01823, audio_tagging_loss=0.01823, over 25000.00 frames. ], tot_loss[loss=0.01763, audio_tagging_loss=0.01763, over 4957179.48 frames. ], batch size: 100, lr: 2.88e-02, grad_scale: 128.0
2023-12-21 15:03:19,565 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.25 vs. limit=15.0
2023-12-21 15:03:21,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=93613.33333333333, ans=0.0
2023-12-21 15:03:30,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=93680.0, ans=0.02
2023-12-21 15:03:54,136 INFO [train.py:886] (3/4) Epoch 3, batch 4550, loss[loss=0.01546, audio_tagging_loss=0.01546, over 25000.00 frames. ], tot_loss[loss=0.01747, audio_tagging_loss=0.01747, over 4957123.90 frames. ], batch size: 100, lr: 2.87e-02, grad_scale: 128.0
2023-12-21 15:03:58,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=93880.0, ans=0.0
2023-12-21 15:04:02,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=93946.66666666667, ans=0.125
2023-12-21 15:04:12,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=93946.66666666667, ans=0.1
2023-12-21 15:04:18,733 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.146e+01 2.554e+01 2.788e+01 2.993e+01 3.924e+01, threshold=5.575e+01, percent-clipped=0.0
2023-12-21 15:04:45,096 INFO [train.py:886] (3/4) Epoch 3, batch 4600, loss[loss=0.01738, audio_tagging_loss=0.01738, over 25000.00 frames. ], tot_loss[loss=0.01743, audio_tagging_loss=0.01743, over 4959967.51 frames. ], batch size: 100, lr: 2.87e-02, grad_scale: 128.0
2023-12-21 15:04:46,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=94213.33333333333, ans=0.125
2023-12-21 15:05:01,208 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.72 vs. limit=15.0
2023-12-21 15:05:14,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=94346.66666666667, ans=0.1
2023-12-21 15:05:30,494 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=15.0
2023-12-21 15:05:37,599 INFO [train.py:886] (3/4) Epoch 3, batch 4650, loss[loss=0.01926, audio_tagging_loss=0.01926, over 25000.00 frames. ], tot_loss[loss=0.01753, audio_tagging_loss=0.01753, over 4963760.33 frames. ], batch size: 100, lr: 2.87e-02, grad_scale: 128.0
2023-12-21 15:06:03,479 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.559e+01 2.807e+01 3.071e+01 3.874e+01, threshold=5.614e+01, percent-clipped=0.0
2023-12-21 15:06:04,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=94680.0, ans=0.125
2023-12-21 15:06:28,127 INFO [train.py:886] (3/4) Epoch 3, batch 4700, loss[loss=0.02006, audio_tagging_loss=0.02006, over 24750.00 frames. ], tot_loss[loss=0.01765, audio_tagging_loss=0.01765, over 4955817.66 frames. ], batch size: 99, lr: 2.86e-02, grad_scale: 128.0
2023-12-21 15:06:39,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=94946.66666666667, ans=15.0
2023-12-21 15:06:43,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=94946.66666666667, ans=0.125
2023-12-21 15:06:57,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.91 vs. limit=22.5
2023-12-21 15:06:58,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=95080.0, ans=0.1
2023-12-21 15:07:01,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=95080.0, ans=0.2
2023-12-21 15:07:05,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=95146.66666666667, ans=0.125
2023-12-21 15:07:12,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=95146.66666666667, ans=0.0
2023-12-21 15:07:14,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=95213.33333333333, ans=0.2
2023-12-21 15:07:15,305 INFO [train.py:886] (3/4) Epoch 3, batch 4750, loss[loss=0.01779, audio_tagging_loss=0.01779, over 24750.00 frames. ], tot_loss[loss=0.01774, audio_tagging_loss=0.01774, over 4945034.89 frames. ], batch size: 99, lr: 2.86e-02, grad_scale: 128.0
2023-12-21 15:07:23,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=95213.33333333333, ans=0.125
2023-12-21 15:07:52,987 INFO [train.py:886] (3/4) Epoch 4, batch 0, loss[loss=0.05348, audio_tagging_loss=0.05348, over 19889.00 frames. ], tot_loss[loss=0.05348, audio_tagging_loss=0.05348, over 19889.00 frames. ], batch size: 107, lr: 2.67e-02, grad_scale: 128.0
2023-12-21 15:07:52,988 INFO [train.py:909] (3/4) Computing validation loss
2023-12-21 15:08:16,364 INFO [train.py:917] (3/4) Epoch 4, validation: loss=0.03936, audio_tagging_loss=0.03936, over 3737520.00 frames.
2023-12-21 15:08:16,365 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-21 15:08:25,263 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+01 2.624e+01 2.822e+01 3.250e+01 1.153e+02, threshold=5.643e+01, percent-clipped=3.0
2023-12-21 15:08:34,221 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.44 vs. limit=15.0
2023-12-21 15:08:35,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=95453.33333333333, ans=0.2
2023-12-21 15:08:40,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=95453.33333333333, ans=0.125
2023-12-21 15:08:43,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=95453.33333333333, ans=0.0
2023-12-21 15:08:55,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=95520.0, ans=0.125
2023-12-21 15:09:05,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=95586.66666666667, ans=0.125
2023-12-21 15:09:08,005 INFO [train.py:886] (3/4) Epoch 4, batch 50, loss[loss=0.02287, audio_tagging_loss=0.02287, over 25000.00 frames. ], tot_loss[loss=0.02773, audio_tagging_loss=0.02773, over 1115602.73 frames. ], batch size: 100, lr: 2.67e-02, grad_scale: 128.0
2023-12-21 15:09:20,595 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.76 vs. limit=15.0
2023-12-21 15:09:26,496 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.09 vs. limit=22.5
2023-12-21 15:09:30,192 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.29 vs. limit=22.5
2023-12-21 15:09:38,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=95853.33333333333, ans=0.125
2023-12-21 15:09:42,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=95853.33333333333, ans=0.125
2023-12-21 15:09:43,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.26 vs. limit=15.0
2023-12-21 15:10:00,220 INFO [train.py:886] (3/4) Epoch 4, batch 100, loss[loss=0.0179, audio_tagging_loss=0.0179, over 25000.00 frames. ], tot_loss[loss=0.02367, audio_tagging_loss=0.02367, over 1966735.42 frames. ], batch size: 100, lr: 2.67e-02, grad_scale: 128.0
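tot_loss is a running, frame-weighted average rather than a plain epoch mean: it restarts from the first batch at each epoch boundary (the Epoch 4, batch 0 entry repeats its own batch loss), and the frame count it is reported "over" climbs from 19889 through ~1.1M at batch 50 and ~2.0M at batch 100 toward the steady ~5M seen throughout epoch 3. Those numbers are consistent with an exponentially decayed sum with a decay of about 0.995 per batch, since 25000 / (1 - 0.995) = 5e6; the class below is a sketch under that inferred constant, not the project's actual tracker:

```python
class RunningLoss:
    """Sketch: exponentially decayed, frame-weighted loss average. With
    decay=0.995 and ~25k-frame batches, the effective frame count grows
    from one batch's worth toward 25000 / (1 - 0.995) = 5e6, matching the
    'over N frames' trajectory printed after each epoch restart."""

    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.weighted_loss = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.weighted_loss = self.decay * self.weighted_loss + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def tot_loss(self) -> float:
        return self.weighted_loss / max(self.frames, 1.0)
```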
], batch size: 100, lr: 2.67e-02, grad_scale: 128.0 2023-12-21 15:10:02,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=95986.66666666667, ans=0.125 2023-12-21 15:10:05,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=95986.66666666667, ans=0.125 2023-12-21 15:10:07,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=95986.66666666667, ans=0.0 2023-12-21 15:10:08,521 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.427e+01 2.883e+01 3.182e+01 3.510e+01 4.274e+01, threshold=6.364e+01, percent-clipped=0.0 2023-12-21 15:10:11,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=96053.33333333333, ans=0.125 2023-12-21 15:10:15,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=96053.33333333333, ans=0.0 2023-12-21 15:10:19,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=96120.0, ans=0.0 2023-12-21 15:10:25,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=96120.0, ans=0.05 2023-12-21 15:10:29,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=96120.0, ans=0.125 2023-12-21 15:10:36,652 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.40 vs. limit=22.5 2023-12-21 15:10:46,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.78 vs. limit=22.5 2023-12-21 15:10:51,517 INFO [train.py:886] (3/4) Epoch 4, batch 150, loss[loss=0.01863, audio_tagging_loss=0.01863, over 24750.00 frames. ], tot_loss[loss=0.02168, audio_tagging_loss=0.02168, over 2631897.68 frames. ], batch size: 99, lr: 2.66e-02, grad_scale: 128.0 2023-12-21 15:10:56,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=96320.0, ans=0.125 2023-12-21 15:11:12,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=96453.33333333333, ans=0.1 2023-12-21 15:11:20,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=96453.33333333333, ans=0.125 2023-12-21 15:11:22,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.18 vs. 
limit=15.0 2023-12-21 15:11:22,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=96520.0, ans=10.0 2023-12-21 15:11:34,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=96586.66666666667, ans=0.125 2023-12-21 15:11:38,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=96586.66666666667, ans=0.125 2023-12-21 15:11:44,280 INFO [train.py:886] (3/4) Epoch 4, batch 200, loss[loss=0.01842, audio_tagging_loss=0.01842, over 24750.00 frames. ], tot_loss[loss=0.02031, audio_tagging_loss=0.02031, over 3146601.49 frames. ], batch size: 99, lr: 2.66e-02, grad_scale: 128.0 2023-12-21 15:11:51,855 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.126e+01 2.596e+01 2.833e+01 2.992e+01 3.762e+01, threshold=5.666e+01, percent-clipped=0.0 2023-12-21 15:11:52,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=96653.33333333333, ans=0.1 2023-12-21 15:12:04,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=96786.66666666667, ans=0.035 2023-12-21 15:12:06,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=96786.66666666667, ans=0.2 2023-12-21 15:12:14,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=96853.33333333333, ans=0.125 2023-12-21 15:12:18,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.45 vs. limit=15.0 2023-12-21 15:12:22,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=96853.33333333333, ans=0.1 2023-12-21 15:12:28,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=96920.0, ans=0.0 2023-12-21 15:12:35,172 INFO [train.py:886] (3/4) Epoch 4, batch 250, loss[loss=0.01687, audio_tagging_loss=0.01687, over 25000.00 frames. ], tot_loss[loss=0.01941, audio_tagging_loss=0.01941, over 3550184.14 frames. ], batch size: 100, lr: 2.65e-02, grad_scale: 128.0 2023-12-21 15:12:41,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.18 vs. limit=22.5 2023-12-21 15:12:53,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=97053.33333333333, ans=0.0 2023-12-21 15:12:54,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=97053.33333333333, ans=0.125 2023-12-21 15:13:08,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=97186.66666666667, ans=0.2 2023-12-21 15:13:23,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=97253.33333333333, ans=0.125 2023-12-21 15:13:26,984 INFO [train.py:886] (3/4) Epoch 4, batch 300, loss[loss=0.01511, audio_tagging_loss=0.01511, over 24750.00 frames. ], tot_loss[loss=0.01897, audio_tagging_loss=0.01897, over 3857739.56 frames. 
], batch size: 99, lr: 2.65e-02, grad_scale: 128.0 2023-12-21 15:13:29,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=97320.0, ans=0.0 2023-12-21 15:13:31,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=97320.0, ans=0.2 2023-12-21 15:13:34,777 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.584e+01 2.801e+01 3.020e+01 3.817e+01, threshold=5.601e+01, percent-clipped=0.0 2023-12-21 15:13:59,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=97520.0, ans=0.0 2023-12-21 15:14:08,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=97586.66666666667, ans=0.1 2023-12-21 15:14:19,494 INFO [train.py:886] (3/4) Epoch 4, batch 350, loss[loss=0.01919, audio_tagging_loss=0.01919, over 24750.00 frames. ], tot_loss[loss=0.01865, audio_tagging_loss=0.01865, over 4098234.85 frames. ], batch size: 99, lr: 2.65e-02, grad_scale: 128.0 2023-12-21 15:14:21,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=97653.33333333333, ans=0.0 2023-12-21 15:14:29,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=97720.0, ans=0.0 2023-12-21 15:14:29,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=97720.0, ans=0.0 2023-12-21 15:14:30,736 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=12.0 2023-12-21 15:14:55,836 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.52 vs. limit=15.0 2023-12-21 15:15:09,504 INFO [train.py:886] (3/4) Epoch 4, batch 400, loss[loss=0.01872, audio_tagging_loss=0.01872, over 25000.00 frames. ], tot_loss[loss=0.01825, audio_tagging_loss=0.01825, over 4285693.56 frames. ], batch size: 100, lr: 2.64e-02, grad_scale: 128.0 2023-12-21 15:15:18,622 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.499e+01 2.678e+01 2.862e+01 4.047e+01, threshold=5.355e+01, percent-clipped=0.0 2023-12-21 15:15:53,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=14.32 vs. limit=15.0 2023-12-21 15:15:57,561 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.25 vs. limit=6.0 2023-12-21 15:16:00,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=98253.33333333333, ans=0.125 2023-12-21 15:16:01,873 INFO [train.py:886] (3/4) Epoch 4, batch 450, loss[loss=0.01459, audio_tagging_loss=0.01459, over 24017.00 frames. ], tot_loss[loss=0.01795, audio_tagging_loss=0.01795, over 4435486.38 frames. ], batch size: 100, lr: 2.64e-02, grad_scale: 128.0 2023-12-21 15:16:17,739 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.20 vs. 
limit=6.0 2023-12-21 15:16:25,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=98453.33333333333, ans=0.125 2023-12-21 15:16:27,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=98453.33333333333, ans=0.2 2023-12-21 15:16:40,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=98520.0, ans=0.125 2023-12-21 15:16:42,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=98586.66666666667, ans=0.2 2023-12-21 15:16:51,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=98653.33333333333, ans=0.125 2023-12-21 15:16:52,052 INFO [train.py:886] (3/4) Epoch 4, batch 500, loss[loss=0.01569, audio_tagging_loss=0.01569, over 25000.00 frames. ], tot_loss[loss=0.01777, audio_tagging_loss=0.01777, over 4547669.67 frames. ], batch size: 100, lr: 2.64e-02, grad_scale: 128.0 2023-12-21 15:16:55,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=98653.33333333333, ans=0.0 2023-12-21 15:16:56,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=98653.33333333333, ans=0.0 2023-12-21 15:17:02,592 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.476e+01 2.666e+01 2.895e+01 4.028e+01, threshold=5.332e+01, percent-clipped=0.0 2023-12-21 15:17:10,908 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.87 vs. limit=6.0 2023-12-21 15:17:29,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=98853.33333333333, ans=0.04949747468305833 2023-12-21 15:17:41,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=98920.0, ans=0.125 2023-12-21 15:17:44,840 INFO [train.py:886] (3/4) Epoch 4, batch 550, loss[loss=0.01604, audio_tagging_loss=0.01604, over 25000.00 frames. ], tot_loss[loss=0.0177, audio_tagging_loss=0.0177, over 4630432.12 frames. ], batch size: 100, lr: 2.63e-02, grad_scale: 128.0 2023-12-21 15:18:10,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=99120.0, ans=15.0 2023-12-21 15:18:10,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=99120.0, ans=0.125 2023-12-21 15:18:12,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=99120.0, ans=0.125 2023-12-21 15:18:13,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=99120.0, ans=0.07 2023-12-21 15:18:24,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=99253.33333333333, ans=0.1 2023-12-21 15:18:37,711 INFO [train.py:886] (3/4) Epoch 4, batch 600, loss[loss=0.01643, audio_tagging_loss=0.01643, over 24750.00 frames. ], tot_loss[loss=0.01781, audio_tagging_loss=0.01781, over 4702669.95 frames. 
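The scaling.py:1022 "Whitening" records compare a per-module whiteness metric against a limit (e.g. metric=17.17 vs. limit=15.0 above). One plausible metric, stated here as an assumption rather than the project's exact formula: for the channel covariance C with eigenvalues lambda_i, take num_channels * sum(lambda_i^2) / (sum(lambda_i))^2, which equals 1.0 when C is proportional to the identity and grows as the eigenvalue spectrum becomes uneven. A whitening penalty would then only need to activate when this metric exceeds the logged limit:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """Whiteness of features x of shape (num_frames, num_channels) (sketch).

        Returns ~1.0 for 'white' features (covariance ~ identity) and larger
        values the more unbalanced the covariance spectrum is.
        """
        num_frames, num_channels = x.shape
        assert num_channels % num_groups == 0
        c = num_channels // num_groups
        x = x.reshape(num_frames, num_groups, c)
        x = x - x.mean(dim=0, keepdim=True)
        # per-group channel covariance, shape (num_groups, c, c)
        cov = torch.einsum("ngi,ngj->gij", x, x) / num_frames
        # sum(eig^2) == trace(C @ C) == (C * C).sum(); sum(eig) == trace(C)
        num = (cov * cov).sum(dim=(1, 2))
        den = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2
        return float((c * num / den).mean())

    print(whitening_metric(torch.randn(50000, 64)))  # ~1.0, plus small sampling bias
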
], batch size: 99, lr: 2.63e-02, grad_scale: 128.0 2023-12-21 15:18:45,285 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.158e+01 2.527e+01 2.841e+01 3.010e+01 4.382e+01, threshold=5.682e+01, percent-clipped=0.0 2023-12-21 15:18:51,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=99386.66666666667, ans=0.0 2023-12-21 15:18:58,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=99453.33333333333, ans=0.125 2023-12-21 15:19:04,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=99453.33333333333, ans=0.1 2023-12-21 15:19:08,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=99520.0, ans=10.0 2023-12-21 15:19:27,909 INFO [train.py:886] (3/4) Epoch 4, batch 650, loss[loss=0.01538, audio_tagging_loss=0.01538, over 24750.00 frames. ], tot_loss[loss=0.01781, audio_tagging_loss=0.01781, over 4751117.55 frames. ], batch size: 99, lr: 2.63e-02, grad_scale: 128.0 2023-12-21 15:19:33,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=99653.33333333333, ans=0.0 2023-12-21 15:19:34,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=99653.33333333333, ans=0.125 2023-12-21 15:19:36,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=99653.33333333333, ans=0.0 2023-12-21 15:19:42,481 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.23 vs. limit=22.5 2023-12-21 15:19:59,500 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0 2023-12-21 15:20:00,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=99853.33333333333, ans=0.0 2023-12-21 15:20:19,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.79 vs. limit=6.0 2023-12-21 15:20:19,903 INFO [train.py:886] (3/4) Epoch 4, batch 700, loss[loss=0.01265, audio_tagging_loss=0.01265, over 25000.00 frames. ], tot_loss[loss=0.01769, audio_tagging_loss=0.01769, over 4794136.24 frames. ], batch size: 100, lr: 2.62e-02, grad_scale: 128.0 2023-12-21 15:20:27,577 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.543e+01 2.759e+01 3.003e+01 3.794e+01, threshold=5.518e+01, percent-clipped=0.0 2023-12-21 15:20:27,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=99986.66666666667, ans=12.0 2023-12-21 15:20:29,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=100053.33333333333, ans=0.125 2023-12-21 15:20:37,949 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.81 vs. 
limit=15.0 2023-12-21 15:20:41,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=100120.0, ans=0.04949747468305833 2023-12-21 15:20:49,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.17 vs. limit=15.0 2023-12-21 15:20:52,698 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.37 vs. limit=22.5 2023-12-21 15:20:53,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=100186.66666666667, ans=0.125 2023-12-21 15:20:55,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=100186.66666666667, ans=0.125 2023-12-21 15:20:56,233 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.124e-01 2023-12-21 15:21:12,372 INFO [train.py:886] (3/4) Epoch 4, batch 750, loss[loss=0.01675, audio_tagging_loss=0.01675, over 25000.00 frames. ], tot_loss[loss=0.0176, audio_tagging_loss=0.0176, over 4831121.64 frames. ], batch size: 100, lr: 2.62e-02, grad_scale: 128.0 2023-12-21 15:21:13,764 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.20 vs. limit=15.0 2023-12-21 15:21:26,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=100386.66666666667, ans=0.2 2023-12-21 15:21:32,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=100453.33333333333, ans=10.0 2023-12-21 15:21:51,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=100520.0, ans=0.125 2023-12-21 15:22:03,767 INFO [train.py:886] (3/4) Epoch 4, batch 800, loss[loss=0.01789, audio_tagging_loss=0.01789, over 25000.00 frames. ], tot_loss[loss=0.01751, audio_tagging_loss=0.01751, over 4861518.55 frames. ], batch size: 100, lr: 2.62e-02, grad_scale: 128.0 2023-12-21 15:22:11,434 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.426e+01 2.587e+01 2.849e+01 3.873e+01, threshold=5.173e+01, percent-clipped=0.0 2023-12-21 15:22:12,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=100720.0, ans=0.0 2023-12-21 15:22:15,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=100720.0, ans=0.125 2023-12-21 15:22:18,079 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.45 vs. limit=15.0 2023-12-21 15:22:54,915 INFO [train.py:886] (3/4) Epoch 4, batch 850, loss[loss=0.01845, audio_tagging_loss=0.01845, over 25000.00 frames. ], tot_loss[loss=0.0175, audio_tagging_loss=0.0175, over 4888736.33 frames. 
], batch size: 100, lr: 2.61e-02, grad_scale: 128.0 2023-12-21 15:22:58,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=100986.66666666667, ans=0.125 2023-12-21 15:23:07,503 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.62 vs. limit=22.5 2023-12-21 15:23:08,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=101053.33333333333, ans=0.0 2023-12-21 15:23:20,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=101120.0, ans=0.1 2023-12-21 15:23:22,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=101120.0, ans=0.0 2023-12-21 15:23:29,733 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.93 vs. limit=15.0 2023-12-21 15:23:34,125 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.82 vs. limit=22.5 2023-12-21 15:23:39,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=101253.33333333333, ans=0.0 2023-12-21 15:23:44,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=101320.0, ans=0.125 2023-12-21 15:23:45,073 INFO [train.py:886] (3/4) Epoch 4, batch 900, loss[loss=0.01775, audio_tagging_loss=0.01775, over 21832.00 frames. ], tot_loss[loss=0.01743, audio_tagging_loss=0.01743, over 4903227.76 frames. ], batch size: 107, lr: 2.61e-02, grad_scale: 128.0 2023-12-21 15:23:45,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=101320.0, ans=0.09899494936611666 2023-12-21 15:23:54,876 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.627e+01 2.825e+01 3.078e+01 4.421e+01, threshold=5.651e+01, percent-clipped=0.0 2023-12-21 15:23:56,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=101386.66666666667, ans=0.0 2023-12-21 15:23:59,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=101386.66666666667, ans=0.0 2023-12-21 15:24:08,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=101453.33333333333, ans=0.0 2023-12-21 15:24:11,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=101453.33333333333, ans=0.125 2023-12-21 15:24:33,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=101586.66666666667, ans=0.0 2023-12-21 15:24:37,284 INFO [train.py:886] (3/4) Epoch 4, batch 950, loss[loss=0.01627, audio_tagging_loss=0.01627, over 24750.00 frames. ], tot_loss[loss=0.01742, audio_tagging_loss=0.01742, over 4906598.17 frames. 
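In the optim.py:484 warnings, the five grad-norm quartiles read as min/25%/median/75%/max over recently seen gradient norms, and in every record in this excerpt the threshold equals Clipping_scale times the median (e.g. 2.0 x 2.825e+01 ~ 5.651e+01 just above), so the clipping threshold adapts to the run rather than being fixed. A sketch of that bookkeeping, assuming a sliding window of recent norms; the window size and the helper class are illustrative, not the project's optimizer:

    from collections import deque

    import torch

    class GradNormClipper:
        """Clip gradients at clipping_scale * median of recent grad norms (sketch)."""

        def __init__(self, params, clipping_scale: float = 2.0, window: int = 200):
            self.params = list(params)
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)

        def clip_(self) -> float:
            # overall L2 norm across all parameter gradients
            norm = torch.norm(
                torch.stack([p.grad.norm() for p in self.params if p.grad is not None])
            ).item()
            self.norms.append(norm)
            q = torch.quantile(
                torch.tensor(list(self.norms)),
                torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
            )
            threshold = self.clipping_scale * q[2].item()  # scale * median
            if norm > threshold:
                for p in self.params:
                    if p.grad is not None:
                        p.grad.mul_(threshold / norm)
            return threshold
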
], batch size: 99, lr: 2.60e-02, grad_scale: 128.0 2023-12-21 15:24:49,283 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.66 vs. limit=22.5 2023-12-21 15:24:50,036 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.26 vs. limit=15.0 2023-12-21 15:24:50,082 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.33 vs. limit=12.0 2023-12-21 15:24:57,916 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.81 vs. limit=22.5 2023-12-21 15:25:06,382 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0 2023-12-21 15:25:29,005 INFO [train.py:886] (3/4) Epoch 4, batch 1000, loss[loss=0.02166, audio_tagging_loss=0.02166, over 22144.00 frames. ], tot_loss[loss=0.01735, audio_tagging_loss=0.01735, over 4908645.68 frames. ], batch size: 107, lr: 2.60e-02, grad_scale: 128.0 2023-12-21 15:25:36,516 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.499e+01 2.686e+01 2.949e+01 3.703e+01, threshold=5.372e+01, percent-clipped=0.0 2023-12-21 15:25:41,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=102053.33333333333, ans=0.015 2023-12-21 15:26:08,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=102253.33333333333, ans=0.5 2023-12-21 15:26:09,067 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.02 vs. limit=22.5 2023-12-21 15:26:19,860 INFO [train.py:886] (3/4) Epoch 4, batch 1050, loss[loss=0.01495, audio_tagging_loss=0.01495, over 25000.00 frames. ], tot_loss[loss=0.01718, audio_tagging_loss=0.01718, over 4922719.76 frames. ], batch size: 100, lr: 2.60e-02, grad_scale: 128.0 2023-12-21 15:26:37,966 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.99 vs. limit=15.0 2023-12-21 15:26:56,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=102520.0, ans=0.0 2023-12-21 15:27:08,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=102586.66666666667, ans=0.125 2023-12-21 15:27:09,916 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.55 vs. limit=15.0 2023-12-21 15:27:11,321 INFO [train.py:886] (3/4) Epoch 4, batch 1100, loss[loss=0.01588, audio_tagging_loss=0.01588, over 25000.00 frames. ], tot_loss[loss=0.01711, audio_tagging_loss=0.01711, over 4930001.80 frames. 
], batch size: 100, lr: 2.59e-02, grad_scale: 128.0 2023-12-21 15:27:19,904 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.582e+01 2.804e+01 3.074e+01 3.939e+01, threshold=5.607e+01, percent-clipped=0.0 2023-12-21 15:27:20,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=102653.33333333333, ans=0.125 2023-12-21 15:27:22,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=102720.0, ans=0.0 2023-12-21 15:27:49,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=102853.33333333333, ans=0.125 2023-12-21 15:28:02,456 INFO [train.py:886] (3/4) Epoch 4, batch 1150, loss[loss=0.01196, audio_tagging_loss=0.01196, over 25000.00 frames. ], tot_loss[loss=0.01708, audio_tagging_loss=0.01708, over 4939067.22 frames. ], batch size: 100, lr: 2.59e-02, grad_scale: 128.0 2023-12-21 15:28:06,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=102986.66666666667, ans=0.0 2023-12-21 15:28:11,929 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.85 vs. limit=22.5 2023-12-21 15:28:13,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=103053.33333333333, ans=0.0 2023-12-21 15:28:20,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=103053.33333333333, ans=0.0 2023-12-21 15:28:25,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=103120.0, ans=0.125 2023-12-21 15:28:53,961 INFO [train.py:886] (3/4) Epoch 4, batch 1200, loss[loss=0.01655, audio_tagging_loss=0.01655, over 25000.00 frames. ], tot_loss[loss=0.01723, audio_tagging_loss=0.01723, over 4946137.72 frames. ], batch size: 100, lr: 2.59e-02, grad_scale: 128.0 2023-12-21 15:28:54,215 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.949e+01 2023-12-21 15:28:57,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.72 vs. limit=6.0 2023-12-21 15:29:02,242 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.608e+01 2.769e+01 2.975e+01 3.396e+01, threshold=5.537e+01, percent-clipped=0.0 2023-12-21 15:29:03,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=103386.66666666667, ans=0.1 2023-12-21 15:29:18,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=103453.33333333333, ans=15.0 2023-12-21 15:29:41,951 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.95 vs. limit=22.5 2023-12-21 15:29:46,012 INFO [train.py:886] (3/4) Epoch 4, batch 1250, loss[loss=0.01904, audio_tagging_loss=0.01904, over 24750.00 frames. ], tot_loss[loss=0.01738, audio_tagging_loss=0.01738, over 4947622.20 frames. 
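In the train.py:886 records, loss[...] is the current batch and tot_loss[...] a running frame-weighted average. The "over N frames" total inside tot_loss levels off near 5e6 even though far more frames have been seen by batch 1250, which is consistent with an exponentially decayed sum whose effective window is about 200 batches of ~25,000 frames each. A sketch under that assumption (the window of 200 is an inference from the logged totals, not confirmed from source):

    class RunningLoss:
        """Frame-weighted loss average with exponential decay (sketch).

        decay = 1 - 1/window keeps the effective frame total near
        window * frames_per_batch, matching the ~5e6 figures in the log.
        """

        def __init__(self, window: int = 200):
            self.decay = 1.0 - 1.0 / window
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames

        @property
        def value(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

    tot = RunningLoss()
    for _ in range(1250):  # by batch 1250 the frame total has nearly saturated
        tot.update(0.0174, 25000.0)
    print(round(tot.frames))  # ~4990500, same order as the logged 4947622.20
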
], batch size: 99, lr: 2.58e-02, grad_scale: 128.0 2023-12-21 15:29:56,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=103720.0, ans=0.1 2023-12-21 15:30:06,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=103786.66666666667, ans=0.125 2023-12-21 15:30:16,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=103853.33333333333, ans=0.2 2023-12-21 15:30:16,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=103853.33333333333, ans=0.1 2023-12-21 15:30:28,331 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.71 vs. limit=15.0 2023-12-21 15:30:28,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=103920.0, ans=0.0 2023-12-21 15:30:35,962 INFO [train.py:886] (3/4) Epoch 4, batch 1300, loss[loss=0.01829, audio_tagging_loss=0.01829, over 24750.00 frames. ], tot_loss[loss=0.01744, audio_tagging_loss=0.01744, over 4945294.62 frames. ], batch size: 99, lr: 2.58e-02, grad_scale: 128.0 2023-12-21 15:30:44,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=103986.66666666667, ans=0.125 2023-12-21 15:30:45,066 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.570e+01 2.794e+01 3.000e+01 3.597e+01, threshold=5.589e+01, percent-clipped=0.0 2023-12-21 15:30:58,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=104120.0, ans=0.0 2023-12-21 15:31:03,788 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.94 vs. limit=10.0 2023-12-21 15:31:21,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=104253.33333333333, ans=0.125 2023-12-21 15:31:25,877 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=15.0 2023-12-21 15:31:27,401 INFO [train.py:886] (3/4) Epoch 4, batch 1350, loss[loss=0.01629, audio_tagging_loss=0.01629, over 24750.00 frames. ], tot_loss[loss=0.01734, audio_tagging_loss=0.01734, over 4947031.65 frames. ], batch size: 99, lr: 2.58e-02, grad_scale: 128.0 2023-12-21 15:31:47,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=104453.33333333333, ans=0.1 2023-12-21 15:31:51,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=104453.33333333333, ans=0.125 2023-12-21 15:31:52,648 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.66 vs. limit=22.5 2023-12-21 15:31:52,698 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.57 vs. 
limit=10.0 2023-12-21 15:32:03,472 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=15.0 2023-12-21 15:32:08,168 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=15.0 2023-12-21 15:32:08,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=104586.66666666667, ans=0.0 2023-12-21 15:32:10,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=104586.66666666667, ans=0.2 2023-12-21 15:32:18,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=104653.33333333333, ans=0.125 2023-12-21 15:32:19,153 INFO [train.py:886] (3/4) Epoch 4, batch 1400, loss[loss=0.0166, audio_tagging_loss=0.0166, over 25000.00 frames. ], tot_loss[loss=0.01726, audio_tagging_loss=0.01726, over 4942787.15 frames. ], batch size: 100, lr: 2.57e-02, grad_scale: 128.0 2023-12-21 15:32:22,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=104653.33333333333, ans=0.125 2023-12-21 15:32:26,651 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.177e+01 2.479e+01 2.726e+01 3.027e+01 3.760e+01, threshold=5.453e+01, percent-clipped=0.0 2023-12-21 15:32:28,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=104720.0, ans=0.125 2023-12-21 15:32:54,801 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.98 vs. limit=6.0 2023-12-21 15:33:02,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=104920.0, ans=0.125 2023-12-21 15:33:02,469 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.10 vs. limit=12.0 2023-12-21 15:33:08,512 INFO [train.py:886] (3/4) Epoch 4, batch 1450, loss[loss=0.01545, audio_tagging_loss=0.01545, over 25000.00 frames. ], tot_loss[loss=0.01712, audio_tagging_loss=0.01712, over 4946691.32 frames. ], batch size: 100, lr: 2.57e-02, grad_scale: 128.0 2023-12-21 15:33:51,504 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.82 vs. limit=10.0 2023-12-21 15:34:00,909 INFO [train.py:886] (3/4) Epoch 4, batch 1500, loss[loss=0.01714, audio_tagging_loss=0.01714, over 25000.00 frames. ], tot_loss[loss=0.01722, audio_tagging_loss=0.01722, over 4953615.29 frames. ], batch size: 100, lr: 2.57e-02, grad_scale: 128.0 2023-12-21 15:34:08,808 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.551e+01 2.727e+01 2.901e+01 3.751e+01, threshold=5.454e+01, percent-clipped=0.0 2023-12-21 15:34:10,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=105386.66666666667, ans=0.125 2023-12-21 15:34:18,974 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.51 vs. 
limit=22.5 2023-12-21 15:34:43,860 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.18 vs. limit=15.0 2023-12-21 15:34:50,183 INFO [train.py:886] (3/4) Epoch 4, batch 1550, loss[loss=0.01781, audio_tagging_loss=0.01781, over 24750.00 frames. ], tot_loss[loss=0.01737, audio_tagging_loss=0.01737, over 4957759.74 frames. ], batch size: 99, lr: 2.56e-02, grad_scale: 256.0 2023-12-21 15:34:52,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=105653.33333333333, ans=0.1 2023-12-21 15:35:06,097 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.23 vs. limit=15.0 2023-12-21 15:35:11,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=105786.66666666667, ans=0.125 2023-12-21 15:35:15,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=105786.66666666667, ans=0.035 2023-12-21 15:35:20,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=105853.33333333333, ans=0.1 2023-12-21 15:35:27,177 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=12.0 2023-12-21 15:35:41,574 INFO [train.py:886] (3/4) Epoch 4, batch 1600, loss[loss=0.01557, audio_tagging_loss=0.01557, over 23163.00 frames. ], tot_loss[loss=0.0174, audio_tagging_loss=0.0174, over 4952793.56 frames. ], batch size: 107, lr: 2.56e-02, grad_scale: 128.0 2023-12-21 15:35:44,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=105986.66666666667, ans=0.125 2023-12-21 15:35:44,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=105986.66666666667, ans=0.0 2023-12-21 15:35:49,968 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.541e+01 2.790e+01 3.125e+01 4.127e+01, threshold=5.579e+01, percent-clipped=0.0 2023-12-21 15:35:56,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.30 vs. limit=22.5 2023-12-21 15:36:12,384 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.47 vs. limit=12.0 2023-12-21 15:36:23,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.90 vs. limit=22.5 2023-12-21 15:36:25,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=106253.33333333333, ans=0.125 2023-12-21 15:36:26,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=106253.33333333333, ans=0.125 2023-12-21 15:36:33,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=106320.0, ans=0.07 2023-12-21 15:36:33,992 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.10 vs. 
limit=12.0 2023-12-21 15:36:34,462 INFO [train.py:886] (3/4) Epoch 4, batch 1650, loss[loss=0.01557, audio_tagging_loss=0.01557, over 25000.00 frames. ], tot_loss[loss=0.01738, audio_tagging_loss=0.01738, over 4950968.69 frames. ], batch size: 100, lr: 2.56e-02, grad_scale: 128.0 2023-12-21 15:36:35,848 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.46 vs. limit=10.0 2023-12-21 15:37:04,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=106520.0, ans=0.0 2023-12-21 15:37:05,987 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.53 vs. limit=22.5 2023-12-21 15:37:12,752 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.36 vs. limit=15.0 2023-12-21 15:37:14,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.99 vs. limit=22.5 2023-12-21 15:37:24,416 INFO [train.py:886] (3/4) Epoch 4, batch 1700, loss[loss=0.01668, audio_tagging_loss=0.01668, over 25000.00 frames. ], tot_loss[loss=0.01729, audio_tagging_loss=0.01729, over 4953665.31 frames. ], batch size: 100, lr: 2.55e-02, grad_scale: 128.0 2023-12-21 15:37:37,183 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.494e+01 2.744e+01 2.970e+01 4.279e+01, threshold=5.487e+01, percent-clipped=0.0 2023-12-21 15:37:50,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=106786.66666666667, ans=10.0 2023-12-21 15:38:18,948 INFO [train.py:886] (3/4) Epoch 4, batch 1750, loss[loss=0.01465, audio_tagging_loss=0.01465, over 24750.00 frames. ], tot_loss[loss=0.0171, audio_tagging_loss=0.0171, over 4957006.07 frames. 
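grad_scale is the fp16 loss-scale. It sits at 128.0 through most of this excerpt, doubles to 256.0 at batch 1550, and is back at 128.0 by batch 1600: the scaler grew the scale after a run of finite-gradient steps and backed off when an overflow was detected. That dynamic can be reproduced with PyTorch's stock torch.cuda.amp.GradScaler; the hyperparameters below are illustrative, and this run's training loop may wrap its own variant:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=2.0,        # the log starts near 2.0 and grows from there
        growth_factor=2.0,     # 128 -> 256 as seen around batch 1550
        backoff_factor=0.5,    # 256 -> 128 after an overflowing step
        growth_interval=2000,  # illustrative
    )

    def train_step(model, optimizer, features, labels, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(features), labels)
        scaler.scale(loss).backward()  # scaled loss keeps fp16 grads in range
        scaler.step(optimizer)         # unscales; skips the step on inf/nan grads
        scaler.update()                # grow or back off the scale
        return loss.item(), scaler.get_scale()
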
], batch size: 99, lr: 2.55e-02, grad_scale: 128.0 2023-12-21 15:38:26,985 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.331e-01 2023-12-21 15:38:29,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=107053.33333333333, ans=0.125 2023-12-21 15:38:42,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=107120.0, ans=0.0 2023-12-21 15:38:47,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=107120.0, ans=0.04949747468305833 2023-12-21 15:38:51,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=107186.66666666667, ans=0.025 2023-12-21 15:39:00,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=107253.33333333333, ans=0.125 2023-12-21 15:39:03,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=107253.33333333333, ans=0.1 2023-12-21 15:39:06,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=107253.33333333333, ans=0.1 2023-12-21 15:39:09,758 INFO [train.py:886] (3/4) Epoch 4, batch 1800, loss[loss=0.01835, audio_tagging_loss=0.01835, over 21926.00 frames. ], tot_loss[loss=0.01711, audio_tagging_loss=0.01711, over 4953649.87 frames. ], batch size: 107, lr: 2.55e-02, grad_scale: 128.0 2023-12-21 15:39:19,655 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.066e+01 2.593e+01 2.724e+01 2.959e+01 3.559e+01, threshold=5.448e+01, percent-clipped=0.0 2023-12-21 15:39:28,811 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.78 vs. limit=22.5 2023-12-21 15:40:00,864 INFO [train.py:886] (3/4) Epoch 4, batch 1850, loss[loss=0.01847, audio_tagging_loss=0.01847, over 25000.00 frames. ], tot_loss[loss=0.01727, audio_tagging_loss=0.01727, over 4954615.57 frames. ], batch size: 100, lr: 2.54e-02, grad_scale: 128.0 2023-12-21 15:40:05,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=107653.33333333333, ans=0.125 2023-12-21 15:40:09,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0 2023-12-21 15:40:13,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=107720.0, ans=0.125 2023-12-21 15:40:14,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=107720.0, ans=0.125 2023-12-21 15:40:30,599 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.86 vs. 
limit=15.0 2023-12-21 15:40:32,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=107853.33333333333, ans=0.2 2023-12-21 15:40:39,981 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.95 vs. limit=10.0 2023-12-21 15:40:40,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=107920.0, ans=0.0 2023-12-21 15:40:40,993 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.31 vs. limit=10.0 2023-12-21 15:40:51,376 INFO [train.py:886] (3/4) Epoch 4, batch 1900, loss[loss=0.01476, audio_tagging_loss=0.01476, over 24030.00 frames. ], tot_loss[loss=0.01734, audio_tagging_loss=0.01734, over 4949189.42 frames. ], batch size: 100, lr: 2.54e-02, grad_scale: 128.0 2023-12-21 15:40:51,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=107986.66666666667, ans=0.0 2023-12-21 15:40:56,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=107986.66666666667, ans=0.125 2023-12-21 15:41:00,371 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.51 vs. limit=22.5 2023-12-21 15:41:00,679 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.376e+01 2.677e+01 2.844e+01 3.083e+01 3.785e+01, threshold=5.688e+01, percent-clipped=0.0 2023-12-21 15:41:10,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=108120.0, ans=0.125 2023-12-21 15:41:17,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=108120.0, ans=0.125 2023-12-21 15:41:35,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=108253.33333333333, ans=0.025 2023-12-21 15:41:41,772 INFO [train.py:886] (3/4) Epoch 4, batch 1950, loss[loss=0.01522, audio_tagging_loss=0.01522, over 24750.00 frames. ], tot_loss[loss=0.01726, audio_tagging_loss=0.01726, over 4943961.89 frames. 
], batch size: 99, lr: 2.54e-02, grad_scale: 128.0 2023-12-21 15:41:42,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=108320.0, ans=0.95 2023-12-21 15:41:43,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=108320.0, ans=0.0 2023-12-21 15:41:46,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=108320.0, ans=0.2 2023-12-21 15:41:58,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=108386.66666666667, ans=0.125 2023-12-21 15:42:10,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=108453.33333333333, ans=0.1 2023-12-21 15:42:18,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=108520.0, ans=0.125 2023-12-21 15:42:27,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=108586.66666666667, ans=0.125 2023-12-21 15:42:27,807 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.10 vs. limit=15.0 2023-12-21 15:42:33,628 INFO [train.py:886] (3/4) Epoch 4, batch 2000, loss[loss=0.01614, audio_tagging_loss=0.01614, over 25000.00 frames. ], tot_loss[loss=0.01717, audio_tagging_loss=0.01717, over 4940709.60 frames. ], batch size: 100, lr: 2.54e-02, grad_scale: 128.0 2023-12-21 15:42:33,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=108653.33333333333, ans=0.125 2023-12-21 15:42:37,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=108653.33333333333, ans=10.0 2023-12-21 15:42:42,111 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.508e+01 2.716e+01 2.994e+01 3.930e+01, threshold=5.432e+01, percent-clipped=0.0 2023-12-21 15:42:51,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=108720.0, ans=0.035 2023-12-21 15:43:00,124 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.95 vs. limit=15.0 2023-12-21 15:43:24,461 INFO [train.py:886] (3/4) Epoch 4, batch 2050, loss[loss=0.0163, audio_tagging_loss=0.0163, over 25000.00 frames. ], tot_loss[loss=0.01705, audio_tagging_loss=0.01705, over 4940899.44 frames. ], batch size: 100, lr: 2.53e-02, grad_scale: 128.0 2023-12-21 15:43:26,927 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.73 vs. 
limit=12.0 2023-12-21 15:43:29,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=108986.66666666667, ans=0.035 2023-12-21 15:43:48,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=109120.0, ans=0.07 2023-12-21 15:44:13,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=109320.0, ans=0.5 2023-12-21 15:44:14,529 INFO [train.py:886] (3/4) Epoch 4, batch 2100, loss[loss=0.016, audio_tagging_loss=0.016, over 25000.00 frames. ], tot_loss[loss=0.0171, audio_tagging_loss=0.0171, over 4941719.09 frames. ], batch size: 100, lr: 2.53e-02, grad_scale: 128.0 2023-12-21 15:44:22,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=109320.0, ans=0.1 2023-12-21 15:44:23,904 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.560e+01 2.764e+01 2.957e+01 4.648e+01, threshold=5.528e+01, percent-clipped=0.0 2023-12-21 15:44:24,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=109386.66666666667, ans=0.025 2023-12-21 15:44:29,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=109386.66666666667, ans=0.2 2023-12-21 15:44:40,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=109453.33333333333, ans=0.1 2023-12-21 15:44:42,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=109453.33333333333, ans=0.2 2023-12-21 15:44:53,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=109520.0, ans=0.125 2023-12-21 15:44:54,213 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.798e+00 2023-12-21 15:45:02,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=109586.66666666667, ans=0.2 2023-12-21 15:45:03,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=109586.66666666667, ans=0.125 2023-12-21 15:45:04,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.68 vs. limit=10.0 2023-12-21 15:45:05,222 INFO [train.py:886] (3/4) Epoch 4, batch 2150, loss[loss=0.01668, audio_tagging_loss=0.01668, over 25000.00 frames. ], tot_loss[loss=0.01697, audio_tagging_loss=0.01697, over 4946556.80 frames. 
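The logged lr decays smoothly from 2.67e-02 at the top of this excerpt toward 2.50e-02 by batch 2500 of epoch 4, while also stepping down slowly across epochs. That shape is consistent with an Eden-style schedule, i.e. a product of batch- and epoch-dependent power-law factors. The formula below is an assumption that reproduces the logged values to within rounding, taking base_lr=0.045, lr_batches=7500, lr_epochs=3.5 from the run configuration and a global batch index around 13k:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        """Eden-style LR: power-law decay in both batch index and epoch (sketch)."""
        batch_factor = ((batch / lr_batches) ** 2 + 1) ** -0.25
        epoch_factor = ((epoch / lr_epochs) ** 2 + 1) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # prints 2.58e-02, inside the 2.5e-02 .. 2.6e-02 band logged during epoch 4:
    print(f"{eden_lr(0.045, batch=13000, epoch=4):.2e}")
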
], batch size: 100, lr: 2.53e-02, grad_scale: 128.0 2023-12-21 15:45:08,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=109653.33333333333, ans=0.1 2023-12-21 15:45:10,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=109653.33333333333, ans=0.125 2023-12-21 15:45:14,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=109720.0, ans=0.0 2023-12-21 15:45:20,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=109720.0, ans=0.0 2023-12-21 15:45:20,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=109720.0, ans=0.2 2023-12-21 15:45:21,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.17 vs. limit=22.5 2023-12-21 15:45:27,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=109786.66666666667, ans=0.125 2023-12-21 15:45:29,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.42 vs. limit=22.5 2023-12-21 15:45:29,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=109786.66666666667, ans=0.1 2023-12-21 15:45:32,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=109786.66666666667, ans=0.125 2023-12-21 15:45:37,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=109853.33333333333, ans=0.125 2023-12-21 15:45:55,487 INFO [train.py:886] (3/4) Epoch 4, batch 2200, loss[loss=0.01778, audio_tagging_loss=0.01778, over 24750.00 frames. ], tot_loss[loss=0.01714, audio_tagging_loss=0.01714, over 4942693.57 frames. ], batch size: 99, lr: 2.52e-02, grad_scale: 128.0 2023-12-21 15:45:58,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=109986.66666666667, ans=0.125 2023-12-21 15:46:04,582 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+01 2.592e+01 2.698e+01 2.922e+01 4.235e+01, threshold=5.395e+01, percent-clipped=0.0 2023-12-21 15:46:05,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=110053.33333333333, ans=0.125 2023-12-21 15:46:09,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.12 vs. limit=15.0 2023-12-21 15:46:37,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=110253.33333333333, ans=0.125 2023-12-21 15:46:40,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=110253.33333333333, ans=0.125 2023-12-21 15:46:45,324 INFO [train.py:886] (3/4) Epoch 4, batch 2250, loss[loss=0.01587, audio_tagging_loss=0.01587, over 24750.00 frames. ], tot_loss[loss=0.01722, audio_tagging_loss=0.01722, over 4941235.92 frames. 
], batch size: 99, lr: 2.52e-02, grad_scale: 128.0 2023-12-21 15:47:00,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=110386.66666666667, ans=0.2 2023-12-21 15:47:03,273 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.90 vs. limit=22.5 2023-12-21 15:47:03,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=110386.66666666667, ans=0.1 2023-12-21 15:47:04,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.32 vs. limit=15.0 2023-12-21 15:47:16,312 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.87 vs. limit=10.0 2023-12-21 15:47:25,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=110586.66666666667, ans=0.0 2023-12-21 15:47:27,454 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.65 vs. limit=15.0 2023-12-21 15:47:37,168 INFO [train.py:886] (3/4) Epoch 4, batch 2300, loss[loss=0.01633, audio_tagging_loss=0.01633, over 25000.00 frames. ], tot_loss[loss=0.01713, audio_tagging_loss=0.01713, over 4945555.03 frames. ], batch size: 100, lr: 2.52e-02, grad_scale: 128.0 2023-12-21 15:47:37,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=110653.33333333333, ans=0.125 2023-12-21 15:47:45,881 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.063e+01 2.543e+01 2.713e+01 2.924e+01 4.099e+01, threshold=5.427e+01, percent-clipped=0.0 2023-12-21 15:48:05,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=110786.66666666667, ans=0.0 2023-12-21 15:48:05,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=110786.66666666667, ans=0.125 2023-12-21 15:48:10,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=110853.33333333333, ans=0.125 2023-12-21 15:48:12,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=110853.33333333333, ans=0.2 2023-12-21 15:48:15,177 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.93 vs. limit=6.0 2023-12-21 15:48:25,243 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.01 vs. limit=15.0 2023-12-21 15:48:25,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=110920.0, ans=0.0 2023-12-21 15:48:27,601 INFO [train.py:886] (3/4) Epoch 4, batch 2350, loss[loss=0.01715, audio_tagging_loss=0.01715, over 25000.00 frames. ], tot_loss[loss=0.017, audio_tagging_loss=0.017, over 4949212.08 frames. 
], batch size: 100, lr: 2.51e-02, grad_scale: 128.0 2023-12-21 15:48:38,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=111053.33333333333, ans=0.125 2023-12-21 15:48:38,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=111053.33333333333, ans=0.0 2023-12-21 15:48:40,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.73 vs. limit=12.0 2023-12-21 15:48:54,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=111120.0, ans=0.125 2023-12-21 15:49:06,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=111186.66666666667, ans=0.2 2023-12-21 15:49:09,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=111253.33333333333, ans=0.125 2023-12-21 15:49:14,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=111253.33333333333, ans=0.125 2023-12-21 15:49:18,926 INFO [train.py:886] (3/4) Epoch 4, batch 2400, loss[loss=0.01766, audio_tagging_loss=0.01766, over 24750.00 frames. ], tot_loss[loss=0.01698, audio_tagging_loss=0.01698, over 4946895.72 frames. ], batch size: 99, lr: 2.51e-02, grad_scale: 128.0 2023-12-21 15:49:27,412 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.500e+01 2.719e+01 2.953e+01 3.930e+01, threshold=5.438e+01, percent-clipped=0.0 2023-12-21 15:49:30,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=111386.66666666667, ans=0.125 2023-12-21 15:49:31,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.30 vs. limit=15.0 2023-12-21 15:49:35,421 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.73 vs. limit=15.0 2023-12-21 15:49:49,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=111520.0, ans=0.125 2023-12-21 15:49:50,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=111520.0, ans=0.0 2023-12-21 15:49:56,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=111520.0, ans=0.0 2023-12-21 15:50:00,556 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.49 vs. 
limit=15.0 2023-12-21 15:50:05,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=111586.66666666667, ans=0.2 2023-12-21 15:50:07,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=111586.66666666667, ans=0.0 2023-12-21 15:50:08,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=111586.66666666667, ans=0.0 2023-12-21 15:50:11,408 INFO [train.py:886] (3/4) Epoch 4, batch 2450, loss[loss=0.02087, audio_tagging_loss=0.02087, over 25000.00 frames. ], tot_loss[loss=0.01701, audio_tagging_loss=0.01701, over 4943814.81 frames. ], batch size: 100, lr: 2.51e-02, grad_scale: 128.0 2023-12-21 15:50:13,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=111653.33333333333, ans=0.0 2023-12-21 15:50:16,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=111653.33333333333, ans=0.125 2023-12-21 15:50:22,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=111720.0, ans=0.2 2023-12-21 15:50:23,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=111720.0, ans=0.1 2023-12-21 15:50:24,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=111720.0, ans=0.125 2023-12-21 15:50:32,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=111786.66666666667, ans=0.125 2023-12-21 15:50:39,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=111786.66666666667, ans=0.1 2023-12-21 15:50:39,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=111786.66666666667, ans=0.0 2023-12-21 15:50:45,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=111853.33333333333, ans=0.05 2023-12-21 15:51:02,641 INFO [train.py:886] (3/4) Epoch 4, batch 2500, loss[loss=0.01843, audio_tagging_loss=0.01843, over 24750.00 frames. ], tot_loss[loss=0.01721, audio_tagging_loss=0.01721, over 4945570.21 frames. ], batch size: 99, lr: 2.50e-02, grad_scale: 128.0 2023-12-21 15:51:03,115 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.58 vs. 
limit=15.0 2023-12-21 15:51:12,016 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.590e+01 2.772e+01 2.981e+01 3.773e+01, threshold=5.543e+01, percent-clipped=0.0 2023-12-21 15:51:21,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=112053.33333333333, ans=0.125 2023-12-21 15:51:31,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=112120.0, ans=0.125 2023-12-21 15:51:41,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=112186.66666666667, ans=0.125 2023-12-21 15:51:48,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=112253.33333333333, ans=0.125 2023-12-21 15:51:49,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=112253.33333333333, ans=0.2 2023-12-21 15:51:53,862 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.50 vs. limit=15.0 2023-12-21 15:51:56,356 INFO [train.py:886] (3/4) Epoch 4, batch 2550, loss[loss=0.01553, audio_tagging_loss=0.01553, over 24750.00 frames. ], tot_loss[loss=0.01729, audio_tagging_loss=0.01729, over 4939661.40 frames. ], batch size: 99, lr: 2.50e-02, grad_scale: 128.0 2023-12-21 15:52:10,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=112386.66666666667, ans=0.125 2023-12-21 15:52:11,628 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.54 vs. limit=22.5 2023-12-21 15:52:25,590 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.24 vs. limit=22.5 2023-12-21 15:52:27,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=112520.0, ans=0.125 2023-12-21 15:52:29,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=112520.0, ans=0.0 2023-12-21 15:52:47,835 INFO [train.py:886] (3/4) Epoch 4, batch 2600, loss[loss=0.01651, audio_tagging_loss=0.01651, over 25000.00 frames. ], tot_loss[loss=0.01722, audio_tagging_loss=0.01722, over 4944746.33 frames. 
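The scaling.py:1022 "Whitening" records measure how far a module's output covariance is from isotropic and compare the measurement against a scheduled limit (e.g. metric=22.54 vs. limit=22.5 above). A hedged sketch of one such metric: the ratio of the second moment of the covariance eigenvalues to the squared mean eigenvalue, which is 1.0 for perfectly white features and grows as energy concentrates in a few directions. This definition is illustrative; the exact statistic in scaling.py may differ.

import torch

def whitening_metric(x):
    # x: (frames, channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]         # channel covariance
    eigs = torch.linalg.eigvalsh(cov)      # non-negative for a PSD covariance
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

x = torch.randn(1000, 256)                 # white noise -> metric near 1.0
metric = whitening_metric(x)
if metric > 22.5:
    print(f"Whitening: metric={metric:.2f} vs. limit=22.5")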
], batch size: 100, lr: 2.50e-02, grad_scale: 128.0 2023-12-21 15:52:57,732 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.494e+01 2023-12-21 15:52:58,415 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.211e+01 2.608e+01 2.784e+01 3.013e+01 3.853e+01, threshold=5.568e+01, percent-clipped=0.0 2023-12-21 15:53:12,971 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.296e+00 2023-12-21 15:53:21,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=112853.33333333333, ans=10.0 2023-12-21 15:53:24,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=112853.33333333333, ans=0.0 2023-12-21 15:53:40,007 INFO [train.py:886] (3/4) Epoch 4, batch 2650, loss[loss=0.01592, audio_tagging_loss=0.01592, over 23969.00 frames. ], tot_loss[loss=0.01714, audio_tagging_loss=0.01714, over 4940173.03 frames. ], batch size: 100, lr: 2.49e-02, grad_scale: 128.0 2023-12-21 15:53:45,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=112986.66666666667, ans=0.125 2023-12-21 15:53:46,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.70 vs. limit=15.0 2023-12-21 15:53:51,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=113053.33333333333, ans=0.2 2023-12-21 15:54:14,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=113186.66666666667, ans=0.035 2023-12-21 15:54:14,911 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.32 vs. limit=15.0 2023-12-21 15:54:19,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=113253.33333333333, ans=0.0 2023-12-21 15:54:31,516 INFO [train.py:886] (3/4) Epoch 4, batch 2700, loss[loss=0.0201, audio_tagging_loss=0.0201, over 24750.00 frames. ], tot_loss[loss=0.01702, audio_tagging_loss=0.01702, over 4947125.20 frames. ], batch size: 99, lr: 2.49e-02, grad_scale: 128.0 2023-12-21 15:54:37,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=113320.0, ans=0.07 2023-12-21 15:54:40,198 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 2.542e+01 2.756e+01 2.979e+01 4.286e+01, threshold=5.511e+01, percent-clipped=0.0 2023-12-21 15:54:49,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=113453.33333333333, ans=0.125 2023-12-21 15:55:20,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=113653.33333333333, ans=0.025 2023-12-21 15:55:21,454 INFO [train.py:886] (3/4) Epoch 4, batch 2750, loss[loss=0.01746, audio_tagging_loss=0.01746, over 25000.00 frames. ], tot_loss[loss=0.01698, audio_tagging_loss=0.01698, over 4957221.51 frames. 
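The scaling.py:1118 "WithLoss ... loss-sum=" records accumulate an auxiliary penalty attached to the self-attention weights; a sum of 0.000e+00 (as for encoders.1 later in this log) means the penalty never fired in that window. A hedged stand-in that logs one plausible penalty, charging rows whose attention has collapsed onto a single frame; the class name, the 0.95 cap and the choice of statistic are all assumptions, and the real module also feeds the penalty back through autograd, which is omitted here.

import torch

class AttentionWeightLogger(torch.nn.Module):
    def __init__(self, name):
        super().__init__()
        self.name = name
        self.loss_sum = 0.0

    def forward(self, attn):
        # attn: (..., num_frames), rows are softmax distributions
        peak = attn.max(dim=-1).values
        penalty = (peak - 0.95).clamp(min=0.0).sum()   # fires only on collapse
        self.loss_sum += float(penalty.detach())
        return attn                                     # pass-through

logger = AttentionWeightLogger("encoder.layers.0.self_attn_weights")
logger(torch.softmax(torch.randn(2, 4, 25), dim=-1))
print(f"WithLoss: name={logger.name}, loss-sum={logger.loss_sum:.3e}")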
], batch size: 100, lr: 2.49e-02, grad_scale: 128.0 2023-12-21 15:55:23,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=113653.33333333333, ans=0.95 2023-12-21 15:55:52,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=113853.33333333333, ans=0.125 2023-12-21 15:55:56,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=113853.33333333333, ans=0.0 2023-12-21 15:56:01,092 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.73 vs. limit=10.0 2023-12-21 15:56:13,778 INFO [train.py:886] (3/4) Epoch 4, batch 2800, loss[loss=0.01799, audio_tagging_loss=0.01799, over 24750.00 frames. ], tot_loss[loss=0.01712, audio_tagging_loss=0.01712, over 4954995.68 frames. ], batch size: 99, lr: 2.49e-02, grad_scale: 128.0 2023-12-21 15:56:22,307 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.583e+01 2.780e+01 3.121e+01 4.159e+01, threshold=5.559e+01, percent-clipped=0.0 2023-12-21 15:56:48,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.70 vs. limit=15.0 2023-12-21 15:56:52,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=114186.66666666667, ans=0.0 2023-12-21 15:56:55,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=114253.33333333333, ans=0.125 2023-12-21 15:57:03,909 INFO [train.py:886] (3/4) Epoch 4, batch 2850, loss[loss=0.01775, audio_tagging_loss=0.01775, over 24750.00 frames. ], tot_loss[loss=0.01718, audio_tagging_loss=0.01718, over 4945835.07 frames. ], batch size: 99, lr: 2.48e-02, grad_scale: 128.0 2023-12-21 15:57:06,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.14 vs. limit=15.0 2023-12-21 15:57:24,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=114453.33333333333, ans=0.1 2023-12-21 15:57:55,981 INFO [train.py:886] (3/4) Epoch 4, batch 2900, loss[loss=0.0201, audio_tagging_loss=0.0201, over 24750.00 frames. ], tot_loss[loss=0.01715, audio_tagging_loss=0.01715, over 4949845.12 frames. ], batch size: 99, lr: 2.48e-02, grad_scale: 128.0 2023-12-21 15:58:04,683 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.289e+01 2.617e+01 2.786e+01 3.010e+01 3.894e+01, threshold=5.572e+01, percent-clipped=0.0 2023-12-21 15:58:07,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=114720.0, ans=0.0 2023-12-21 15:58:16,674 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0 2023-12-21 15:58:19,177 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 15:58:23,326 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.64 vs. 
limit=15.0 2023-12-21 15:58:23,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=114786.66666666667, ans=0.05 2023-12-21 15:58:29,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=114853.33333333333, ans=0.125 2023-12-21 15:58:30,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.52 vs. limit=15.0 2023-12-21 15:58:31,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=114853.33333333333, ans=0.05 2023-12-21 15:58:37,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=114920.0, ans=0.0 2023-12-21 15:58:45,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=114920.0, ans=0.2 2023-12-21 15:58:48,193 INFO [train.py:886] (3/4) Epoch 4, batch 2950, loss[loss=0.01713, audio_tagging_loss=0.01713, over 25000.00 frames. ], tot_loss[loss=0.01713, audio_tagging_loss=0.01713, over 4949212.35 frames. ], batch size: 100, lr: 2.48e-02, grad_scale: 128.0 2023-12-21 15:59:06,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.19 vs. limit=6.0 2023-12-21 15:59:09,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=115120.0, ans=0.125 2023-12-21 15:59:36,051 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.58 vs. limit=12.0 2023-12-21 15:59:38,321 INFO [train.py:886] (3/4) Epoch 4, batch 3000, loss[loss=0.01561, audio_tagging_loss=0.01561, over 25000.00 frames. ], tot_loss[loss=0.017, audio_tagging_loss=0.017, over 4952976.02 frames. ], batch size: 100, lr: 2.47e-02, grad_scale: 128.0 2023-12-21 15:59:38,322 INFO [train.py:909] (3/4) Computing validation loss 2023-12-21 15:59:59,359 INFO [train.py:917] (3/4) Epoch 4, validation: loss=0.04177, audio_tagging_loss=0.04177, over 3737520.00 frames. 2023-12-21 15:59:59,360 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-21 16:00:00,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=115320.0, ans=0.1 2023-12-21 16:00:05,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=115320.0, ans=0.125 2023-12-21 16:00:07,860 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.203e+01 2.539e+01 2.719e+01 2.990e+01 3.720e+01, threshold=5.438e+01, percent-clipped=0.0 2023-12-21 16:00:11,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=115386.66666666667, ans=0.1 2023-12-21 16:00:18,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.76 vs. 
limit=12.0 2023-12-21 16:00:24,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=115453.33333333333, ans=0.125 2023-12-21 16:00:26,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=115453.33333333333, ans=0.0 2023-12-21 16:00:51,249 INFO [train.py:886] (3/4) Epoch 4, batch 3050, loss[loss=0.01719, audio_tagging_loss=0.01719, over 25000.00 frames. ], tot_loss[loss=0.01701, audio_tagging_loss=0.01701, over 4958520.60 frames. ], batch size: 100, lr: 2.47e-02, grad_scale: 128.0 2023-12-21 16:00:59,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=115653.33333333333, ans=0.0 2023-12-21 16:01:22,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=115853.33333333333, ans=0.125 2023-12-21 16:01:40,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=115920.0, ans=0.125 2023-12-21 16:01:42,101 INFO [train.py:886] (3/4) Epoch 4, batch 3100, loss[loss=0.01783, audio_tagging_loss=0.01783, over 24750.00 frames. ], tot_loss[loss=0.01703, audio_tagging_loss=0.01703, over 4952627.18 frames. ], batch size: 99, lr: 2.47e-02, grad_scale: 128.0 2023-12-21 16:01:46,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=115986.66666666667, ans=0.0 2023-12-21 16:01:52,025 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+01 2.579e+01 2.744e+01 2.909e+01 3.915e+01, threshold=5.487e+01, percent-clipped=0.0 2023-12-21 16:02:14,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=116186.66666666667, ans=0.1 2023-12-21 16:02:22,731 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=31.71 vs. limit=15.0 2023-12-21 16:02:34,952 INFO [train.py:886] (3/4) Epoch 4, batch 3150, loss[loss=0.01676, audio_tagging_loss=0.01676, over 24750.00 frames. ], tot_loss[loss=0.01709, audio_tagging_loss=0.01709, over 4952747.58 frames. ], batch size: 99, lr: 2.46e-02, grad_scale: 128.0 2023-12-21 16:02:39,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=116320.0, ans=0.025 2023-12-21 16:03:20,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=116586.66666666667, ans=0.125 2023-12-21 16:03:26,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=116653.33333333333, ans=0.1 2023-12-21 16:03:27,440 INFO [train.py:886] (3/4) Epoch 4, batch 3200, loss[loss=0.0179, audio_tagging_loss=0.0179, over 24750.00 frames. ], tot_loss[loss=0.0171, audio_tagging_loss=0.0171, over 4950245.61 frames. 
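The "Computing validation loss" block above (train.py:909/917/918) interrupts training at a fixed batch interval, averages the loss over the full dev set (hence the constant 3737520.00 frames every time), and reports peak CUDA memory. A hedged sketch of that step; model, dev_loader and compute_loss stand in for the real icefall objects.

import torch

def validate(model, dev_loader, compute_loss, device="cuda"):
    model.eval()
    loss_sum, frames = 0.0, 0.0
    with torch.no_grad():
        for batch in dev_loader:
            loss, num_frames = compute_loss(model, batch)
            loss_sum += float(loss) * num_frames
            frames += num_frames
    model.train()                          # resume training mode
    print(f"validation: loss={loss_sum / frames:.5f}, over {frames:.2f} frames.")
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")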
], batch size: 99, lr: 2.46e-02, grad_scale: 128.0 2023-12-21 16:03:36,017 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 2.555e+01 2.782e+01 3.048e+01 4.020e+01, threshold=5.565e+01, percent-clipped=0.0 2023-12-21 16:03:38,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=116720.0, ans=0.125 2023-12-21 16:04:04,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=116853.33333333333, ans=0.125 2023-12-21 16:04:11,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=116920.0, ans=0.0 2023-12-21 16:04:13,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=116920.0, ans=0.125 2023-12-21 16:04:18,489 INFO [train.py:886] (3/4) Epoch 4, batch 3250, loss[loss=0.01836, audio_tagging_loss=0.01836, over 25000.00 frames. ], tot_loss[loss=0.01705, audio_tagging_loss=0.01705, over 4945899.62 frames. ], batch size: 100, lr: 2.46e-02, grad_scale: 128.0 2023-12-21 16:04:21,579 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 16:04:47,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=117120.0, ans=0.125 2023-12-21 16:04:47,391 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.64 vs. limit=15.0 2023-12-21 16:04:54,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=117186.66666666667, ans=12.0 2023-12-21 16:04:58,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=117186.66666666667, ans=0.1 2023-12-21 16:05:10,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=117320.0, ans=0.0 2023-12-21 16:05:11,464 INFO [train.py:886] (3/4) Epoch 4, batch 3300, loss[loss=0.01741, audio_tagging_loss=0.01741, over 25000.00 frames. ], tot_loss[loss=0.01696, audio_tagging_loss=0.01696, over 4951145.96 frames. ], batch size: 100, lr: 2.46e-02, grad_scale: 128.0 2023-12-21 16:05:17,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=117320.0, ans=0.1 2023-12-21 16:05:20,921 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.625e+01 2.805e+01 3.075e+01 3.853e+01, threshold=5.610e+01, percent-clipped=0.0 2023-12-21 16:05:43,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=15.0 2023-12-21 16:05:49,197 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.91 vs. 
limit=12.0 2023-12-21 16:05:54,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=117586.66666666667, ans=0.0 2023-12-21 16:05:56,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=117586.66666666667, ans=0.125 2023-12-21 16:06:00,599 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.18 vs. limit=15.0 2023-12-21 16:06:03,551 INFO [train.py:886] (3/4) Epoch 4, batch 3350, loss[loss=0.0156, audio_tagging_loss=0.0156, over 24750.00 frames. ], tot_loss[loss=0.01704, audio_tagging_loss=0.01704, over 4946511.78 frames. ], batch size: 99, lr: 2.45e-02, grad_scale: 128.0 2023-12-21 16:06:08,633 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.63 vs. limit=15.0 2023-12-21 16:06:16,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=117720.0, ans=0.0 2023-12-21 16:06:44,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=117920.0, ans=0.125 2023-12-21 16:06:47,513 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.93 vs. limit=6.0 2023-12-21 16:06:50,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=117920.0, ans=0.1 2023-12-21 16:06:54,561 INFO [train.py:886] (3/4) Epoch 4, batch 3400, loss[loss=0.01698, audio_tagging_loss=0.01698, over 25000.00 frames. ], tot_loss[loss=0.01706, audio_tagging_loss=0.01706, over 4945436.52 frames. ], batch size: 100, lr: 2.45e-02, grad_scale: 128.0 2023-12-21 16:07:03,686 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 2.590e+01 2.740e+01 3.014e+01 4.535e+01, threshold=5.480e+01, percent-clipped=0.0 2023-12-21 16:07:40,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=118253.33333333333, ans=0.04949747468305833 2023-12-21 16:07:47,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=118320.0, ans=0.125 2023-12-21 16:07:47,979 INFO [train.py:886] (3/4) Epoch 4, batch 3450, loss[loss=0.01891, audio_tagging_loss=0.01891, over 24750.00 frames. ], tot_loss[loss=0.01722, audio_tagging_loss=0.01722, over 4940758.07 frames. ], batch size: 99, lr: 2.45e-02, grad_scale: 128.0 2023-12-21 16:07:50,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=118320.0, ans=0.125 2023-12-21 16:07:56,058 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.69 vs. limit=10.0 2023-12-21 16:08:04,386 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.73 vs. limit=15.0 2023-12-21 16:08:09,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.96 vs. 
limit=15.0 2023-12-21 16:08:22,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=118520.0, ans=0.125 2023-12-21 16:08:33,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=118586.66666666667, ans=0.125 2023-12-21 16:08:38,278 INFO [train.py:886] (3/4) Epoch 4, batch 3500, loss[loss=0.01517, audio_tagging_loss=0.01517, over 24750.00 frames. ], tot_loss[loss=0.01713, audio_tagging_loss=0.01713, over 4935988.99 frames. ], batch size: 99, lr: 2.44e-02, grad_scale: 128.0 2023-12-21 16:08:48,327 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.78 vs. limit=15.0 2023-12-21 16:08:48,898 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.114e+01 2.564e+01 2.726e+01 3.075e+01 4.208e+01, threshold=5.453e+01, percent-clipped=0.0 2023-12-21 16:08:50,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=118720.0, ans=0.125 2023-12-21 16:09:04,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=118786.66666666667, ans=0.1 2023-12-21 16:09:05,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=118786.66666666667, ans=0.125 2023-12-21 16:09:08,203 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.38 vs. limit=15.0 2023-12-21 16:09:09,929 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.38 vs. limit=15.0 2023-12-21 16:09:23,788 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.05 vs. limit=22.5 2023-12-21 16:09:24,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=118920.0, ans=0.125 2023-12-21 16:09:28,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=118920.0, ans=0.07 2023-12-21 16:09:30,643 INFO [train.py:886] (3/4) Epoch 4, batch 3550, loss[loss=0.01957, audio_tagging_loss=0.01957, over 25000.00 frames. ], tot_loss[loss=0.01709, audio_tagging_loss=0.01709, over 4940960.06 frames. 
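Each optim.py:484 WARNING reports five quantiles (min, 25%, median, 75%, max) of recent whole-model gradient norms plus the active clipping threshold, and the threshold consistently sits at about twice the median (e.g. 5.453e+01 ≈ 2 × 2.726e+01 just above), matching Clipping_scale=2.0. A hedged sketch of that scheme, keeping a window of norms and rescaling against scale × median; class and method names here are hypothetical.

import statistics
import torch

class QuartileClipper:
    def __init__(self, clipping_scale=2.0, window=100):
        self.scale = clipping_scale
        self.window = window
        self.norms = []                     # recent whole-model grad norms

    def clip_(self, params):
        grads = [p.grad for p in params if p.grad is not None]
        norm = float(torch.sqrt(sum(g.pow(2).sum() for g in grads)))
        self.norms = (self.norms + [norm])[-self.window:]
        threshold = self.scale * statistics.median(self.norms)
        if norm > threshold:                # rescale rather than hard-clip
            for g in grads:
                g.mul_(threshold / norm)
        return norm, threshold

model = torch.nn.Linear(4, 4)
model(torch.randn(2, 4)).sum().backward()
print(QuartileClipper().clip_(list(model.parameters())))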
], batch size: 100, lr: 2.44e-02, grad_scale: 128.0 2023-12-21 16:09:30,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=118986.66666666667, ans=10.0 2023-12-21 16:09:45,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=119053.33333333333, ans=0.07 2023-12-21 16:09:56,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=119120.0, ans=0.125 2023-12-21 16:09:59,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=119120.0, ans=0.0 2023-12-21 16:10:00,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=119120.0, ans=0.0 2023-12-21 16:10:04,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=119186.66666666667, ans=0.125 2023-12-21 16:10:22,228 INFO [train.py:886] (3/4) Epoch 4, batch 3600, loss[loss=0.01452, audio_tagging_loss=0.01452, over 25000.00 frames. ], tot_loss[loss=0.01696, audio_tagging_loss=0.01696, over 4941126.02 frames. ], batch size: 100, lr: 2.44e-02, grad_scale: 128.0 2023-12-21 16:10:29,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=119320.0, ans=0.0 2023-12-21 16:10:32,448 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.504e+01 2.701e+01 2.952e+01 4.327e+01, threshold=5.401e+01, percent-clipped=0.0 2023-12-21 16:10:36,814 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.90 vs. limit=15.0 2023-12-21 16:10:46,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=119453.33333333333, ans=15.0 2023-12-21 16:10:59,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=119520.0, ans=0.125 2023-12-21 16:11:02,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=119586.66666666667, ans=0.0 2023-12-21 16:11:12,798 INFO [train.py:886] (3/4) Epoch 4, batch 3650, loss[loss=0.02014, audio_tagging_loss=0.02014, over 25000.00 frames. ], tot_loss[loss=0.01695, audio_tagging_loss=0.01695, over 4943347.29 frames. 
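In the train.py:886 records, loss is the current batch and tot_loss is a frame-weighted aggregate; note the frame count hovers around 4.94-4.96M rather than growing without bound, so the real tracker behaves like a decaying window. The sketch below shows only the plain frame weighting, with that caveat; the class name is hypothetical.

class FrameWeightedLoss:
    def __init__(self):
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, loss, num_frames):
        self.loss_sum += loss * num_frames
        self.frames += num_frames

    def tot_loss(self):
        return self.loss_sum / max(self.frames, 1.0)

tracker = FrameWeightedLoss()
tracker.update(0.01452, 25000.0)   # batch 3600 above
tracker.update(0.02014, 25000.0)   # batch 3650 above
print(f"tot_loss={tracker.tot_loss():.5f}, over {tracker.frames:.2f} frames.")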
], batch size: 100, lr: 2.43e-02, grad_scale: 128.0 2023-12-21 16:11:25,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=119720.0, ans=0.0 2023-12-21 16:11:36,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=119786.66666666667, ans=0.0 2023-12-21 16:11:45,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=119853.33333333333, ans=0.0 2023-12-21 16:11:48,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=119853.33333333333, ans=0.125 2023-12-21 16:11:51,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=119853.33333333333, ans=0.0 2023-12-21 16:11:57,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=119920.0, ans=0.125 2023-12-21 16:12:02,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=119920.0, ans=0.0 2023-12-21 16:12:04,965 INFO [train.py:886] (3/4) Epoch 4, batch 3700, loss[loss=0.02217, audio_tagging_loss=0.02217, over 25000.00 frames. ], tot_loss[loss=0.01706, audio_tagging_loss=0.01706, over 4952070.89 frames. ], batch size: 100, lr: 2.43e-02, grad_scale: 128.0 2023-12-21 16:12:13,312 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.53 vs. limit=15.0 2023-12-21 16:12:14,755 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.251e+01 2.607e+01 2.781e+01 3.058e+01 3.851e+01, threshold=5.562e+01, percent-clipped=0.0 2023-12-21 16:12:16,366 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.28 vs. limit=15.0 2023-12-21 16:12:22,030 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.70 vs. limit=10.0 2023-12-21 16:12:52,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=120253.33333333333, ans=0.1 2023-12-21 16:12:55,110 INFO [train.py:886] (3/4) Epoch 4, batch 3750, loss[loss=0.01837, audio_tagging_loss=0.01837, over 24750.00 frames. ], tot_loss[loss=0.0172, audio_tagging_loss=0.0172, over 4954433.55 frames. ], batch size: 99, lr: 2.43e-02, grad_scale: 128.0 2023-12-21 16:13:01,727 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.19 vs. limit=22.5 2023-12-21 16:13:18,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=120453.33333333333, ans=0.0 2023-12-21 16:13:28,909 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.41 vs. limit=10.0 2023-12-21 16:13:39,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.11 vs. 
limit=15.0 2023-12-21 16:13:46,564 INFO [train.py:886] (3/4) Epoch 4, batch 3800, loss[loss=0.01497, audio_tagging_loss=0.01497, over 25000.00 frames. ], tot_loss[loss=0.01724, audio_tagging_loss=0.01724, over 4948152.85 frames. ], batch size: 100, lr: 2.43e-02, grad_scale: 128.0 2023-12-21 16:13:56,036 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 2.567e+01 2.797e+01 3.040e+01 4.165e+01, threshold=5.595e+01, percent-clipped=0.0 2023-12-21 16:14:00,250 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.16 vs. limit=15.0 2023-12-21 16:14:04,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=120720.0, ans=0.125 2023-12-21 16:14:33,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=120920.0, ans=0.125 2023-12-21 16:14:38,126 INFO [train.py:886] (3/4) Epoch 4, batch 3850, loss[loss=0.01974, audio_tagging_loss=0.01974, over 25000.00 frames. ], tot_loss[loss=0.01709, audio_tagging_loss=0.01709, over 4948619.17 frames. ], batch size: 100, lr: 2.42e-02, grad_scale: 128.0 2023-12-21 16:14:44,284 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=28.16 vs. limit=15.0 2023-12-21 16:14:54,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=121053.33333333333, ans=0.125 2023-12-21 16:15:02,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=121120.0, ans=22.5 2023-12-21 16:15:03,952 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.00 vs. limit=15.0 2023-12-21 16:15:06,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=121120.0, ans=0.125 2023-12-21 16:15:06,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=121120.0, ans=0.125 2023-12-21 16:15:18,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=121253.33333333333, ans=0.125 2023-12-21 16:15:21,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=121253.33333333333, ans=0.2 2023-12-21 16:15:28,374 INFO [train.py:886] (3/4) Epoch 4, batch 3900, loss[loss=0.01489, audio_tagging_loss=0.01489, over 23988.00 frames. ], tot_loss[loss=0.017, audio_tagging_loss=0.017, over 4946936.47 frames. ], batch size: 100, lr: 2.42e-02, grad_scale: 128.0 2023-12-21 16:15:28,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.89 vs. limit=22.5 2023-12-21 16:15:37,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=121320.0, ans=0.125 2023-12-21 16:15:37,967 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.46 vs. 
limit=22.5 2023-12-21 16:15:39,372 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.569e+01 2.731e+01 2.970e+01 3.861e+01, threshold=5.462e+01, percent-clipped=0.0 2023-12-21 16:15:44,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.81 vs. limit=22.5 2023-12-21 16:16:06,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.10 vs. limit=10.0 2023-12-21 16:16:09,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=121586.66666666667, ans=0.0 2023-12-21 16:16:10,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=121586.66666666667, ans=0.0 2023-12-21 16:16:19,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=121586.66666666667, ans=0.0 2023-12-21 16:16:21,507 INFO [train.py:886] (3/4) Epoch 4, batch 3950, loss[loss=0.01609, audio_tagging_loss=0.01609, over 25000.00 frames. ], tot_loss[loss=0.0169, audio_tagging_loss=0.0169, over 4953623.74 frames. ], batch size: 100, lr: 2.42e-02, grad_scale: 128.0 2023-12-21 16:16:26,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=121653.33333333333, ans=0.125 2023-12-21 16:16:30,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=121720.0, ans=0.125 2023-12-21 16:16:45,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=121786.66666666667, ans=0.0 2023-12-21 16:16:46,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=121786.66666666667, ans=0.2 2023-12-21 16:16:56,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=121853.33333333333, ans=10.0 2023-12-21 16:17:07,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.73 vs. limit=12.0 2023-12-21 16:17:12,383 INFO [train.py:886] (3/4) Epoch 4, batch 4000, loss[loss=0.015, audio_tagging_loss=0.015, over 23999.00 frames. ], tot_loss[loss=0.01694, audio_tagging_loss=0.01694, over 4953264.75 frames. ], batch size: 100, lr: 2.41e-02, grad_scale: 128.0 2023-12-21 16:17:20,192 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.84 vs. limit=22.5 2023-12-21 16:17:23,133 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.583e+01 2.753e+01 2.897e+01 3.593e+01, threshold=5.506e+01, percent-clipped=0.0 2023-12-21 16:17:23,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=122053.33333333333, ans=0.125 2023-12-21 16:17:29,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=122053.33333333333, ans=0.0 2023-12-21 16:17:44,300 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.30 vs. 
limit=22.5 2023-12-21 16:17:44,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=122186.66666666667, ans=0.0 2023-12-21 16:17:44,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.92 vs. limit=15.0 2023-12-21 16:18:03,842 INFO [train.py:886] (3/4) Epoch 4, batch 4050, loss[loss=0.0161, audio_tagging_loss=0.0161, over 24750.00 frames. ], tot_loss[loss=0.01705, audio_tagging_loss=0.01705, over 4946680.67 frames. ], batch size: 99, lr: 2.41e-02, grad_scale: 128.0 2023-12-21 16:18:12,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=122386.66666666667, ans=0.0 2023-12-21 16:18:16,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=122386.66666666667, ans=15.0 2023-12-21 16:18:36,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=122520.0, ans=0.125 2023-12-21 16:18:43,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=122586.66666666667, ans=0.125 2023-12-21 16:18:49,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=122586.66666666667, ans=0.125 2023-12-21 16:18:55,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.80 vs. limit=22.5 2023-12-21 16:18:56,103 INFO [train.py:886] (3/4) Epoch 4, batch 4100, loss[loss=0.01734, audio_tagging_loss=0.01734, over 24750.00 frames. ], tot_loss[loss=0.01721, audio_tagging_loss=0.01721, over 4939083.28 frames. ], batch size: 99, lr: 2.41e-02, grad_scale: 128.0 2023-12-21 16:19:06,548 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.561e+01 2.807e+01 3.053e+01 3.767e+01, threshold=5.614e+01, percent-clipped=0.0 2023-12-21 16:19:11,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=122720.0, ans=0.1 2023-12-21 16:19:15,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=122786.66666666667, ans=0.125 2023-12-21 16:19:38,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=122920.0, ans=0.2 2023-12-21 16:19:47,170 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.33 vs. limit=15.0 2023-12-21 16:19:47,654 INFO [train.py:886] (3/4) Epoch 4, batch 4150, loss[loss=0.01851, audio_tagging_loss=0.01851, over 25000.00 frames. ], tot_loss[loss=0.01718, audio_tagging_loss=0.01718, over 4940352.12 frames. 
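The lr column decays smoothly within the epoch (2.52e-02 down to 2.41e-02 across these batches) and also steps down at epoch boundaries, which is the shape of icefall's Eden schedule: two inverse-fourth-root factors, one in batches and one in epochs. The formula below is Eden as published for Zipformer; treating these exact batch and epoch inputs as the ones this run used is an assumption.

def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# with the run's base_lr of 0.045 this lands in the same ballpark as
# the logged values:
print(eden_lr(0.045, batch=20000, epoch=4.5))   # ~2.1e-02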
], batch size: 100, lr: 2.41e-02, grad_scale: 128.0 2023-12-21 16:20:08,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=123120.0, ans=0.125 2023-12-21 16:20:24,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=123186.66666666667, ans=0.125 2023-12-21 16:20:27,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=123186.66666666667, ans=0.125 2023-12-21 16:20:27,553 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.18 vs. limit=22.5 2023-12-21 16:20:33,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=123253.33333333333, ans=0.0 2023-12-21 16:20:34,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=123253.33333333333, ans=0.125 2023-12-21 16:20:34,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=123253.33333333333, ans=0.0 2023-12-21 16:20:35,603 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=15.0 2023-12-21 16:20:35,729 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.80 vs. limit=15.0 2023-12-21 16:20:40,632 INFO [train.py:886] (3/4) Epoch 4, batch 4200, loss[loss=0.0162, audio_tagging_loss=0.0162, over 25000.00 frames. ], tot_loss[loss=0.01707, audio_tagging_loss=0.01707, over 4947528.97 frames. ], batch size: 100, lr: 2.40e-02, grad_scale: 128.0 2023-12-21 16:20:44,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=123320.0, ans=0.0 2023-12-21 16:20:44,926 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.84 vs. limit=6.0 2023-12-21 16:20:49,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=123386.66666666667, ans=0.125 2023-12-21 16:20:50,050 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.575e+01 2.802e+01 3.033e+01 3.875e+01, threshold=5.604e+01, percent-clipped=0.0 2023-12-21 16:21:05,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=123453.33333333333, ans=0.0 2023-12-21 16:21:18,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=123520.0, ans=0.2 2023-12-21 16:21:19,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=123520.0, ans=0.1 2023-12-21 16:21:23,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=123586.66666666667, ans=0.125 2023-12-21 16:21:32,183 INFO [train.py:886] (3/4) Epoch 4, batch 4250, loss[loss=0.01692, audio_tagging_loss=0.01692, over 25000.00 frames. ], tot_loss[loss=0.01699, audio_tagging_loss=0.01699, over 4945396.73 frames. 
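audio_tagging_loss is the only loss term in this recipe, and tagging on AudioSet is multi-label, so the natural choice is a per-class binary cross-entropy over the pooled encoder output. The reduction below (mean over classes and clips) and the sparse random targets are assumptions for illustration.

import torch

def audio_tagging_loss(logits, targets):
    # logits, targets: (batch, num_classes); targets are multi-hot {0, 1}
    return torch.nn.functional.binary_cross_entropy_with_logits(
        logits, targets, reduction="mean")

logits = torch.randn(100, 527)                    # 527 AudioSet classes
targets = (torch.rand(100, 527) < 0.02).float()   # sparse multi-hot labels
print(float(audio_tagging_loss(logits, targets)))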
], batch size: 100, lr: 2.40e-02, grad_scale: 128.0 2023-12-21 16:21:54,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=123786.66666666667, ans=0.125 2023-12-21 16:21:58,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=123786.66666666667, ans=0.125 2023-12-21 16:22:14,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=123920.0, ans=0.04949747468305833 2023-12-21 16:22:23,664 INFO [train.py:886] (3/4) Epoch 4, batch 4300, loss[loss=0.01655, audio_tagging_loss=0.01655, over 25000.00 frames. ], tot_loss[loss=0.01702, audio_tagging_loss=0.01702, over 4952952.41 frames. ], batch size: 100, lr: 2.40e-02, grad_scale: 128.0 2023-12-21 16:22:33,904 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 2.616e+01 2.824e+01 3.044e+01 4.145e+01, threshold=5.649e+01, percent-clipped=0.0 2023-12-21 16:22:34,352 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=74.23 vs. limit=15.0 2023-12-21 16:22:42,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=124053.33333333333, ans=0.0 2023-12-21 16:22:56,056 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.383e+00 2023-12-21 16:23:00,688 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.924e+00 2023-12-21 16:23:01,000 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.91 vs. limit=15.0 2023-12-21 16:23:15,655 INFO [train.py:886] (3/4) Epoch 4, batch 4350, loss[loss=0.02054, audio_tagging_loss=0.02054, over 25000.00 frames. ], tot_loss[loss=0.01712, audio_tagging_loss=0.01712, over 4956394.53 frames. ], batch size: 100, lr: 2.40e-02, grad_scale: 128.0 2023-12-21 16:23:15,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=124320.0, ans=0.0 2023-12-21 16:23:17,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=124320.0, ans=0.0 2023-12-21 16:23:20,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.33 vs. 
limit=12.0 2023-12-21 16:23:22,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=124320.0, ans=0.125 2023-12-21 16:23:25,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=124386.66666666667, ans=0.125 2023-12-21 16:23:34,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=124453.33333333333, ans=15.0 2023-12-21 16:23:56,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=124586.66666666667, ans=0.125 2023-12-21 16:24:01,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=124586.66666666667, ans=0.125 2023-12-21 16:24:04,052 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.05 vs. limit=22.5 2023-12-21 16:24:07,052 INFO [train.py:886] (3/4) Epoch 4, batch 4400, loss[loss=0.01835, audio_tagging_loss=0.01835, over 24750.00 frames. ], tot_loss[loss=0.01729, audio_tagging_loss=0.01729, over 4948696.02 frames. ], batch size: 99, lr: 2.39e-02, grad_scale: 128.0 2023-12-21 16:24:17,861 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+01 2.690e+01 2.854e+01 3.099e+01 4.055e+01, threshold=5.708e+01, percent-clipped=0.0 2023-12-21 16:24:20,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=124720.0, ans=0.1 2023-12-21 16:24:28,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=124786.66666666667, ans=0.1 2023-12-21 16:24:33,777 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 16:24:47,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=14.62 vs. limit=15.0 2023-12-21 16:24:52,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=124920.0, ans=0.125 2023-12-21 16:24:54,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=124920.0, ans=0.0 2023-12-21 16:24:58,565 INFO [train.py:886] (3/4) Epoch 4, batch 4450, loss[loss=0.01737, audio_tagging_loss=0.01737, over 24750.00 frames. ], tot_loss[loss=0.01719, audio_tagging_loss=0.01719, over 4944983.84 frames. ], batch size: 99, lr: 2.39e-02, grad_scale: 128.0 2023-12-21 16:25:07,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.17 vs. limit=15.0 2023-12-21 16:25:14,394 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.97 vs. 
limit=15.0 2023-12-21 16:25:15,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=125053.33333333333, ans=0.125 2023-12-21 16:25:26,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=125120.0, ans=0.1 2023-12-21 16:25:51,248 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.02 vs. limit=15.0 2023-12-21 16:25:51,700 INFO [train.py:886] (3/4) Epoch 4, batch 4500, loss[loss=0.01452, audio_tagging_loss=0.01452, over 25000.00 frames. ], tot_loss[loss=0.01716, audio_tagging_loss=0.01716, over 4942889.07 frames. ], batch size: 100, lr: 2.39e-02, grad_scale: 128.0 2023-12-21 16:26:01,675 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.185e+01 2.563e+01 2.777e+01 3.038e+01 3.654e+01, threshold=5.554e+01, percent-clipped=0.0 2023-12-21 16:26:23,322 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.15 vs. limit=22.5 2023-12-21 16:26:33,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=125586.66666666667, ans=0.125 2023-12-21 16:26:38,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=125586.66666666667, ans=0.125 2023-12-21 16:26:39,649 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.665e+00 2023-12-21 16:26:42,378 INFO [train.py:886] (3/4) Epoch 4, batch 4550, loss[loss=0.01659, audio_tagging_loss=0.01659, over 24750.00 frames. ], tot_loss[loss=0.017, audio_tagging_loss=0.017, over 4942687.42 frames. ], batch size: 99, lr: 2.38e-02, grad_scale: 128.0 2023-12-21 16:26:47,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=125653.33333333333, ans=0.125 2023-12-21 16:26:59,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=125720.0, ans=15.0 2023-12-21 16:27:02,568 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.88 vs. limit=22.5 2023-12-21 16:27:22,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=125853.33333333333, ans=0.125 2023-12-21 16:27:26,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=125920.0, ans=0.0 2023-12-21 16:27:35,638 INFO [train.py:886] (3/4) Epoch 4, batch 4600, loss[loss=0.01831, audio_tagging_loss=0.01831, over 24750.00 frames. ], tot_loss[loss=0.01702, audio_tagging_loss=0.01702, over 4944917.27 frames. 
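grad_scale: 128.0 is constant across all of these batches: the run trains in fp16, and a loss scaler multiplies the loss before backward so that small gradients survive half precision, backing the scale off only on overflow. icefall carries its own scaler logic, but torch's stock GradScaler, shown here as a hedged equivalent, behaves the same way.

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
scaler = torch.cuda.amp.GradScaler(init_scale=128.0, enabled=(device == "cuda"))

model = torch.nn.Linear(80, 527).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(8, 80, device=device)
y = (torch.rand(8, 527, device=device) < 0.02).float()

with torch.autocast(device_type=device, enabled=(device == "cuda")):
    loss = torch.nn.functional.binary_cross_entropy_with_logits(model(x), y)
scaler.scale(loss).backward()   # gradients carry the 128.0 factor
scaler.step(opt)                # unscales; skips the step on inf/nan
scaler.update()                 # grows/shrinks the scale adaptively
print(scaler.get_scale())       # stays at 128.0 while nothing overflows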
], batch size: 99, lr: 2.38e-02, grad_scale: 128.0 2023-12-21 16:27:38,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=125986.66666666667, ans=0.0 2023-12-21 16:27:38,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=125986.66666666667, ans=0.125 2023-12-21 16:27:45,210 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.122e+01 2.614e+01 2.813e+01 2.992e+01 3.999e+01, threshold=5.626e+01, percent-clipped=0.0 2023-12-21 16:27:55,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.48 vs. limit=15.0 2023-12-21 16:27:55,784 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.97 vs. limit=15.0 2023-12-21 16:28:05,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=126120.0, ans=0.1 2023-12-21 16:28:09,490 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.80 vs. limit=15.0 2023-12-21 16:28:15,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=126253.33333333333, ans=0.0 2023-12-21 16:28:18,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=126253.33333333333, ans=0.125 2023-12-21 16:28:26,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=126320.0, ans=0.125 2023-12-21 16:28:27,583 INFO [train.py:886] (3/4) Epoch 4, batch 4650, loss[loss=0.0186, audio_tagging_loss=0.0186, over 24750.00 frames. ], tot_loss[loss=0.01701, audio_tagging_loss=0.01701, over 4948373.18 frames. ], batch size: 99, lr: 2.38e-02, grad_scale: 128.0 2023-12-21 16:28:37,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=126386.66666666667, ans=0.1 2023-12-21 16:28:45,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=126386.66666666667, ans=0.1 2023-12-21 16:29:05,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=126520.0, ans=0.125 2023-12-21 16:29:18,448 INFO [train.py:886] (3/4) Epoch 4, batch 4700, loss[loss=0.01448, audio_tagging_loss=0.01448, over 24750.00 frames. ], tot_loss[loss=0.01704, audio_tagging_loss=0.01704, over 4942809.17 frames. 
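The many *_skip_rate records (attention_skip_rate, conv_skip_rate, ff2/ff3_skip_rate, bypass.skip_rate) are scheduled probabilities of dropping a sub-module's contribution for a step; most read ans=0.0 by this point in training, while bypass.skip_rate holds at 0.04949747468305833, i.e. 0.07/sqrt(2). A hedged sketch of the mechanism; the function name is hypothetical.

import torch

def with_skip(residual, sublayer_out, skip_rate, training=True):
    # with probability skip_rate, drop the sub-module entirely this step
    if training and float(torch.rand(())) < skip_rate:
        return residual
    return residual + sublayer_out

x, y = torch.randn(10, 256), torch.randn(10, 256)
print(with_skip(x, y, skip_rate=0.07 / 2 ** 0.5).shape)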
], batch size: 99, lr: 2.38e-02, grad_scale: 128.0 2023-12-21 16:29:27,723 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 2.637e+01 2.864e+01 3.107e+01 3.954e+01, threshold=5.728e+01, percent-clipped=0.0 2023-12-21 16:29:32,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=126720.0, ans=0.125 2023-12-21 16:29:47,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=126853.33333333333, ans=0.125 2023-12-21 16:29:53,215 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.89 vs. limit=15.0 2023-12-21 16:29:57,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=126920.0, ans=0.1 2023-12-21 16:30:00,764 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.57 vs. limit=15.0 2023-12-21 16:30:01,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=126920.0, ans=0.1 2023-12-21 16:30:02,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=126920.0, ans=0.125 2023-12-21 16:30:05,315 INFO [train.py:886] (3/4) Epoch 4, batch 4750, loss[loss=0.0163, audio_tagging_loss=0.0163, over 24750.00 frames. ], tot_loss[loss=0.0171, audio_tagging_loss=0.0171, over 4944944.77 frames. ], batch size: 99, lr: 2.37e-02, grad_scale: 128.0 2023-12-21 16:30:12,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=126986.66666666667, ans=0.125 2023-12-21 16:30:42,774 INFO [train.py:886] (3/4) Epoch 5, batch 0, loss[loss=0.03601, audio_tagging_loss=0.03601, over 25000.00 frames. ], tot_loss[loss=0.03601, audio_tagging_loss=0.03601, over 25000.00 frames. ], batch size: 100, lr: 2.21e-02, grad_scale: 128.0 2023-12-21 16:30:42,775 INFO [train.py:909] (3/4) Computing validation loss 2023-12-21 16:31:04,467 INFO [train.py:917] (3/4) Epoch 5, validation: loss=0.03772, audio_tagging_loss=0.03772, over 3737520.00 frames. 2023-12-21 16:31:04,468 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-21 16:31:11,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=127093.33333333333, ans=0.0 2023-12-21 16:31:17,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=127160.0, ans=0.125 2023-12-21 16:31:21,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=127160.0, ans=15.0 2023-12-21 16:31:22,786 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.86 vs. limit=15.0 2023-12-21 16:31:30,389 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.15 vs. 
limit=15.0 2023-12-21 16:31:33,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=127293.33333333333, ans=0.125 2023-12-21 16:31:36,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=127293.33333333333, ans=0.125 2023-12-21 16:31:44,890 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=15.0 2023-12-21 16:31:47,042 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.748e+01 3.184e+01 3.727e+01 1.037e+02, threshold=6.368e+01, percent-clipped=5.0 2023-12-21 16:31:52,723 INFO [train.py:886] (3/4) Epoch 5, batch 50, loss[loss=0.0223, audio_tagging_loss=0.0223, over 25000.00 frames. ], tot_loss[loss=0.02703, audio_tagging_loss=0.02703, over 1122377.94 frames. ], batch size: 100, lr: 2.21e-02, grad_scale: 128.0 2023-12-21 16:31:52,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=127426.66666666667, ans=0.125 2023-12-21 16:31:56,867 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.46 vs. limit=22.5 2023-12-21 16:32:07,786 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.09 vs. limit=22.5 2023-12-21 16:32:13,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=127560.0, ans=0.125 2023-12-21 16:32:27,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=127626.66666666667, ans=0.035 2023-12-21 16:32:30,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=127626.66666666667, ans=0.2 2023-12-21 16:32:32,422 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2023-12-21 16:32:37,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=127693.33333333333, ans=0.125 2023-12-21 16:32:43,320 INFO [train.py:886] (3/4) Epoch 5, batch 100, loss[loss=0.01742, audio_tagging_loss=0.01742, over 22644.00 frames. ], tot_loss[loss=0.02304, audio_tagging_loss=0.02304, over 1970608.49 frames. ], batch size: 107, lr: 2.20e-02, grad_scale: 128.0 2023-12-21 16:32:54,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=127826.66666666667, ans=0.2 2023-12-21 16:32:59,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.74 vs. limit=15.0 2023-12-21 16:33:20,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=127960.0, ans=0.125 2023-12-21 16:33:26,370 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.735e+01 2.959e+01 3.135e+01 3.807e+01, threshold=5.918e+01, percent-clipped=0.0 2023-12-21 16:33:32,078 INFO [train.py:886] (3/4) Epoch 5, batch 150, loss[loss=0.01699, audio_tagging_loss=0.01699, over 25000.00 frames. 
], tot_loss[loss=0.02094, audio_tagging_loss=0.02094, over 2640367.10 frames. ], batch size: 100, lr: 2.20e-02, grad_scale: 128.0 2023-12-21 16:33:38,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=128093.33333333333, ans=0.125 2023-12-21 16:33:43,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=128160.0, ans=0.0 2023-12-21 16:33:47,836 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0 2023-12-21 16:33:50,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=128160.0, ans=0.2 2023-12-21 16:33:58,398 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=12.0 2023-12-21 16:34:06,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=128293.33333333333, ans=0.0 2023-12-21 16:34:23,656 INFO [train.py:886] (3/4) Epoch 5, batch 200, loss[loss=0.01948, audio_tagging_loss=0.01948, over 24750.00 frames. ], tot_loss[loss=0.01962, audio_tagging_loss=0.01962, over 3157889.08 frames. ], batch size: 99, lr: 2.20e-02, grad_scale: 128.0 2023-12-21 16:34:33,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=128493.33333333333, ans=0.0 2023-12-21 16:34:35,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=128493.33333333333, ans=0.125 2023-12-21 16:34:44,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=128560.0, ans=0.2 2023-12-21 16:35:07,585 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.138e+01 2.571e+01 2.693e+01 2.979e+01 3.922e+01, threshold=5.386e+01, percent-clipped=0.0 2023-12-21 16:35:07,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=128693.33333333333, ans=0.0 2023-12-21 16:35:13,419 INFO [train.py:886] (3/4) Epoch 5, batch 250, loss[loss=0.01545, audio_tagging_loss=0.01545, over 25000.00 frames. ], tot_loss[loss=0.01881, audio_tagging_loss=0.01881, over 3559473.19 frames. ], batch size: 100, lr: 2.20e-02, grad_scale: 128.0 2023-12-21 16:35:37,281 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.01 vs. limit=15.0 2023-12-21 16:35:43,650 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 16:35:54,252 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.989e-01 2023-12-21 16:35:56,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=129026.66666666667, ans=0.125 2023-12-21 16:35:58,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=129026.66666666667, ans=0.125 2023-12-21 16:36:04,545 INFO [train.py:886] (3/4) Epoch 5, batch 300, loss[loss=0.02062, audio_tagging_loss=0.02062, over 24750.00 frames. 
], tot_loss[loss=0.01845, audio_tagging_loss=0.01845, over 3867785.73 frames. ], batch size: 99, lr: 2.19e-02, grad_scale: 128.0 2023-12-21 16:36:12,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=129093.33333333333, ans=0.125 2023-12-21 16:36:25,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=129226.66666666667, ans=0.125 2023-12-21 16:36:50,050 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.527e+01 2.747e+01 2.947e+01 3.578e+01, threshold=5.493e+01, percent-clipped=0.0 2023-12-21 16:36:55,757 INFO [train.py:886] (3/4) Epoch 5, batch 350, loss[loss=0.01747, audio_tagging_loss=0.01747, over 24750.00 frames. ], tot_loss[loss=0.01807, audio_tagging_loss=0.01807, over 4100697.60 frames. ], batch size: 99, lr: 2.19e-02, grad_scale: 128.0 2023-12-21 16:36:59,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=129426.66666666667, ans=0.2 2023-12-21 16:37:05,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=129493.33333333333, ans=0.125 2023-12-21 16:37:09,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=129493.33333333333, ans=0.125 2023-12-21 16:37:10,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.71 vs. limit=15.0 2023-12-21 16:37:13,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=129493.33333333333, ans=0.09899494936611666 2023-12-21 16:37:16,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=129560.0, ans=0.125 2023-12-21 16:37:24,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=129560.0, ans=0.125 2023-12-21 16:37:35,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.20 vs. limit=22.5 2023-12-21 16:37:46,204 INFO [train.py:886] (3/4) Epoch 5, batch 400, loss[loss=0.0196, audio_tagging_loss=0.0196, over 22247.00 frames. ], tot_loss[loss=0.01768, audio_tagging_loss=0.01768, over 4285064.21 frames. ], batch size: 107, lr: 2.19e-02, grad_scale: 128.0 2023-12-21 16:37:52,773 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.93 vs. limit=15.0 2023-12-21 16:38:13,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=129893.33333333333, ans=22.5 2023-12-21 16:38:31,456 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.088e+01 2.542e+01 2.726e+01 2.924e+01 4.010e+01, threshold=5.453e+01, percent-clipped=0.0 2023-12-21 16:38:37,874 INFO [train.py:886] (3/4) Epoch 5, batch 450, loss[loss=0.01621, audio_tagging_loss=0.01621, over 22038.00 frames. ], tot_loss[loss=0.01739, audio_tagging_loss=0.01739, over 4431394.55 frames. 
], batch size: 107, lr: 2.19e-02, grad_scale: 128.0 2023-12-21 16:38:38,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=130093.33333333333, ans=0.0 2023-12-21 16:38:39,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=130093.33333333333, ans=10.0 2023-12-21 16:38:44,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=130093.33333333333, ans=0.125 2023-12-21 16:38:51,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=130160.0, ans=0.2 2023-12-21 16:38:57,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=130226.66666666667, ans=0.125 2023-12-21 16:39:12,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=130293.33333333333, ans=0.0 2023-12-21 16:39:14,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=130293.33333333333, ans=0.1 2023-12-21 16:39:14,455 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=26.49 vs. limit=22.5 2023-12-21 16:39:17,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=130360.0, ans=0.0 2023-12-21 16:39:18,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=130360.0, ans=0.2 2023-12-21 16:39:28,824 INFO [train.py:886] (3/4) Epoch 5, batch 500, loss[loss=0.01553, audio_tagging_loss=0.01553, over 25000.00 frames. ], tot_loss[loss=0.01715, audio_tagging_loss=0.01715, over 4549113.68 frames. ], batch size: 100, lr: 2.18e-02, grad_scale: 128.0 2023-12-21 16:39:34,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=130426.66666666667, ans=0.1 2023-12-21 16:39:34,998 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.58 vs. limit=22.5 2023-12-21 16:39:36,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=130426.66666666667, ans=0.125 2023-12-21 16:39:42,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=130493.33333333333, ans=0.0 2023-12-21 16:39:46,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=130493.33333333333, ans=0.0 2023-12-21 16:39:46,888 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=15.79 vs. limit=15.0 2023-12-21 16:39:47,040 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.94 vs. 
limit=22.5 2023-12-21 16:39:53,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=130560.0, ans=10.0 2023-12-21 16:40:08,598 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.52 vs. limit=22.5 2023-12-21 16:40:09,541 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.35 vs. limit=22.5 2023-12-21 16:40:11,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=130693.33333333333, ans=0.0 2023-12-21 16:40:13,607 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.089e+01 2.467e+01 2.664e+01 2.880e+01 3.369e+01, threshold=5.329e+01, percent-clipped=0.0 2023-12-21 16:40:17,731 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.228e+01 2023-12-21 16:40:19,465 INFO [train.py:886] (3/4) Epoch 5, batch 550, loss[loss=0.01501, audio_tagging_loss=0.01501, over 25000.00 frames. ], tot_loss[loss=0.01698, audio_tagging_loss=0.01698, over 4642057.47 frames. ], batch size: 100, lr: 2.18e-02, grad_scale: 128.0 2023-12-21 16:40:26,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=130760.0, ans=0.1 2023-12-21 16:40:28,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=130826.66666666667, ans=0.2 2023-12-21 16:40:32,474 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 16:40:34,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=130826.66666666667, ans=0.1 2023-12-21 16:40:42,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=130893.33333333333, ans=0.0 2023-12-21 16:40:50,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=130960.0, ans=0.125 2023-12-21 16:40:51,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=130960.0, ans=0.0 2023-12-21 16:41:07,068 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.75 vs. limit=22.5 2023-12-21 16:41:10,427 INFO [train.py:886] (3/4) Epoch 5, batch 600, loss[loss=0.01649, audio_tagging_loss=0.01649, over 25000.00 frames. ], tot_loss[loss=0.01701, audio_tagging_loss=0.01701, over 4707955.78 frames. ], batch size: 100, lr: 2.18e-02, grad_scale: 128.0 2023-12-21 16:41:12,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.70 vs. 
limit=22.5 2023-12-21 16:41:33,765 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.845e+00 2023-12-21 16:41:33,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=131226.66666666666, ans=0.0 2023-12-21 16:41:47,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.35 vs. limit=15.0 2023-12-21 16:41:53,918 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.178e+01 2.543e+01 2.791e+01 2.900e+01 3.878e+01, threshold=5.581e+01, percent-clipped=0.0 2023-12-21 16:42:00,248 INFO [train.py:886] (3/4) Epoch 5, batch 650, loss[loss=0.01814, audio_tagging_loss=0.01814, over 25000.00 frames. ], tot_loss[loss=0.01703, audio_tagging_loss=0.01703, over 4760269.55 frames. ], batch size: 100, lr: 2.18e-02, grad_scale: 128.0 2023-12-21 16:42:01,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=131426.66666666666, ans=0.07 2023-12-21 16:42:05,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.91 vs. limit=15.0 2023-12-21 16:42:14,642 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.617e+00 2023-12-21 16:42:14,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=131493.33333333334, ans=0.125 2023-12-21 16:42:33,068 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.20 vs. limit=15.0 2023-12-21 16:42:38,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=131626.66666666666, ans=0.125 2023-12-21 16:42:43,566 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.24 vs. limit=15.0 2023-12-21 16:42:45,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2023-12-21 16:42:46,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=131693.33333333334, ans=0.0 2023-12-21 16:42:50,308 INFO [train.py:886] (3/4) Epoch 5, batch 700, loss[loss=0.01555, audio_tagging_loss=0.01555, over 25000.00 frames. ], tot_loss[loss=0.01691, audio_tagging_loss=0.01691, over 4802001.68 frames. ], batch size: 100, lr: 2.18e-02, grad_scale: 128.0 2023-12-21 16:42:58,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=131760.0, ans=0.0 2023-12-21 16:43:11,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=131893.33333333334, ans=0.125 2023-12-21 16:43:35,716 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.308e+01 2.570e+01 2.719e+01 2.922e+01 3.851e+01, threshold=5.438e+01, percent-clipped=0.0 2023-12-21 16:43:41,359 INFO [train.py:886] (3/4) Epoch 5, batch 750, loss[loss=0.01602, audio_tagging_loss=0.01602, over 25000.00 frames. ], tot_loss[loss=0.01674, audio_tagging_loss=0.01674, over 4836417.50 frames. 
], batch size: 100, lr: 2.17e-02, grad_scale: 128.0 2023-12-21 16:43:47,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=132093.33333333334, ans=0.125 2023-12-21 16:43:48,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=132093.33333333334, ans=0.125 2023-12-21 16:43:55,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=132160.0, ans=0.0 2023-12-21 16:43:59,547 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.20 vs. limit=22.5 2023-12-21 16:44:08,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=132226.66666666666, ans=0.125 2023-12-21 16:44:20,803 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.46 vs. limit=12.0 2023-12-21 16:44:28,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=132360.0, ans=0.125 2023-12-21 16:44:28,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=132360.0, ans=0.125 2023-12-21 16:44:31,590 INFO [train.py:886] (3/4) Epoch 5, batch 800, loss[loss=0.0182, audio_tagging_loss=0.0182, over 25000.00 frames. ], tot_loss[loss=0.01669, audio_tagging_loss=0.01669, over 4865897.10 frames. ], batch size: 100, lr: 2.17e-02, grad_scale: 128.0 2023-12-21 16:44:35,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=132426.66666666666, ans=0.0 2023-12-21 16:44:50,273 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.29 vs. limit=15.0 2023-12-21 16:44:51,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=132560.0, ans=0.05 2023-12-21 16:45:02,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=132626.66666666666, ans=0.0 2023-12-21 16:45:08,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=132626.66666666666, ans=0.0 2023-12-21 16:45:18,288 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.636e+01 2.805e+01 3.064e+01 4.001e+01, threshold=5.610e+01, percent-clipped=0.0 2023-12-21 16:45:23,041 INFO [train.py:886] (3/4) Epoch 5, batch 850, loss[loss=0.02034, audio_tagging_loss=0.02034, over 25000.00 frames. ], tot_loss[loss=0.01668, audio_tagging_loss=0.01668, over 4884954.90 frames. ], batch size: 100, lr: 2.17e-02, grad_scale: 128.0 2023-12-21 16:45:32,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=132826.66666666666, ans=0.0 2023-12-21 16:45:40,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=132826.66666666666, ans=0.09899494936611666 2023-12-21 16:46:14,106 INFO [train.py:886] (3/4) Epoch 5, batch 900, loss[loss=0.015, audio_tagging_loss=0.015, over 25000.00 frames. 
], tot_loss[loss=0.01668, audio_tagging_loss=0.01668, over 4893612.86 frames. ], batch size: 100, lr: 2.17e-02, grad_scale: 128.0 2023-12-21 16:46:14,634 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2023-12-21 16:46:29,624 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.91 vs. limit=15.0 2023-12-21 16:46:35,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=133226.66666666666, ans=0.0 2023-12-21 16:46:47,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=133293.33333333334, ans=0.125 2023-12-21 16:46:50,198 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.30 vs. limit=22.5 2023-12-21 16:46:54,188 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.70 vs. limit=15.0 2023-12-21 16:46:59,251 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.48 vs. limit=15.0 2023-12-21 16:47:00,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=133360.0, ans=0.125 2023-12-21 16:47:01,325 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 2.545e+01 2.742e+01 2.943e+01 3.695e+01, threshold=5.484e+01, percent-clipped=0.0 2023-12-21 16:47:03,635 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.95 vs. limit=22.5 2023-12-21 16:47:06,140 INFO [train.py:886] (3/4) Epoch 5, batch 950, loss[loss=0.01866, audio_tagging_loss=0.01866, over 24077.00 frames. ], tot_loss[loss=0.01684, audio_tagging_loss=0.01684, over 4894089.97 frames. ], batch size: 100, lr: 2.16e-02, grad_scale: 128.0 2023-12-21 16:47:13,824 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0 2023-12-21 16:47:23,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=133493.33333333334, ans=0.125 2023-12-21 16:47:24,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=133493.33333333334, ans=0.125 2023-12-21 16:47:31,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=133560.0, ans=0.125 2023-12-21 16:47:57,562 INFO [train.py:886] (3/4) Epoch 5, batch 1000, loss[loss=0.01472, audio_tagging_loss=0.01472, over 25000.00 frames. ], tot_loss[loss=0.01674, audio_tagging_loss=0.01674, over 4907412.18 frames. 
], batch size: 100, lr: 2.16e-02, grad_scale: 128.0 2023-12-21 16:48:00,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=133760.0, ans=0.1 2023-12-21 16:48:04,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=133760.0, ans=0.0 2023-12-21 16:48:21,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=133893.33333333334, ans=0.0 2023-12-21 16:48:32,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=133960.0, ans=0.125 2023-12-21 16:48:42,463 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.163e+01 2.511e+01 2.666e+01 2.885e+01 3.641e+01, threshold=5.332e+01, percent-clipped=0.0 2023-12-21 16:48:42,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=134026.66666666666, ans=0.2 2023-12-21 16:48:48,805 INFO [train.py:886] (3/4) Epoch 5, batch 1050, loss[loss=0.01787, audio_tagging_loss=0.01787, over 25000.00 frames. ], tot_loss[loss=0.01677, audio_tagging_loss=0.01677, over 4912516.03 frames. ], batch size: 100, lr: 2.16e-02, grad_scale: 128.0 2023-12-21 16:48:49,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=134093.33333333334, ans=0.07 2023-12-21 16:48:52,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2023-12-21 16:49:38,674 INFO [train.py:886] (3/4) Epoch 5, batch 1100, loss[loss=0.01705, audio_tagging_loss=0.01705, over 25000.00 frames. ], tot_loss[loss=0.01674, audio_tagging_loss=0.01674, over 4921877.59 frames. ], batch size: 100, lr: 2.16e-02, grad_scale: 128.0 2023-12-21 16:50:06,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0 2023-12-21 16:50:15,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=134626.66666666666, ans=0.1 2023-12-21 16:50:19,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=134693.33333333334, ans=0.1 2023-12-21 16:50:25,433 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 2.502e+01 2.717e+01 2.945e+01 3.841e+01, threshold=5.435e+01, percent-clipped=0.0 2023-12-21 16:50:25,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=134693.33333333334, ans=0.035 2023-12-21 16:50:30,924 INFO [train.py:886] (3/4) Epoch 5, batch 1150, loss[loss=0.01795, audio_tagging_loss=0.01795, over 25000.00 frames. ], tot_loss[loss=0.01672, audio_tagging_loss=0.01672, over 4935816.51 frames. 
], batch size: 100, lr: 2.15e-02, grad_scale: 128.0 2023-12-21 16:50:46,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=134826.66666666666, ans=0.1 2023-12-21 16:50:49,221 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.662e+01 2023-12-21 16:50:49,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=134893.33333333334, ans=0.125 2023-12-21 16:51:11,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=135026.66666666666, ans=0.125 2023-12-21 16:51:14,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=135026.66666666666, ans=0.1 2023-12-21 16:51:21,038 INFO [train.py:886] (3/4) Epoch 5, batch 1200, loss[loss=0.01952, audio_tagging_loss=0.01952, over 25000.00 frames. ], tot_loss[loss=0.01681, audio_tagging_loss=0.01681, over 4943080.65 frames. ], batch size: 100, lr: 2.15e-02, grad_scale: 128.0 2023-12-21 16:51:24,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=135093.33333333334, ans=0.125 2023-12-21 16:51:32,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=135160.0, ans=0.0 2023-12-21 16:51:32,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=135160.0, ans=0.2 2023-12-21 16:51:33,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=135160.0, ans=0.125 2023-12-21 16:51:34,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.58 vs. limit=15.0 2023-12-21 16:51:37,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=135160.0, ans=0.1 2023-12-21 16:51:47,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=135226.66666666666, ans=0.125 2023-12-21 16:51:52,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=135293.33333333334, ans=0.125 2023-12-21 16:52:07,529 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.268e+01 2.551e+01 2.723e+01 2.888e+01 4.328e+01, threshold=5.445e+01, percent-clipped=0.0 2023-12-21 16:52:12,135 INFO [train.py:886] (3/4) Epoch 5, batch 1250, loss[loss=0.0162, audio_tagging_loss=0.0162, over 24750.00 frames. ], tot_loss[loss=0.01703, audio_tagging_loss=0.01703, over 4943674.67 frames. ], batch size: 99, lr: 2.15e-02, grad_scale: 128.0 2023-12-21 16:52:23,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=135493.33333333334, ans=0.125 2023-12-21 16:52:29,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=135493.33333333334, ans=0.0 2023-12-21 16:52:30,188 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.36 vs. 
limit=22.5 2023-12-21 16:52:41,061 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.84 vs. limit=15.0 2023-12-21 16:52:49,348 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=8.487e-02 2023-12-21 16:52:54,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=135693.33333333334, ans=0.0 2023-12-21 16:53:04,273 INFO [train.py:886] (3/4) Epoch 5, batch 1300, loss[loss=0.01468, audio_tagging_loss=0.01468, over 25000.00 frames. ], tot_loss[loss=0.01701, audio_tagging_loss=0.01701, over 4935916.58 frames. ], batch size: 100, lr: 2.15e-02, grad_scale: 128.0 2023-12-21 16:53:21,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=135826.66666666666, ans=0.0 2023-12-21 16:53:43,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=136026.66666666666, ans=0.125 2023-12-21 16:53:49,042 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.561e+01 2.737e+01 2.945e+01 3.593e+01, threshold=5.474e+01, percent-clipped=0.0 2023-12-21 16:53:53,382 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=18.30 vs. limit=15.0 2023-12-21 16:53:53,818 INFO [train.py:886] (3/4) Epoch 5, batch 1350, loss[loss=0.01958, audio_tagging_loss=0.01958, over 24922.00 frames. ], tot_loss[loss=0.01684, audio_tagging_loss=0.01684, over 4940362.68 frames. ], batch size: 100, lr: 2.14e-02, grad_scale: 128.0 2023-12-21 16:53:57,197 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.75 vs. limit=15.0 2023-12-21 16:53:59,071 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. limit=15.0 2023-12-21 16:54:03,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=136160.0, ans=0.125 2023-12-21 16:54:13,116 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.20 vs. limit=10.0 2023-12-21 16:54:16,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=136226.66666666666, ans=10.0 2023-12-21 16:54:22,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=136226.66666666666, ans=0.125 2023-12-21 16:54:35,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=136360.0, ans=0.125 2023-12-21 16:54:43,567 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.41 vs. limit=15.0 2023-12-21 16:54:44,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=136360.0, ans=0.125 2023-12-21 16:54:46,085 INFO [train.py:886] (3/4) Epoch 5, batch 1400, loss[loss=0.0173, audio_tagging_loss=0.0173, over 25000.00 frames. 
], tot_loss[loss=0.01682, audio_tagging_loss=0.01682, over 4935983.45 frames. ], batch size: 100, lr: 2.14e-02, grad_scale: 128.0 2023-12-21 16:55:08,873 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0 2023-12-21 16:55:11,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=136560.0, ans=0.125 2023-12-21 16:55:11,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.24 vs. limit=22.5 2023-12-21 16:55:26,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=136693.33333333334, ans=0.0 2023-12-21 16:55:26,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=136693.33333333334, ans=0.0 2023-12-21 16:55:27,275 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.40 vs. limit=15.0 2023-12-21 16:55:31,495 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.570e+01 2.791e+01 2.991e+01 3.776e+01, threshold=5.582e+01, percent-clipped=0.0 2023-12-21 16:55:34,896 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.54 vs. limit=15.0 2023-12-21 16:55:36,226 INFO [train.py:886] (3/4) Epoch 5, batch 1450, loss[loss=0.0144, audio_tagging_loss=0.0144, over 25000.00 frames. ], tot_loss[loss=0.01669, audio_tagging_loss=0.01669, over 4944923.54 frames. ], batch size: 100, lr: 2.14e-02, grad_scale: 128.0 2023-12-21 16:55:36,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=136760.0, ans=0.0 2023-12-21 16:55:42,975 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.49 vs. limit=22.5 2023-12-21 16:55:48,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=136826.66666666666, ans=0.1 2023-12-21 16:55:53,753 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.366e+01 2023-12-21 16:56:00,505 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.69 vs. limit=10.0 2023-12-21 16:56:21,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=137026.66666666666, ans=0.125 2023-12-21 16:56:29,160 INFO [train.py:886] (3/4) Epoch 5, batch 1500, loss[loss=0.01861, audio_tagging_loss=0.01861, over 25000.00 frames. ], tot_loss[loss=0.01665, audio_tagging_loss=0.01665, over 4948306.36 frames. ], batch size: 100, lr: 2.14e-02, grad_scale: 128.0 2023-12-21 16:56:31,525 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.73 vs. 
limit=15.0 2023-12-21 16:56:44,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=137160.0, ans=0.125 2023-12-21 16:57:12,166 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2023-12-21 16:57:12,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=137360.0, ans=0.125 2023-12-21 16:57:13,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=137360.0, ans=0.0 2023-12-21 16:57:14,688 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.117e+01 2.623e+01 2.830e+01 3.033e+01 3.476e+01, threshold=5.660e+01, percent-clipped=0.0 2023-12-21 16:57:17,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=137360.0, ans=0.2 2023-12-21 16:57:20,728 INFO [train.py:886] (3/4) Epoch 5, batch 1550, loss[loss=0.01537, audio_tagging_loss=0.01537, over 24750.00 frames. ], tot_loss[loss=0.01675, audio_tagging_loss=0.01675, over 4947079.07 frames. ], batch size: 99, lr: 2.14e-02, grad_scale: 128.0 2023-12-21 16:57:35,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=137493.33333333334, ans=0.125 2023-12-21 16:57:43,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=137560.0, ans=0.125 2023-12-21 16:57:56,659 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.47 vs. limit=12.0 2023-12-21 16:58:03,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=137693.33333333334, ans=0.125 2023-12-21 16:58:04,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=137693.33333333334, ans=0.1 2023-12-21 16:58:07,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=137693.33333333334, ans=0.0 2023-12-21 16:58:08,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=137693.33333333334, ans=0.0 2023-12-21 16:58:10,332 INFO [train.py:886] (3/4) Epoch 5, batch 1600, loss[loss=0.01653, audio_tagging_loss=0.01653, over 24750.00 frames. ], tot_loss[loss=0.0168, audio_tagging_loss=0.0168, over 4941210.81 frames. ], batch size: 99, lr: 2.13e-02, grad_scale: 128.0 2023-12-21 16:58:18,169 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=15.0 2023-12-21 16:58:49,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=138026.66666666666, ans=0.0 2023-12-21 16:58:56,101 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.601e+01 2.757e+01 2.954e+01 3.912e+01, threshold=5.513e+01, percent-clipped=0.0 2023-12-21 16:59:01,651 INFO [train.py:886] (3/4) Epoch 5, batch 1650, loss[loss=0.01909, audio_tagging_loss=0.01909, over 24750.00 frames. 
], tot_loss[loss=0.01679, audio_tagging_loss=0.01679, over 4941060.99 frames. ], batch size: 99, lr: 2.13e-02, grad_scale: 128.0 2023-12-21 16:59:03,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=138093.33333333334, ans=0.04949747468305833 2023-12-21 16:59:13,451 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.79 vs. limit=22.5 2023-12-21 16:59:18,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=138160.0, ans=0.1 2023-12-21 16:59:49,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=138360.0, ans=0.1 2023-12-21 16:59:49,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=138360.0, ans=0.0 2023-12-21 16:59:52,661 INFO [train.py:886] (3/4) Epoch 5, batch 1700, loss[loss=0.01992, audio_tagging_loss=0.01992, over 25000.00 frames. ], tot_loss[loss=0.01676, audio_tagging_loss=0.01676, over 4941202.43 frames. ], batch size: 100, lr: 2.13e-02, grad_scale: 128.0 2023-12-21 17:00:03,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=138493.33333333334, ans=0.2 2023-12-21 17:00:08,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=138493.33333333334, ans=0.0 2023-12-21 17:00:13,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=138560.0, ans=0.0 2023-12-21 17:00:14,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.95 vs. limit=22.5 2023-12-21 17:00:22,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=138626.66666666666, ans=0.1 2023-12-21 17:00:26,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=138626.66666666666, ans=0.015 2023-12-21 17:00:29,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=138626.66666666666, ans=0.125 2023-12-21 17:00:35,641 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.55 vs. limit=10.0 2023-12-21 17:00:40,304 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.593e+01 2.813e+01 3.022e+01 3.717e+01, threshold=5.626e+01, percent-clipped=0.0 2023-12-21 17:00:45,134 INFO [train.py:886] (3/4) Epoch 5, batch 1750, loss[loss=0.01613, audio_tagging_loss=0.01613, over 25000.00 frames. ], tot_loss[loss=0.01666, audio_tagging_loss=0.01666, over 4949555.11 frames. ], batch size: 100, lr: 2.13e-02, grad_scale: 128.0 2023-12-21 17:00:45,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=138760.0, ans=0.1 2023-12-21 17:00:47,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=26.68 vs. 
limit=22.5 2023-12-21 17:00:56,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=138826.66666666666, ans=0.125 2023-12-21 17:01:20,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=138960.0, ans=0.1 2023-12-21 17:01:37,533 INFO [train.py:886] (3/4) Epoch 5, batch 1800, loss[loss=0.013, audio_tagging_loss=0.013, over 25000.00 frames. ], tot_loss[loss=0.01657, audio_tagging_loss=0.01657, over 4953411.42 frames. ], batch size: 100, lr: 2.12e-02, grad_scale: 128.0 2023-12-21 17:01:42,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.75 vs. limit=22.5 2023-12-21 17:01:52,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=139160.0, ans=0.1 2023-12-21 17:01:53,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=139160.0, ans=0.05 2023-12-21 17:01:55,107 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.44 vs. limit=15.0 2023-12-21 17:01:55,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=139160.0, ans=0.2 2023-12-21 17:01:57,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=139226.66666666666, ans=0.09899494936611666 2023-12-21 17:02:03,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=139226.66666666666, ans=0.1 2023-12-21 17:02:23,762 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.172e+01 2.569e+01 2.782e+01 2.969e+01 3.788e+01, threshold=5.564e+01, percent-clipped=0.0 2023-12-21 17:02:25,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=139360.0, ans=0.125 2023-12-21 17:02:28,439 INFO [train.py:886] (3/4) Epoch 5, batch 1850, loss[loss=0.01864, audio_tagging_loss=0.01864, over 25000.00 frames. ], tot_loss[loss=0.01671, audio_tagging_loss=0.01671, over 4950364.76 frames. ], batch size: 100, lr: 2.12e-02, grad_scale: 128.0 2023-12-21 17:02:30,599 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.197e+01 2023-12-21 17:02:30,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=139426.66666666666, ans=0.1 2023-12-21 17:02:42,589 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.68 vs. limit=22.5 2023-12-21 17:02:50,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=139560.0, ans=0.015 2023-12-21 17:03:03,902 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.42 vs. 
limit=22.5 2023-12-21 17:03:19,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=139760.0, ans=0.125 2023-12-21 17:03:19,905 INFO [train.py:886] (3/4) Epoch 5, batch 1900, loss[loss=0.0183, audio_tagging_loss=0.0183, over 24750.00 frames. ], tot_loss[loss=0.01689, audio_tagging_loss=0.01689, over 4947260.92 frames. ], batch size: 99, lr: 2.12e-02, grad_scale: 128.0 2023-12-21 17:03:24,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=139760.0, ans=10.0 2023-12-21 17:03:26,807 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.12 vs. limit=15.0 2023-12-21 17:03:39,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=139893.33333333334, ans=0.125 2023-12-21 17:03:44,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=139893.33333333334, ans=0.125 2023-12-21 17:03:49,881 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.26 vs. limit=15.0 2023-12-21 17:03:54,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=139960.0, ans=10.0 2023-12-21 17:04:00,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=140026.66666666666, ans=0.125 2023-12-21 17:04:05,391 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.155e+01 2.621e+01 2.837e+01 3.063e+01 3.713e+01, threshold=5.675e+01, percent-clipped=0.0 2023-12-21 17:04:07,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=140026.66666666666, ans=0.125 2023-12-21 17:04:07,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=140026.66666666666, ans=0.2 2023-12-21 17:04:11,623 INFO [train.py:886] (3/4) Epoch 5, batch 1950, loss[loss=0.01674, audio_tagging_loss=0.01674, over 22533.00 frames. ], tot_loss[loss=0.01683, audio_tagging_loss=0.01683, over 4947870.98 frames. ], batch size: 107, lr: 2.12e-02, grad_scale: 128.0 2023-12-21 17:04:15,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=140093.33333333334, ans=0.0 2023-12-21 17:04:22,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=140160.0, ans=0.1 2023-12-21 17:04:28,588 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.99 vs. 
2023-12-21 17:04:30,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=140160.0, ans=10.0
2023-12-21 17:04:37,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=140226.66666666666, ans=0.0
2023-12-21 17:04:41,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=140293.33333333334, ans=0.2
2023-12-21 17:05:02,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=140426.66666666666, ans=0.0
2023-12-21 17:05:02,941 INFO [train.py:886] (3/4) Epoch 5, batch 2000, loss[loss=0.01713, audio_tagging_loss=0.01713, over 24750.00 frames. ], tot_loss[loss=0.01677, audio_tagging_loss=0.01677, over 4952664.56 frames. ], batch size: 99, lr: 2.11e-02, grad_scale: 64.0
2023-12-21 17:05:10,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.92 vs. limit=15.0
2023-12-21 17:05:16,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=140493.33333333334, ans=0.1
2023-12-21 17:05:30,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=140560.0, ans=0.0
2023-12-21 17:05:39,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.53 vs. limit=15.0
2023-12-21 17:05:51,609 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.499e+01 2.652e+01 2.874e+01 3.620e+01, threshold=5.305e+01, percent-clipped=0.0
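The recurring WARNING lines from optim.py:484 summarize recent gradient norms as five quantiles (min, 25%, median, 75%, max); in every logged line the threshold equals Clipping_scale times the median up to rounding (e.g. 5.305e+01 = 2.0 x 2.652e+01 just above), and percent-clipped reports how often a batch actually exceeded it. A sketch of that bookkeeping as a hypothetical standalone helper; in icefall this logic lives inside the optimizer itself:

# Sketch of median-based gradient clipping with quartile reporting,
# matching the shape of the [optim.py:484] WARNINGs above.  Hypothetical
# helper, not icefall's actual optimizer code.
import torch

class QuartileGradClipper:
    def __init__(self, clipping_scale=2.0, window=100):
        self.clipping_scale = clipping_scale
        self.window = window  # how many recent norms the quantiles cover
        self.norms = []
        self.clipped = 0
        self.seen = 0

    def clip_(self, parameters):
        grads = [p.grad for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms = (self.norms + [norm])[-self.window:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # 2.0 x running median
        self.seen += 1
        if norm > threshold:
            self.clipped += 1
            for g in grads:
                g.mul_(threshold / norm)  # rescale instead of hard truncation
        quartiles = " ".join(f"{v:.3e}" for v in q.tolist())
        return (f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
                f"{quartiles}, threshold={threshold:.3e}, "
                f"percent-clipped={100.0 * self.clipped / self.seen:.1f}")

# Usage after loss.backward(): print(clipper.clip_(model.parameters()))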
2023-12-21 17:05:55,399 INFO [train.py:886] (3/4) Epoch 5, batch 2050, loss[loss=0.01756, audio_tagging_loss=0.01756, over 25000.00 frames. ], tot_loss[loss=0.01665, audio_tagging_loss=0.01665, over 4955955.23 frames. ], batch size: 100, lr: 2.11e-02, grad_scale: 64.0
2023-12-21 17:05:57,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=140760.0, ans=0.0
2023-12-21 17:06:13,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=140826.66666666666, ans=0.125
2023-12-21 17:06:23,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=140893.33333333334, ans=0.125
2023-12-21 17:06:26,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=140960.0, ans=0.125
2023-12-21 17:06:35,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=141026.66666666666, ans=0.2
2023-12-21 17:06:36,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141026.66666666666, ans=0.1
2023-12-21 17:06:39,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=141026.66666666666, ans=0.0
2023-12-21 17:06:46,194 INFO [train.py:886] (3/4) Epoch 5, batch 2100, loss[loss=0.01597, audio_tagging_loss=0.01597, over 25000.00 frames. ], tot_loss[loss=0.01663, audio_tagging_loss=0.01663, over 4952675.83 frames. ], batch size: 100, lr: 2.11e-02, grad_scale: 64.0
2023-12-21 17:07:02,927 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.65 vs. limit=15.0
2023-12-21 17:07:07,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141226.66666666666, ans=0.1
2023-12-21 17:07:14,180 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.83 vs. limit=15.0
2023-12-21 17:07:17,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.04 vs. limit=15.0
2023-12-21 17:07:19,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=141293.33333333334, ans=0.1
2023-12-21 17:07:22,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=141293.33333333334, ans=0.2
2023-12-21 17:07:31,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=141360.0, ans=0.2
2023-12-21 17:07:31,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=141360.0, ans=0.125
2023-12-21 17:07:34,673 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.214e+01 2.574e+01 2.738e+01 2.896e+01 3.657e+01, threshold=5.477e+01, percent-clipped=0.0
2023-12-21 17:07:38,495 INFO [train.py:886] (3/4) Epoch 5, batch 2150, loss[loss=0.01397, audio_tagging_loss=0.01397, over 24750.00 frames. ], tot_loss[loss=0.01657, audio_tagging_loss=0.01657, over 4957110.94 frames. ], batch size: 99, lr: 2.11e-02, grad_scale: 64.0
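Each train.py:886 line pairs the loss on the current batch with tot_loss, a frame-weighted running average, which is why tot_loss drifts slowly (0.01665 -> 0.01663 -> 0.01657 across the batches above) while the per-batch loss jumps around. A sketch of such bookkeeping, assuming a decayed frame-weighted sum; the decay value and names are illustrative, and icefall's train.py uses its own tracker:

# Sketch of the frame-weighted running average behind the tot_loss[...]
# numbers above.  Hypothetical bookkeeping, illustrative only.
class RunningLoss:
    def __init__(self, decay=0.995):
        self.decay = decay   # forgetting factor for older batches
        self.loss_sum = 0.0  # decayed sum of loss * frames
        self.frames = 0.0    # decayed sum of frames

    def update(self, batch_loss, batch_frames):
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.loss_sum / self.frames, self.frames

avg = RunningLoss()
for step in range(2000):  # ~25000 frames per batch, as in the log
    batch_loss = 0.019 if step % 7 == 0 else 0.016
    tot, frames = avg.update(batch_loss, 25000.0)
# The frame count saturates near 25000 / (1 - 0.995) = 5e6, the same
# order as the ~4.95e6 frames the tot_loss[...] entries report.
print(f"tot_loss[loss={tot:.5f}, over {frames:.2f} frames.]")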
2023-12-21 17:07:40,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=141426.66666666666, ans=0.0
2023-12-21 17:07:41,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=141426.66666666666, ans=0.0
2023-12-21 17:07:51,742 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.87 vs. limit=12.0
2023-12-21 17:08:02,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=141560.0, ans=0.0
2023-12-21 17:08:03,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=141560.0, ans=0.2
2023-12-21 17:08:11,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.06 vs. limit=22.5
2023-12-21 17:08:14,639 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.63 vs. limit=22.5
2023-12-21 17:08:19,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=141693.33333333334, ans=0.125
2023-12-21 17:08:31,222 INFO [train.py:886] (3/4) Epoch 5, batch 2200, loss[loss=0.01704, audio_tagging_loss=0.01704, over 24050.00 frames. ], tot_loss[loss=0.01667, audio_tagging_loss=0.01667, over 4950001.75 frames. ], batch size: 100, lr: 2.11e-02, grad_scale: 64.0
2023-12-21 17:08:34,617 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.28 vs. limit=15.0
2023-12-21 17:08:40,223 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.26 vs. limit=22.5
2023-12-21 17:08:56,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=141893.33333333334, ans=0.125
2023-12-21 17:08:58,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=141893.33333333334, ans=0.0
2023-12-21 17:09:09,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=141960.0, ans=0.1
2023-12-21 17:09:11,630 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.81 vs. limit=22.5
2023-12-21 17:09:17,277 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.33 vs. limit=22.5
2023-12-21 17:09:17,619 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.285e+01 2.618e+01 2.828e+01 3.065e+01 3.912e+01, threshold=5.656e+01, percent-clipped=0.0
2023-12-21 17:09:21,470 INFO [train.py:886] (3/4) Epoch 5, batch 2250, loss[loss=0.01599, audio_tagging_loss=0.01599, over 25000.00 frames. ], tot_loss[loss=0.01675, audio_tagging_loss=0.01675, over 4944193.99 frames. ], batch size: 100, lr: 2.10e-02, grad_scale: 64.0
2023-12-21 17:09:31,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.54 vs. limit=15.0
2023-12-21 17:09:35,210 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.64 vs. limit=22.5
2023-12-21 17:09:37,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=142160.0, ans=0.125
2023-12-21 17:09:37,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=142160.0, ans=0.125
2023-12-21 17:09:37,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=142160.0, ans=0.1
2023-12-21 17:09:38,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=142160.0, ans=10.0
2023-12-21 17:09:49,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=142226.66666666666, ans=0.0
2023-12-21 17:10:08,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=142360.0, ans=10.0
2023-12-21 17:10:14,388 INFO [train.py:886] (3/4) Epoch 5, batch 2300, loss[loss=0.01837, audio_tagging_loss=0.01837, over 22063.00 frames. ], tot_loss[loss=0.01677, audio_tagging_loss=0.01677, over 4940758.82 frames.
], batch size: 107, lr: 2.10e-02, grad_scale: 64.0 2023-12-21 17:10:19,639 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.09 vs. limit=15.0 2023-12-21 17:10:20,436 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=15.0 2023-12-21 17:10:42,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=142560.0, ans=0.5 2023-12-21 17:10:42,902 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=15.0 2023-12-21 17:10:43,254 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.30 vs. limit=10.0 2023-12-21 17:10:43,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=142560.0, ans=0.125 2023-12-21 17:10:52,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=142626.66666666666, ans=0.2 2023-12-21 17:11:00,692 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.568e+01 2.751e+01 2.902e+01 4.993e+01, threshold=5.502e+01, percent-clipped=0.0 2023-12-21 17:11:05,214 INFO [train.py:886] (3/4) Epoch 5, batch 2350, loss[loss=0.0162, audio_tagging_loss=0.0162, over 24750.00 frames. ], tot_loss[loss=0.01664, audio_tagging_loss=0.01664, over 4946101.05 frames. ], batch size: 99, lr: 2.10e-02, grad_scale: 64.0 2023-12-21 17:11:25,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=142893.33333333334, ans=0.125 2023-12-21 17:11:30,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=142893.33333333334, ans=0.0 2023-12-21 17:11:36,513 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.69 vs. limit=15.0 2023-12-21 17:11:37,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=142960.0, ans=0.0 2023-12-21 17:11:52,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=143026.66666666666, ans=0.125 2023-12-21 17:11:56,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=143093.33333333334, ans=0.125 2023-12-21 17:11:57,052 INFO [train.py:886] (3/4) Epoch 5, batch 2400, loss[loss=0.01594, audio_tagging_loss=0.01594, over 25000.00 frames. ], tot_loss[loss=0.01652, audio_tagging_loss=0.01652, over 4949415.43 frames. 
], batch size: 100, lr: 2.10e-02, grad_scale: 64.0 2023-12-21 17:12:11,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=143160.0, ans=0.09899494936611666 2023-12-21 17:12:23,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=143226.66666666666, ans=0.125 2023-12-21 17:12:44,893 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.521e+01 2.685e+01 2.894e+01 4.113e+01, threshold=5.370e+01, percent-clipped=0.0 2023-12-21 17:12:49,446 INFO [train.py:886] (3/4) Epoch 5, batch 2450, loss[loss=0.01815, audio_tagging_loss=0.01815, over 25000.00 frames. ], tot_loss[loss=0.0165, audio_tagging_loss=0.0165, over 4955121.76 frames. ], batch size: 100, lr: 2.10e-02, grad_scale: 64.0 2023-12-21 17:12:50,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=143426.66666666666, ans=0.125 2023-12-21 17:13:09,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=143560.0, ans=0.125 2023-12-21 17:13:29,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=143693.33333333334, ans=0.125 2023-12-21 17:13:39,904 INFO [train.py:886] (3/4) Epoch 5, batch 2500, loss[loss=0.01849, audio_tagging_loss=0.01849, over 24750.00 frames. ], tot_loss[loss=0.01665, audio_tagging_loss=0.01665, over 4953485.57 frames. ], batch size: 99, lr: 2.09e-02, grad_scale: 64.0 2023-12-21 17:13:41,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=143760.0, ans=0.1 2023-12-21 17:13:43,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=143760.0, ans=0.125 2023-12-21 17:13:48,318 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.13 vs. limit=22.5 2023-12-21 17:14:01,684 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.99 vs. limit=22.5 2023-12-21 17:14:02,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=143893.33333333334, ans=0.0 2023-12-21 17:14:06,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=143893.33333333334, ans=0.125 2023-12-21 17:14:27,830 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+01 2.613e+01 2.844e+01 3.079e+01 3.579e+01, threshold=5.689e+01, percent-clipped=0.0 2023-12-21 17:14:29,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.74 vs. limit=15.0 2023-12-21 17:14:31,607 INFO [train.py:886] (3/4) Epoch 5, batch 2550, loss[loss=0.01438, audio_tagging_loss=0.01438, over 24122.00 frames. ], tot_loss[loss=0.0168, audio_tagging_loss=0.0168, over 4949502.85 frames. 
], batch size: 100, lr: 2.09e-02, grad_scale: 64.0 2023-12-21 17:14:35,617 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.204e+00 2023-12-21 17:14:38,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=144093.33333333334, ans=0.1 2023-12-21 17:14:40,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=14.31 vs. limit=15.0 2023-12-21 17:14:44,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=144160.0, ans=0.0 2023-12-21 17:15:00,362 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.63 vs. limit=22.5 2023-12-21 17:15:11,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=144360.0, ans=0.125 2023-12-21 17:15:18,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=144360.0, ans=0.0 2023-12-21 17:15:22,878 INFO [train.py:886] (3/4) Epoch 5, batch 2600, loss[loss=0.01634, audio_tagging_loss=0.01634, over 25000.00 frames. ], tot_loss[loss=0.0168, audio_tagging_loss=0.0168, over 4951266.51 frames. ], batch size: 100, lr: 2.09e-02, grad_scale: 64.0 2023-12-21 17:15:24,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=144426.66666666666, ans=0.125 2023-12-21 17:15:37,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=144493.33333333334, ans=0.125 2023-12-21 17:15:56,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=144626.66666666666, ans=0.2 2023-12-21 17:16:10,459 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.544e+01 2.733e+01 3.057e+01 4.063e+01, threshold=5.467e+01, percent-clipped=0.0 2023-12-21 17:16:14,197 INFO [train.py:886] (3/4) Epoch 5, batch 2650, loss[loss=0.01792, audio_tagging_loss=0.01792, over 25000.00 frames. ], tot_loss[loss=0.01671, audio_tagging_loss=0.01671, over 4954542.89 frames. ], batch size: 100, lr: 2.09e-02, grad_scale: 64.0 2023-12-21 17:16:22,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=144760.0, ans=0.1 2023-12-21 17:16:35,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=144893.33333333334, ans=0.0 2023-12-21 17:16:41,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=144893.33333333334, ans=0.2 2023-12-21 17:16:50,566 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.89 vs. limit=15.0 2023-12-21 17:16:51,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=30.57 vs. limit=22.5 2023-12-21 17:17:07,036 INFO [train.py:886] (3/4) Epoch 5, batch 2700, loss[loss=0.01613, audio_tagging_loss=0.01613, over 25000.00 frames. 
], tot_loss[loss=0.01665, audio_tagging_loss=0.01665, over 4956547.00 frames. ], batch size: 100, lr: 2.08e-02, grad_scale: 64.0 2023-12-21 17:17:43,595 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=18.18 vs. limit=15.0 2023-12-21 17:17:47,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=145360.0, ans=0.125 2023-12-21 17:17:52,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.40 vs. limit=22.5 2023-12-21 17:17:53,872 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.469e+01 2.654e+01 2.877e+01 3.564e+01, threshold=5.308e+01, percent-clipped=0.0 2023-12-21 17:17:57,715 INFO [train.py:886] (3/4) Epoch 5, batch 2750, loss[loss=0.01803, audio_tagging_loss=0.01803, over 25000.00 frames. ], tot_loss[loss=0.01663, audio_tagging_loss=0.01663, over 4957033.40 frames. ], batch size: 100, lr: 2.08e-02, grad_scale: 64.0 2023-12-21 17:18:02,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145426.66666666666, ans=0.1 2023-12-21 17:18:07,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=145426.66666666666, ans=0.0 2023-12-21 17:18:10,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=145493.33333333334, ans=0.1 2023-12-21 17:18:14,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=145493.33333333334, ans=15.0 2023-12-21 17:18:17,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145493.33333333334, ans=0.1 2023-12-21 17:18:23,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=145560.0, ans=0.125 2023-12-21 17:18:50,490 INFO [train.py:886] (3/4) Epoch 5, batch 2800, loss[loss=0.01562, audio_tagging_loss=0.01562, over 24033.00 frames. ], tot_loss[loss=0.01664, audio_tagging_loss=0.01664, over 4954911.26 frames. ], batch size: 100, lr: 2.08e-02, grad_scale: 64.0 2023-12-21 17:18:51,027 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.82 vs. limit=15.0 2023-12-21 17:18:58,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=145760.0, ans=0.125 2023-12-21 17:19:05,098 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.08 vs. limit=15.0 2023-12-21 17:19:10,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=145893.33333333334, ans=0.125 2023-12-21 17:19:38,106 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.375e+01 2.607e+01 2.811e+01 3.107e+01 4.329e+01, threshold=5.621e+01, percent-clipped=0.0 2023-12-21 17:19:42,645 INFO [train.py:886] (3/4) Epoch 5, batch 2850, loss[loss=0.0176, audio_tagging_loss=0.0176, over 24750.00 frames. 
], tot_loss[loss=0.01672, audio_tagging_loss=0.01672, over 4951342.85 frames. ], batch size: 99, lr: 2.08e-02, grad_scale: 64.0
2023-12-21 17:19:46,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=146093.33333333334, ans=0.1
2023-12-21 17:19:51,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=146093.33333333334, ans=0.0
2023-12-21 17:19:55,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=146160.0, ans=0.1
2023-12-21 17:20:16,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=146293.33333333334, ans=0.0
2023-12-21 17:20:17,772 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-21 17:20:19,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=146293.33333333334, ans=0.0
2023-12-21 17:20:33,322 INFO [train.py:886] (3/4) Epoch 5, batch 2900, loss[loss=0.0136, audio_tagging_loss=0.0136, over 25000.00 frames. ], tot_loss[loss=0.01663, audio_tagging_loss=0.01663, over 4944043.42 frames. ], batch size: 100, lr: 2.08e-02, grad_scale: 64.0
2023-12-21 17:20:35,722 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0
2023-12-21 17:20:36,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=146426.66666666666, ans=0.2
2023-12-21 17:20:54,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0
2023-12-21 17:20:59,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=146560.0, ans=10.0
2023-12-21 17:21:08,989 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=24.16 vs. limit=22.5
2023-12-21 17:21:21,920 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.177e+01 2.595e+01 2.753e+01 2.968e+01 3.866e+01, threshold=5.507e+01, percent-clipped=0.0
2023-12-21 17:21:25,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=146760.0, ans=0.0
2023-12-21 17:21:25,890 INFO [train.py:886] (3/4) Epoch 5, batch 2950, loss[loss=0.01594, audio_tagging_loss=0.01594, over 25000.00 frames. ], tot_loss[loss=0.0165, audio_tagging_loss=0.0165, over 4947090.33 frames. ], batch size: 100, lr: 2.07e-02, grad_scale: 64.0
2023-12-21 17:21:30,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=146760.0, ans=0.125
2023-12-21 17:21:46,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=146893.33333333334, ans=0.1
2023-12-21 17:21:53,619 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.84 vs. limit=6.0
2023-12-21 17:21:54,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0
2023-12-21 17:21:57,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=146960.0, ans=0.0
2023-12-21 17:22:03,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=146960.0, ans=0.1
2023-12-21 17:22:17,725 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.03 vs. limit=12.0
2023-12-21 17:22:18,288 INFO [train.py:886] (3/4) Epoch 5, batch 3000, loss[loss=0.01254, audio_tagging_loss=0.01254, over 25000.00 frames. ], tot_loss[loss=0.01648, audio_tagging_loss=0.01648, over 4952096.37 frames. ], batch size: 100, lr: 2.07e-02, grad_scale: 64.0
2023-12-21 17:22:18,289 INFO [train.py:909] (3/4) Computing validation loss
2023-12-21 17:22:39,424 INFO [train.py:917] (3/4) Epoch 5, validation: loss=0.04009, audio_tagging_loss=0.04009, over 3737520.00 frames.
2023-12-21 17:22:39,425 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
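The train.py:909/917/918 entries above show the periodic validation pass: training pauses, the dev set is swept without gradients, and a frame-weighted loss plus the peak CUDA memory are logged. A minimal sketch of that pattern; model and valid_loader are placeholders, and the model is assumed to return a frame-averaged loss together with its frame count (loss and audio_tagging_loss coincide here because the recipe has a single loss term):

# Sketch of a validation pass producing log lines shaped like the
# "Computing validation loss" / "validation: loss=..." entries above.
# Hypothetical names; not icefall's actual train.py.
import logging
import torch

def compute_validation_loss(model, valid_loader, device):
    model.eval()
    loss_sum, frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            feats = batch["features"].to(device)     # (N, T, 80) fbank
            labels = batch["labels"].to(device)      # multi-hot event targets
            loss, num_frames = model(feats, labels)  # frame-averaged loss
            loss_sum += loss.item() * num_frames
            frames += num_frames
    model.train()
    logging.info(f"validation: loss={loss_sum / frames:.4g}, "
                 f"audio_tagging_loss={loss_sum / frames:.4g}, "
                 f"over {frames:.2f} frames.")
    if torch.cuda.is_available():
        mb = torch.cuda.max_memory_allocated() // 2**20
        logging.info(f"Maximum memory allocated so far is {mb}MB")
    return loss_sum / frames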
2023-12-21 17:23:10,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=147293.33333333334, ans=0.09899494936611666
2023-12-21 17:23:26,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=147360.0, ans=0.0
2023-12-21 17:23:27,623 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.222e+01 2.543e+01 2.712e+01 2.939e+01 3.335e+01, threshold=5.424e+01, percent-clipped=0.0
2023-12-21 17:23:31,445 INFO [train.py:886] (3/4) Epoch 5, batch 3050, loss[loss=0.01865, audio_tagging_loss=0.01865, over 21332.00 frames. ], tot_loss[loss=0.01653, audio_tagging_loss=0.01653, over 4953178.24 frames. ], batch size: 107, lr: 2.07e-02, grad_scale: 64.0
2023-12-21 17:23:33,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=147426.66666666666, ans=0.07
2023-12-21 17:23:34,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=147426.66666666666, ans=0.0
2023-12-21 17:23:51,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=147560.0, ans=0.125
2023-12-21 17:23:58,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=147560.0, ans=0.0
2023-12-21 17:24:14,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=147693.33333333334, ans=0.05
2023-12-21 17:24:23,425 INFO [train.py:886] (3/4) Epoch 5, batch 3100, loss[loss=0.01872, audio_tagging_loss=0.01872, over 25000.00 frames. ], tot_loss[loss=0.01661, audio_tagging_loss=0.01661, over 4952172.88 frames. ], batch size: 100, lr: 2.07e-02, grad_scale: 64.0
2023-12-21 17:24:26,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=147760.0, ans=0.125
2023-12-21 17:24:27,678 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.92 vs.
limit=15.0 2023-12-21 17:24:34,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=147826.66666666666, ans=0.125 2023-12-21 17:24:37,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=147826.66666666666, ans=0.0 2023-12-21 17:24:47,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=147893.33333333334, ans=0.2 2023-12-21 17:24:47,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=147893.33333333334, ans=0.0 2023-12-21 17:25:09,743 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.613e+01 2.780e+01 2.978e+01 3.396e+01, threshold=5.559e+01, percent-clipped=0.0 2023-12-21 17:25:13,593 INFO [train.py:886] (3/4) Epoch 5, batch 3150, loss[loss=0.01605, audio_tagging_loss=0.01605, over 24750.00 frames. ], tot_loss[loss=0.01678, audio_tagging_loss=0.01678, over 4947534.42 frames. ], batch size: 99, lr: 2.07e-02, grad_scale: 64.0 2023-12-21 17:25:26,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=148160.0, ans=0.125 2023-12-21 17:25:28,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=148160.0, ans=0.125 2023-12-21 17:25:31,103 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.64 vs. limit=6.0 2023-12-21 17:25:51,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=148293.33333333334, ans=0.125 2023-12-21 17:25:56,293 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.09 vs. limit=15.0 2023-12-21 17:25:57,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=148360.0, ans=0.125 2023-12-21 17:26:02,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=148360.0, ans=0.09899494936611666 2023-12-21 17:26:05,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=148426.66666666666, ans=0.125 2023-12-21 17:26:05,978 INFO [train.py:886] (3/4) Epoch 5, batch 3200, loss[loss=0.0149, audio_tagging_loss=0.0149, over 25000.00 frames. ], tot_loss[loss=0.0167, audio_tagging_loss=0.0167, over 4943352.84 frames. ], batch size: 100, lr: 2.06e-02, grad_scale: 64.0 2023-12-21 17:26:52,670 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.138e+01 2.499e+01 2.673e+01 2.895e+01 4.175e+01, threshold=5.346e+01, percent-clipped=0.0 2023-12-21 17:26:53,984 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0 2023-12-21 17:26:56,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=148760.0, ans=0.04949747468305833 2023-12-21 17:26:57,279 INFO [train.py:886] (3/4) Epoch 5, batch 3250, loss[loss=0.01633, audio_tagging_loss=0.01633, over 25000.00 frames. 
], tot_loss[loss=0.01656, audio_tagging_loss=0.01656, over 4945703.02 frames. ], batch size: 100, lr: 2.06e-02, grad_scale: 64.0 2023-12-21 17:27:01,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=148760.0, ans=0.1 2023-12-21 17:27:08,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=148826.66666666666, ans=0.125 2023-12-21 17:27:10,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=148826.66666666666, ans=0.1 2023-12-21 17:27:16,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=148826.66666666666, ans=0.2 2023-12-21 17:27:19,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=148893.33333333334, ans=0.0 2023-12-21 17:27:24,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=148893.33333333334, ans=0.125 2023-12-21 17:27:48,883 INFO [train.py:886] (3/4) Epoch 5, batch 3300, loss[loss=0.01768, audio_tagging_loss=0.01768, over 25000.00 frames. ], tot_loss[loss=0.01658, audio_tagging_loss=0.01658, over 4947665.19 frames. ], batch size: 100, lr: 2.06e-02, grad_scale: 64.0 2023-12-21 17:28:18,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=149226.66666666666, ans=0.0 2023-12-21 17:28:23,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.65 vs. limit=15.0 2023-12-21 17:28:35,226 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=27.37 vs. limit=22.5 2023-12-21 17:28:36,622 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.590e+01 2.838e+01 3.066e+01 3.975e+01, threshold=5.677e+01, percent-clipped=0.0 2023-12-21 17:28:39,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=149360.0, ans=0.95 2023-12-21 17:28:39,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=27.87 vs. limit=22.5 2023-12-21 17:28:41,795 INFO [train.py:886] (3/4) Epoch 5, batch 3350, loss[loss=0.01613, audio_tagging_loss=0.01613, over 25000.00 frames. ], tot_loss[loss=0.01651, audio_tagging_loss=0.01651, over 4956935.66 frames. ], batch size: 100, lr: 2.06e-02, grad_scale: 64.0 2023-12-21 17:28:43,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=149426.66666666666, ans=0.0 2023-12-21 17:28:59,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=149493.33333333334, ans=0.0 2023-12-21 17:29:01,547 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.89 vs. 
limit=22.5 2023-12-21 17:29:03,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=149560.0, ans=0.0 2023-12-21 17:29:10,629 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.52 vs. limit=22.5 2023-12-21 17:29:15,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=149626.66666666666, ans=0.0 2023-12-21 17:29:19,576 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.45 vs. limit=22.5 2023-12-21 17:29:23,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=149693.33333333334, ans=0.0 2023-12-21 17:29:24,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=149693.33333333334, ans=0.125 2023-12-21 17:29:31,395 INFO [train.py:886] (3/4) Epoch 5, batch 3400, loss[loss=0.02116, audio_tagging_loss=0.02116, over 24750.00 frames. ], tot_loss[loss=0.0166, audio_tagging_loss=0.0166, over 4956248.68 frames. ], batch size: 99, lr: 2.05e-02, grad_scale: 64.0 2023-12-21 17:29:32,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=149760.0, ans=0.125 2023-12-21 17:29:35,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=149760.0, ans=0.1 2023-12-21 17:29:35,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=14.17 vs. limit=15.0 2023-12-21 17:29:41,110 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.92 vs. limit=15.0 2023-12-21 17:29:46,243 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.545e+00 2023-12-21 17:29:46,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.92 vs. limit=22.5 2023-12-21 17:29:47,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=149826.66666666666, ans=0.0 2023-12-21 17:30:16,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=150026.66666666666, ans=0.04949747468305833 2023-12-21 17:30:18,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=150026.66666666666, ans=0.1 2023-12-21 17:30:19,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=150026.66666666666, ans=0.07 2023-12-21 17:30:20,654 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.365e+01 2.600e+01 2.817e+01 3.077e+01 4.267e+01, threshold=5.633e+01, percent-clipped=0.0 2023-12-21 17:30:23,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=150093.33333333334, ans=0.1 2023-12-21 17:30:24,438 INFO [train.py:886] (3/4) Epoch 5, batch 3450, loss[loss=0.01648, audio_tagging_loss=0.01648, over 24750.00 frames. 
], tot_loss[loss=0.01675, audio_tagging_loss=0.01675, over 4958214.14 frames. ], batch size: 99, lr: 2.05e-02, grad_scale: 64.0 2023-12-21 17:30:31,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=150093.33333333334, ans=0.125 2023-12-21 17:30:59,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=150293.33333333334, ans=0.125 2023-12-21 17:31:14,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=150360.0, ans=0.0 2023-12-21 17:31:15,705 INFO [train.py:886] (3/4) Epoch 5, batch 3500, loss[loss=0.01386, audio_tagging_loss=0.01386, over 24750.00 frames. ], tot_loss[loss=0.01674, audio_tagging_loss=0.01674, over 4951153.78 frames. ], batch size: 99, lr: 2.05e-02, grad_scale: 64.0 2023-12-21 17:31:41,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=150560.0, ans=0.125 2023-12-21 17:31:52,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=150626.66666666666, ans=0.0 2023-12-21 17:32:03,513 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.549e+01 2.777e+01 3.022e+01 4.198e+01, threshold=5.554e+01, percent-clipped=0.0 2023-12-21 17:32:07,368 INFO [train.py:886] (3/4) Epoch 5, batch 3550, loss[loss=0.01757, audio_tagging_loss=0.01757, over 25000.00 frames. ], tot_loss[loss=0.01659, audio_tagging_loss=0.01659, over 4950130.51 frames. ], batch size: 100, lr: 2.05e-02, grad_scale: 64.0 2023-12-21 17:32:11,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=150760.0, ans=0.2 2023-12-21 17:32:31,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.48 vs. limit=15.0 2023-12-21 17:32:31,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=150893.33333333334, ans=0.2 2023-12-21 17:32:31,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=150893.33333333334, ans=0.1 2023-12-21 17:32:33,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=150893.33333333334, ans=0.2 2023-12-21 17:32:43,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=150960.0, ans=0.125 2023-12-21 17:32:44,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.15 vs. limit=12.0 2023-12-21 17:32:49,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=151026.66666666666, ans=0.0 2023-12-21 17:32:59,039 INFO [train.py:886] (3/4) Epoch 5, batch 3600, loss[loss=0.01251, audio_tagging_loss=0.01251, over 25000.00 frames. ], tot_loss[loss=0.01646, audio_tagging_loss=0.01646, over 4947873.69 frames. 
], batch size: 100, lr: 2.05e-02, grad_scale: 64.0 2023-12-21 17:33:12,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=151160.0, ans=0.125 2023-12-21 17:33:14,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=151160.0, ans=0.0 2023-12-21 17:33:18,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=151226.66666666666, ans=0.125 2023-12-21 17:33:23,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=151226.66666666666, ans=0.0 2023-12-21 17:33:46,184 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.142e+01 2.518e+01 2.646e+01 2.841e+01 3.411e+01, threshold=5.291e+01, percent-clipped=0.0 2023-12-21 17:33:49,953 INFO [train.py:886] (3/4) Epoch 5, batch 3650, loss[loss=0.01801, audio_tagging_loss=0.01801, over 25000.00 frames. ], tot_loss[loss=0.01635, audio_tagging_loss=0.01635, over 4947334.48 frames. ], batch size: 100, lr: 2.04e-02, grad_scale: 64.0 2023-12-21 17:34:02,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=151493.33333333334, ans=0.2 2023-12-21 17:34:22,361 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.553e+00 2023-12-21 17:34:34,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=151693.33333333334, ans=0.125 2023-12-21 17:34:39,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=151693.33333333334, ans=0.125 2023-12-21 17:34:42,997 INFO [train.py:886] (3/4) Epoch 5, batch 3700, loss[loss=0.01559, audio_tagging_loss=0.01559, over 25000.00 frames. ], tot_loss[loss=0.01637, audio_tagging_loss=0.01637, over 4949127.00 frames. ], batch size: 100, lr: 2.04e-02, grad_scale: 64.0 2023-12-21 17:34:50,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.22 vs. limit=22.5 2023-12-21 17:34:51,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=151826.66666666666, ans=0.125 2023-12-21 17:35:11,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=151893.33333333334, ans=0.2 2023-12-21 17:35:19,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=151960.0, ans=0.0 2023-12-21 17:35:26,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=152026.66666666666, ans=0.0 2023-12-21 17:35:29,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=152026.66666666666, ans=0.125 2023-12-21 17:35:30,108 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.307e+01 2.577e+01 2.848e+01 3.103e+01 4.088e+01, threshold=5.696e+01, percent-clipped=0.0 2023-12-21 17:35:34,632 INFO [train.py:886] (3/4) Epoch 5, batch 3750, loss[loss=0.01777, audio_tagging_loss=0.01777, over 24750.00 frames. 
], tot_loss[loss=0.01653, audio_tagging_loss=0.01653, over 4948998.29 frames. ], batch size: 99, lr: 2.04e-02, grad_scale: 64.0 2023-12-21 17:35:49,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=152160.0, ans=0.125 2023-12-21 17:35:55,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=152226.66666666666, ans=0.1 2023-12-21 17:36:09,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.16 vs. limit=15.0 2023-12-21 17:36:26,440 INFO [train.py:886] (3/4) Epoch 5, batch 3800, loss[loss=0.01675, audio_tagging_loss=0.01675, over 24750.00 frames. ], tot_loss[loss=0.01668, audio_tagging_loss=0.01668, over 4948019.54 frames. ], batch size: 99, lr: 2.04e-02, grad_scale: 64.0 2023-12-21 17:36:26,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=152426.66666666666, ans=0.125 2023-12-21 17:36:29,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=152426.66666666666, ans=0.0 2023-12-21 17:36:30,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=152426.66666666666, ans=0.0 2023-12-21 17:36:34,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=152426.66666666666, ans=0.0 2023-12-21 17:36:57,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=152626.66666666666, ans=0.125 2023-12-21 17:36:58,752 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.88 vs. limit=22.5 2023-12-21 17:37:08,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=152693.33333333334, ans=0.2 2023-12-21 17:37:13,237 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.14 vs. limit=15.0 2023-12-21 17:37:14,679 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.532e+01 2.751e+01 2.984e+01 4.150e+01, threshold=5.501e+01, percent-clipped=0.0 2023-12-21 17:37:15,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=152693.33333333334, ans=0.1 2023-12-21 17:37:18,460 INFO [train.py:886] (3/4) Epoch 5, batch 3850, loss[loss=0.01649, audio_tagging_loss=0.01649, over 24750.00 frames. ], tot_loss[loss=0.01657, audio_tagging_loss=0.01657, over 4949909.27 frames. ], batch size: 99, lr: 2.04e-02, grad_scale: 64.0 2023-12-21 17:37:29,277 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.72 vs. limit=10.0 2023-12-21 17:37:43,006 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.39 vs. 
limit=10.0 2023-12-21 17:37:59,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=153026.66666666666, ans=0.125 2023-12-21 17:38:11,318 INFO [train.py:886] (3/4) Epoch 5, batch 3900, loss[loss=0.01575, audio_tagging_loss=0.01575, over 25000.00 frames. ], tot_loss[loss=0.01654, audio_tagging_loss=0.01654, over 4949471.96 frames. ], batch size: 100, lr: 2.03e-02, grad_scale: 64.0 2023-12-21 17:38:13,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=153093.33333333334, ans=0.0 2023-12-21 17:38:34,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=153226.66666666666, ans=0.125 2023-12-21 17:38:35,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=153226.66666666666, ans=0.0 2023-12-21 17:38:37,624 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.32 vs. limit=10.0 2023-12-21 17:38:45,595 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.16 vs. limit=15.0 2023-12-21 17:38:53,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=153360.0, ans=0.125 2023-12-21 17:38:57,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=153360.0, ans=0.2 2023-12-21 17:38:58,377 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.170e+01 2.553e+01 2.751e+01 2.942e+01 3.918e+01, threshold=5.503e+01, percent-clipped=0.0 2023-12-21 17:39:02,253 INFO [train.py:886] (3/4) Epoch 5, batch 3950, loss[loss=0.01341, audio_tagging_loss=0.01341, over 25000.00 frames. ], tot_loss[loss=0.0165, audio_tagging_loss=0.0165, over 4952536.51 frames. ], batch size: 100, lr: 2.03e-02, grad_scale: 64.0 2023-12-21 17:39:05,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=153426.66666666666, ans=0.125 2023-12-21 17:39:06,285 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.89 vs. limit=15.0 2023-12-21 17:39:16,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.22 vs. limit=15.0 2023-12-21 17:39:27,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.32 vs. limit=15.0 2023-12-21 17:39:28,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=153560.0, ans=0.125 2023-12-21 17:39:36,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=153626.66666666666, ans=0.125 2023-12-21 17:39:55,694 INFO [train.py:886] (3/4) Epoch 5, batch 4000, loss[loss=0.0171, audio_tagging_loss=0.0171, over 25000.00 frames. ], tot_loss[loss=0.01659, audio_tagging_loss=0.01659, over 4954971.04 frames. 
], batch size: 100, lr: 2.03e-02, grad_scale: 128.0 2023-12-21 17:39:59,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=12.0 2023-12-21 17:40:02,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=153760.0, ans=0.125 2023-12-21 17:40:15,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=153893.33333333334, ans=0.1 2023-12-21 17:40:30,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=153960.0, ans=0.2 2023-12-21 17:40:36,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=154026.66666666666, ans=0.05 2023-12-21 17:40:37,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.39 vs. limit=15.0 2023-12-21 17:40:42,022 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.602e+01 2.728e+01 2.900e+01 3.775e+01, threshold=5.457e+01, percent-clipped=0.0 2023-12-21 17:40:46,611 INFO [train.py:886] (3/4) Epoch 5, batch 4050, loss[loss=0.0137, audio_tagging_loss=0.0137, over 25000.00 frames. ], tot_loss[loss=0.01663, audio_tagging_loss=0.01663, over 4954311.74 frames. ], batch size: 100, lr: 2.03e-02, grad_scale: 128.0 2023-12-21 17:40:56,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=154160.0, ans=0.0 2023-12-21 17:41:12,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=154226.66666666666, ans=0.0 2023-12-21 17:41:20,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.69 vs. limit=12.0 2023-12-21 17:41:31,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=154360.0, ans=0.125 2023-12-21 17:41:36,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=154360.0, ans=0.0 2023-12-21 17:41:38,556 INFO [train.py:886] (3/4) Epoch 5, batch 4100, loss[loss=0.01652, audio_tagging_loss=0.01652, over 24750.00 frames. ], tot_loss[loss=0.01662, audio_tagging_loss=0.01662, over 4951167.16 frames. ], batch size: 99, lr: 2.03e-02, grad_scale: 128.0 2023-12-21 17:41:59,708 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 17:42:08,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=154560.0, ans=0.2 2023-12-21 17:42:10,180 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.75 vs. limit=15.0 2023-12-21 17:42:23,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.90 vs. 
limit=22.5 2023-12-21 17:42:26,227 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.535e+01 2.761e+01 2.990e+01 3.473e+01, threshold=5.523e+01, percent-clipped=0.0 2023-12-21 17:42:30,733 INFO [train.py:886] (3/4) Epoch 5, batch 4150, loss[loss=0.01837, audio_tagging_loss=0.01837, over 24750.00 frames. ], tot_loss[loss=0.0165, audio_tagging_loss=0.0165, over 4949100.34 frames. ], batch size: 99, lr: 2.02e-02, grad_scale: 128.0 2023-12-21 17:43:02,310 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.78 vs. limit=15.0 2023-12-21 17:43:05,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=154960.0, ans=0.1 2023-12-21 17:43:05,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=154960.0, ans=0.2 2023-12-21 17:43:08,551 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.510e+00 2023-12-21 17:43:09,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=154960.0, ans=0.0 2023-12-21 17:43:21,614 INFO [train.py:886] (3/4) Epoch 5, batch 4200, loss[loss=0.01665, audio_tagging_loss=0.01665, over 24750.00 frames. ], tot_loss[loss=0.01639, audio_tagging_loss=0.01639, over 4942036.79 frames. ], batch size: 99, lr: 2.02e-02, grad_scale: 128.0 2023-12-21 17:43:24,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=155093.33333333334, ans=0.125 2023-12-21 17:43:25,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=155093.33333333334, ans=0.125 2023-12-21 17:43:43,036 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.79 vs. limit=15.0 2023-12-21 17:43:47,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=155226.66666666666, ans=0.0 2023-12-21 17:44:09,863 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.602e+01 2.757e+01 3.020e+01 3.961e+01, threshold=5.514e+01, percent-clipped=0.0 2023-12-21 17:44:13,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=155426.66666666666, ans=22.5 2023-12-21 17:44:13,607 INFO [train.py:886] (3/4) Epoch 5, batch 4250, loss[loss=0.01646, audio_tagging_loss=0.01646, over 23988.00 frames. ], tot_loss[loss=0.01635, audio_tagging_loss=0.01635, over 4951127.13 frames. ], batch size: 100, lr: 2.02e-02, grad_scale: 128.0 2023-12-21 17:44:52,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=155693.33333333334, ans=0.0 2023-12-21 17:44:55,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.98 vs. limit=22.5 2023-12-21 17:45:03,643 INFO [train.py:886] (3/4) Epoch 5, batch 4300, loss[loss=0.01724, audio_tagging_loss=0.01724, over 22239.00 frames. ], tot_loss[loss=0.01636, audio_tagging_loss=0.01636, over 4955153.56 frames. 
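
Each WARNING from optim.py summarizes the distribution of recent gradient norms. Throughout this section the reported threshold is twice the middle of the five listed values, e.g. 5.523e+01 = 2.0 x 2.761e+01 just above, so threshold = Clipping_scale x median grad-norm. The sketch below reproduces that report; how many norms are buffered and the exact quantile grid are assumptions, not quoted implementation details.

    # Sketch of the clipping summary in the WARNING lines: the threshold is
    # Clipping_scale times the median of recent gradient norms.
    import torch

    def clipping_report(grad_norms: torch.Tensor, clipping_scale: float = 2.0) -> float:
        # grad_norms: 1-D tensor of recent per-step gradient norms
        q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = float(clipping_scale * q[2])            # 2x the median
        pct = 100.0 * float((grad_norms > threshold).float().mean())
        quartiles = " ".join(f"{v:.3e}" for v in q.tolist())
        print(f"grad-norm quartiles {quartiles}, "
              f"threshold={threshold:.3e}, percent-clipped={pct}")
        return threshold
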
], batch size: 107, lr: 2.02e-02, grad_scale: 128.0 2023-12-21 17:45:10,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=155760.0, ans=0.0 2023-12-21 17:45:20,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=155826.66666666666, ans=0.1 2023-12-21 17:45:21,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=155826.66666666666, ans=0.125 2023-12-21 17:45:21,466 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.99 vs. limit=15.0 2023-12-21 17:45:34,200 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=12.0 2023-12-21 17:45:34,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=155960.0, ans=0.07 2023-12-21 17:45:53,520 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.195e+01 2.657e+01 2.804e+01 3.021e+01 3.869e+01, threshold=5.607e+01, percent-clipped=0.0 2023-12-21 17:45:55,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=156093.33333333334, ans=0.0 2023-12-21 17:45:56,393 INFO [train.py:886] (3/4) Epoch 5, batch 4350, loss[loss=0.01502, audio_tagging_loss=0.01502, over 25000.00 frames. ], tot_loss[loss=0.01653, audio_tagging_loss=0.01653, over 4960119.20 frames. ], batch size: 100, lr: 2.02e-02, grad_scale: 64.0 2023-12-21 17:46:10,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=156160.0, ans=0.2 2023-12-21 17:46:14,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=156160.0, ans=0.125 2023-12-21 17:46:21,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=156226.66666666666, ans=0.125 2023-12-21 17:46:25,190 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.18 vs. limit=12.0 2023-12-21 17:46:26,047 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.24 vs. limit=22.5 2023-12-21 17:46:48,704 INFO [train.py:886] (3/4) Epoch 5, batch 4400, loss[loss=0.01561, audio_tagging_loss=0.01561, over 24750.00 frames. ], tot_loss[loss=0.01674, audio_tagging_loss=0.01674, over 4954567.91 frames. ], batch size: 99, lr: 2.01e-02, grad_scale: 64.0 2023-12-21 17:46:50,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=156426.66666666666, ans=0.125 2023-12-21 17:46:56,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=156426.66666666666, ans=0.1 2023-12-21 17:46:59,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=156493.33333333334, ans=0.05 2023-12-21 17:47:05,039 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.13 vs. 
limit=22.5 2023-12-21 17:47:13,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2023-12-21 17:47:18,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=156626.66666666666, ans=0.2 2023-12-21 17:47:18,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=156626.66666666666, ans=0.09899494936611666 2023-12-21 17:47:28,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.48 vs. limit=22.5 2023-12-21 17:47:34,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=156693.33333333334, ans=0.125 2023-12-21 17:47:35,931 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.224e+01 2.647e+01 2.808e+01 3.114e+01 3.579e+01, threshold=5.615e+01, percent-clipped=0.0 2023-12-21 17:47:38,837 INFO [train.py:886] (3/4) Epoch 5, batch 4450, loss[loss=0.01732, audio_tagging_loss=0.01732, over 24750.00 frames. ], tot_loss[loss=0.01675, audio_tagging_loss=0.01675, over 4954804.98 frames. ], batch size: 99, lr: 2.01e-02, grad_scale: 64.0 2023-12-21 17:47:48,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=156760.0, ans=15.0 2023-12-21 17:47:48,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=156826.66666666666, ans=0.125 2023-12-21 17:48:03,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=156893.33333333334, ans=0.125 2023-12-21 17:48:23,066 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.96 vs. limit=22.5 2023-12-21 17:48:31,774 INFO [train.py:886] (3/4) Epoch 5, batch 4500, loss[loss=0.0145, audio_tagging_loss=0.0145, over 25000.00 frames. ], tot_loss[loss=0.01666, audio_tagging_loss=0.01666, over 4952012.33 frames. 
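
The grad_scale field in the train.py lines moves between powers of two: 64.0 through batch 3950, 128.0 from batch 4000, and back to 64.0 by batch 4350. That is the signature of dynamic loss scaling under fp16: the scale doubles after a long run of overflow-free steps and is halved when an overflow is detected. Below is a generic sketch using torch.cuda.amp.GradScaler, whose stock defaults behave this way; the actual training loop wires its scaler into the optimizer differently.

    # Generic fp16 step with dynamic loss scaling; grad_scale here plays the
    # role of the value printed in the train.py lines.
    import torch

    scaler = torch.cuda.amp.GradScaler(growth_factor=2.0, backoff_factor=0.5,
                                       growth_interval=2000)

    def train_step(model, optimizer, features, targets, loss_fn):
        with torch.cuda.amp.autocast():
            loss = loss_fn(model(features), targets)
        optimizer.zero_grad()
        scaler.scale(loss).backward()  # gradients carry the current scale
        scaler.step(optimizer)         # unscales; skips the step on inf/nan
        scaler.update()                # doubles or halves the scale
        return loss.detach()
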
], batch size: 100, lr: 2.01e-02, grad_scale: 64.0 2023-12-21 17:48:34,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=157093.33333333334, ans=0.125 2023-12-21 17:48:40,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=157160.0, ans=0.07 2023-12-21 17:48:53,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=157226.66666666666, ans=0.05 2023-12-21 17:49:00,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=157226.66666666666, ans=0.125 2023-12-21 17:49:14,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=157360.0, ans=0.0 2023-12-21 17:49:15,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=157360.0, ans=0.0 2023-12-21 17:49:19,754 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.304e+01 2.535e+01 2.688e+01 2.916e+01 3.475e+01, threshold=5.375e+01, percent-clipped=0.0 2023-12-21 17:49:20,306 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.00 vs. limit=15.0 2023-12-21 17:49:23,252 INFO [train.py:886] (3/4) Epoch 5, batch 4550, loss[loss=0.0144, audio_tagging_loss=0.0144, over 24022.00 frames. ], tot_loss[loss=0.0166, audio_tagging_loss=0.0166, over 4955863.64 frames. ], batch size: 100, lr: 2.01e-02, grad_scale: 64.0 2023-12-21 17:49:26,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=157426.66666666666, ans=0.0 2023-12-21 17:49:29,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=157426.66666666666, ans=0.125 2023-12-21 17:49:39,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.18 vs. limit=15.0 2023-12-21 17:49:46,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=157560.0, ans=0.2 2023-12-21 17:49:53,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=157626.66666666666, ans=0.0 2023-12-21 17:50:15,394 INFO [train.py:886] (3/4) Epoch 5, batch 4600, loss[loss=0.01471, audio_tagging_loss=0.01471, over 24750.00 frames. ], tot_loss[loss=0.01653, audio_tagging_loss=0.01653, over 4952322.64 frames. ], batch size: 99, lr: 2.01e-02, grad_scale: 64.0 2023-12-21 17:50:26,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=157826.66666666666, ans=0.0 2023-12-21 17:50:29,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=157826.66666666666, ans=0.125 2023-12-21 17:50:32,076 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.71 vs. 
limit=15.0 2023-12-21 17:50:35,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=157826.66666666666, ans=0.125 2023-12-21 17:50:37,106 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0 2023-12-21 17:50:50,271 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.36 vs. limit=15.0 2023-12-21 17:50:52,270 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.67 vs. limit=15.0 2023-12-21 17:51:04,726 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.622e+01 2.861e+01 3.041e+01 4.016e+01, threshold=5.722e+01, percent-clipped=0.0 2023-12-21 17:51:08,380 INFO [train.py:886] (3/4) Epoch 5, batch 4650, loss[loss=0.01485, audio_tagging_loss=0.01485, over 25000.00 frames. ], tot_loss[loss=0.01663, audio_tagging_loss=0.01663, over 4948746.14 frames. ], batch size: 100, lr: 2.01e-02, grad_scale: 64.0 2023-12-21 17:51:18,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=158160.0, ans=0.125 2023-12-21 17:51:37,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=158226.66666666666, ans=0.125 2023-12-21 17:51:41,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=158293.33333333334, ans=0.125 2023-12-21 17:51:43,790 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.323e-03 2023-12-21 17:51:58,921 INFO [train.py:886] (3/4) Epoch 5, batch 4700, loss[loss=0.01766, audio_tagging_loss=0.01766, over 24750.00 frames. ], tot_loss[loss=0.01669, audio_tagging_loss=0.01669, over 4950457.26 frames. ], batch size: 99, lr: 2.00e-02, grad_scale: 64.0 2023-12-21 17:52:00,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=158426.66666666666, ans=0.2 2023-12-21 17:52:10,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=158493.33333333334, ans=0.125 2023-12-21 17:52:10,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=158493.33333333334, ans=0.5 2023-12-21 17:52:16,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=158560.0, ans=0.125 2023-12-21 17:52:21,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=158560.0, ans=0.125 2023-12-21 17:52:26,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=158626.66666666666, ans=0.2 2023-12-21 17:52:31,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=158626.66666666666, ans=0.2 2023-12-21 17:52:32,075 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.76 vs. 
limit=15.0 2023-12-21 17:52:43,180 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.43 vs. limit=15.0 2023-12-21 17:52:43,575 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+01 2.632e+01 2.850e+01 3.066e+01 3.768e+01, threshold=5.700e+01, percent-clipped=0.0 2023-12-21 17:52:46,341 INFO [train.py:886] (3/4) Epoch 5, batch 4750, loss[loss=0.01823, audio_tagging_loss=0.01823, over 24750.00 frames. ], tot_loss[loss=0.01675, audio_tagging_loss=0.01675, over 4941977.92 frames. ], batch size: 99, lr: 2.00e-02, grad_scale: 64.0 2023-12-21 17:52:51,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=158760.0, ans=0.0 2023-12-21 17:52:59,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=158826.66666666666, ans=0.2 2023-12-21 17:52:59,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=158826.66666666666, ans=0.125 2023-12-21 17:53:24,101 INFO [train.py:886] (3/4) Epoch 6, batch 0, loss[loss=0.03632, audio_tagging_loss=0.03632, over 24051.00 frames. ], tot_loss[loss=0.03632, audio_tagging_loss=0.03632, over 24051.00 frames. ], batch size: 100, lr: 1.87e-02, grad_scale: 64.0 2023-12-21 17:53:24,102 INFO [train.py:909] (3/4) Computing validation loss 2023-12-21 17:53:45,405 INFO [train.py:917] (3/4) Epoch 6, validation: loss=0.03649, audio_tagging_loss=0.03649, over 3737520.00 frames. 2023-12-21 17:53:45,406 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-21 17:53:48,497 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.84 vs. limit=22.5 2023-12-21 17:53:53,083 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.51 vs. limit=22.5 2023-12-21 17:53:56,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=158933.33333333334, ans=0.125 2023-12-21 17:54:09,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=159000.0, ans=0.95 2023-12-21 17:54:26,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=159133.33333333334, ans=0.0 2023-12-21 17:54:28,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=159133.33333333334, ans=0.0 2023-12-21 17:54:36,664 INFO [train.py:886] (3/4) Epoch 6, batch 50, loss[loss=0.02284, audio_tagging_loss=0.02284, over 25000.00 frames. ], tot_loss[loss=0.02625, audio_tagging_loss=0.02625, over 1119369.09 frames. ], batch size: 100, lr: 1.87e-02, grad_scale: 64.0 2023-12-21 17:54:47,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=159266.66666666666, ans=0.0 2023-12-21 17:54:48,214 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.77 vs. 
limit=22.5 2023-12-21 17:55:00,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=159333.33333333334, ans=0.125 2023-12-21 17:55:01,716 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.58 vs. limit=15.0 2023-12-21 17:55:07,557 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2023-12-21 17:55:08,011 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 3.015e+01 3.367e+01 3.698e+01 8.619e+01, threshold=6.734e+01, percent-clipped=4.0 2023-12-21 17:55:08,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=159400.0, ans=0.1 2023-12-21 17:55:13,989 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.30 vs. limit=15.0 2023-12-21 17:55:28,823 INFO [train.py:886] (3/4) Epoch 6, batch 100, loss[loss=0.0171, audio_tagging_loss=0.0171, over 24750.00 frames. ], tot_loss[loss=0.02257, audio_tagging_loss=0.02257, over 1968603.96 frames. ], batch size: 99, lr: 1.86e-02, grad_scale: 64.0 2023-12-21 17:55:30,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=159533.33333333334, ans=0.2 2023-12-21 17:55:45,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=159600.0, ans=0.1 2023-12-21 17:55:51,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.98 vs. limit=22.5 2023-12-21 17:56:06,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=159733.33333333334, ans=0.0 2023-12-21 17:56:19,827 INFO [train.py:886] (3/4) Epoch 6, batch 150, loss[loss=0.01566, audio_tagging_loss=0.01566, over 24029.00 frames. ], tot_loss[loss=0.02032, audio_tagging_loss=0.02032, over 2631456.39 frames. ], batch size: 100, lr: 1.86e-02, grad_scale: 64.0 2023-12-21 17:56:37,590 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.83 vs. limit=15.0 2023-12-21 17:56:54,051 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.183e+01 2.731e+01 2.909e+01 3.115e+01 3.553e+01, threshold=5.819e+01, percent-clipped=0.0 2023-12-21 17:57:10,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.17 vs. limit=15.0 2023-12-21 17:57:13,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=160200.0, ans=0.0 2023-12-21 17:57:14,145 INFO [train.py:886] (3/4) Epoch 6, batch 200, loss[loss=0.0129, audio_tagging_loss=0.0129, over 24053.00 frames. ], tot_loss[loss=0.01909, audio_tagging_loss=0.01909, over 3148581.52 frames. ], batch size: 100, lr: 1.86e-02, grad_scale: 64.0 2023-12-21 17:57:16,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.99 vs. 
limit=10.0 2023-12-21 17:57:21,256 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.62 vs. limit=10.0 2023-12-21 17:57:29,904 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.82 vs. limit=8.0 2023-12-21 17:57:38,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.15 vs. limit=22.5 2023-12-21 17:57:41,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=160333.33333333334, ans=0.0 2023-12-21 17:57:47,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=160400.0, ans=0.0 2023-12-21 17:58:01,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=160466.66666666666, ans=0.0 2023-12-21 17:58:04,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=160466.66666666666, ans=0.1 2023-12-21 17:58:05,837 INFO [train.py:886] (3/4) Epoch 6, batch 250, loss[loss=0.01838, audio_tagging_loss=0.01838, over 25000.00 frames. ], tot_loss[loss=0.01829, audio_tagging_loss=0.01829, over 3554047.98 frames. ], batch size: 100, lr: 1.86e-02, grad_scale: 64.0 2023-12-21 17:58:11,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=160533.33333333334, ans=0.1 2023-12-21 17:58:22,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.67 vs. limit=10.0 2023-12-21 17:58:36,180 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.59 vs. limit=22.5 2023-12-21 17:58:36,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=160733.33333333334, ans=0.0 2023-12-21 17:58:37,627 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.574e+01 2.757e+01 2.978e+01 3.329e+01, threshold=5.514e+01, percent-clipped=0.0 2023-12-21 17:58:38,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=160733.33333333334, ans=0.07 2023-12-21 17:58:41,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=160733.33333333334, ans=0.125 2023-12-21 17:58:46,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=160800.0, ans=0.1 2023-12-21 17:58:55,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=160800.0, ans=0.125 2023-12-21 17:58:57,039 INFO [train.py:886] (3/4) Epoch 6, batch 300, loss[loss=0.01708, audio_tagging_loss=0.01708, over 24750.00 frames. ], tot_loss[loss=0.01787, audio_tagging_loss=0.01787, over 3860225.45 frames. 
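
In these lines, loss[...] is the current batch while tot_loss[...] behaves like a decayed running sum of per-batch losses weighted by frame count. Assuming a decay of 1 - 1/200 per batch (an inference from the logged numbers, not a quoted constant), batches of roughly 25000 frames accumulate about 5e6 x (1 - 0.995^k) frames after k batches: roughly 1.1e6 at batch 50 and 2.0e6 at batch 100, matching the 1119369.09 and 1968603.96 logged above, with a plateau near the ~4.95e6 frames seen late in epoch 5. A sketch:

    # Sketch of the tot_loss bookkeeping implied by the logged frame counts.
    class RunningLoss:
        def __init__(self, decay: float = 1.0 - 1.0 / 200):
            self.decay = decay
            self.loss_sum = 0.0   # decayed sum of loss * frames
            self.frames = 0.0     # decayed sum of frames

        def update(self, batch_loss: float, batch_frames: float) -> float:
            self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
            self.frames = self.frames * self.decay + batch_frames
            return self.loss_sum / self.frames   # the reported tot_loss
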
], batch size: 99, lr: 1.86e-02, grad_scale: 64.0 2023-12-21 17:59:01,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=160866.66666666666, ans=0.0 2023-12-21 17:59:03,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=160866.66666666666, ans=0.0 2023-12-21 17:59:07,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.17 vs. limit=22.5 2023-12-21 17:59:10,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=160933.33333333334, ans=0.1 2023-12-21 17:59:18,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=18.90 vs. limit=15.0 2023-12-21 17:59:34,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=161066.66666666666, ans=0.2 2023-12-21 17:59:38,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=161133.33333333334, ans=0.125 2023-12-21 17:59:38,988 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.53 vs. limit=6.0 2023-12-21 17:59:49,339 INFO [train.py:886] (3/4) Epoch 6, batch 350, loss[loss=0.01549, audio_tagging_loss=0.01549, over 25000.00 frames. ], tot_loss[loss=0.01766, audio_tagging_loss=0.01766, over 4100564.15 frames. ], batch size: 100, lr: 1.85e-02, grad_scale: 64.0 2023-12-21 18:00:03,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=161266.66666666666, ans=0.1 2023-12-21 18:00:09,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=161333.33333333334, ans=0.125 2023-12-21 18:00:18,295 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0 2023-12-21 18:00:21,394 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 2.561e+01 2.750e+01 3.002e+01 3.673e+01, threshold=5.501e+01, percent-clipped=0.0 2023-12-21 18:00:22,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=161400.0, ans=0.125 2023-12-21 18:00:33,918 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.18 vs. limit=12.0 2023-12-21 18:00:40,948 INFO [train.py:886] (3/4) Epoch 6, batch 400, loss[loss=0.01979, audio_tagging_loss=0.01979, over 24750.00 frames. ], tot_loss[loss=0.01728, audio_tagging_loss=0.01728, over 4285366.69 frames. 
], batch size: 99, lr: 1.85e-02, grad_scale: 64.0 2023-12-21 18:00:55,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=161600.0, ans=0.125 2023-12-21 18:00:57,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=161600.0, ans=0.125 2023-12-21 18:00:59,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=15.0 2023-12-21 18:01:30,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=161800.0, ans=0.0 2023-12-21 18:01:32,979 INFO [train.py:886] (3/4) Epoch 6, batch 450, loss[loss=0.01545, audio_tagging_loss=0.01545, over 25000.00 frames. ], tot_loss[loss=0.01691, audio_tagging_loss=0.01691, over 4439467.08 frames. ], batch size: 100, lr: 1.85e-02, grad_scale: 64.0 2023-12-21 18:02:04,638 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.486e+01 2.731e+01 2.952e+01 3.646e+01, threshold=5.463e+01, percent-clipped=0.0 2023-12-21 18:02:04,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=162066.66666666666, ans=0.125 2023-12-21 18:02:05,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=162066.66666666666, ans=0.125 2023-12-21 18:02:21,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=162133.33333333334, ans=0.0 2023-12-21 18:02:21,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.10 vs. limit=15.0 2023-12-21 18:02:25,549 INFO [train.py:886] (3/4) Epoch 6, batch 500, loss[loss=0.01652, audio_tagging_loss=0.01652, over 25000.00 frames. ], tot_loss[loss=0.01665, audio_tagging_loss=0.01665, over 4552720.46 frames. ], batch size: 100, lr: 1.85e-02, grad_scale: 64.0 2023-12-21 18:02:31,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=162200.0, ans=0.1 2023-12-21 18:02:43,415 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.24 vs. limit=10.0 2023-12-21 18:02:47,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=162333.33333333334, ans=0.0 2023-12-21 18:03:17,199 INFO [train.py:886] (3/4) Epoch 6, batch 550, loss[loss=0.01423, audio_tagging_loss=0.01423, over 25000.00 frames. ], tot_loss[loss=0.01657, audio_tagging_loss=0.01657, over 4645101.19 frames. ], batch size: 100, lr: 1.85e-02, grad_scale: 64.0 2023-12-21 18:03:38,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=162666.66666666666, ans=0.125 2023-12-21 18:03:49,378 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.251e+01 2.522e+01 2.669e+01 2.950e+01 3.849e+01, threshold=5.338e+01, percent-clipped=0.0 2023-12-21 18:04:00,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. 
limit=15.0 2023-12-21 18:04:01,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=162800.0, ans=0.125 2023-12-21 18:04:06,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=162800.0, ans=0.125 2023-12-21 18:04:08,096 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.80 vs. limit=15.0 2023-12-21 18:04:08,693 INFO [train.py:886] (3/4) Epoch 6, batch 600, loss[loss=0.01792, audio_tagging_loss=0.01792, over 24750.00 frames. ], tot_loss[loss=0.01657, audio_tagging_loss=0.01657, over 4707884.62 frames. ], batch size: 99, lr: 1.85e-02, grad_scale: 64.0 2023-12-21 18:04:20,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=162933.33333333334, ans=0.2 2023-12-21 18:04:23,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=162933.33333333334, ans=0.1 2023-12-21 18:04:23,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=162933.33333333334, ans=0.125 2023-12-21 18:04:27,736 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0 2023-12-21 18:04:36,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=163000.0, ans=0.1 2023-12-21 18:04:45,164 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.87 vs. limit=15.0 2023-12-21 18:04:48,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=163133.33333333334, ans=0.125 2023-12-21 18:04:50,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=163133.33333333334, ans=0.125 2023-12-21 18:04:50,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=163133.33333333334, ans=0.2 2023-12-21 18:04:51,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=163133.33333333334, ans=0.0 2023-12-21 18:04:55,848 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 18:04:56,236 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.94 vs. limit=6.0 2023-12-21 18:05:01,098 INFO [train.py:886] (3/4) Epoch 6, batch 650, loss[loss=0.01961, audio_tagging_loss=0.01961, over 24750.00 frames. ], tot_loss[loss=0.0167, audio_tagging_loss=0.0167, over 4757126.72 frames. 
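
The Whitening lines fire when a measured metric approaches or exceeds its limit; the metric is large when a module's output covariance is dominated by a few directions instead of being white. One illustrative way to define such a metric (not necessarily the definition used in scaling.py) is the ratio of the mean squared eigenvalue of the channel covariance to its squared mean eigenvalue, which equals 1.0 for perfectly white features:

    # Illustrative whitening metric: 1.0 for isotropic features, larger when
    # some directions dominate. Computed via trace identities, so no explicit
    # eigendecomposition is needed.
    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels), assumed zero-mean for brevity
        cov = (x.t() @ x) / x.shape[0]
        n = cov.shape[0]
        mean_eig = torch.diagonal(cov).mean()     # trace(C) / n
        mean_eig_sq = (cov * cov).sum() / n       # trace(C^2) / n
        return float(mean_eig_sq / mean_eig ** 2)

    x = torch.randn(1000, 256)          # near-white input
    print(whitening_metric(x))          # ~1.25; values past the limit would
                                        # trigger a corrective whitening loss
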
], batch size: 99, lr: 1.84e-02, grad_scale: 64.0 2023-12-21 18:05:33,146 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+01 2.590e+01 2.810e+01 2.971e+01 4.076e+01, threshold=5.620e+01, percent-clipped=0.0 2023-12-21 18:05:36,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=163400.0, ans=0.125 2023-12-21 18:05:48,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=163466.66666666666, ans=0.0 2023-12-21 18:05:52,692 INFO [train.py:886] (3/4) Epoch 6, batch 700, loss[loss=0.01665, audio_tagging_loss=0.01665, over 24750.00 frames. ], tot_loss[loss=0.01653, audio_tagging_loss=0.01653, over 4802518.60 frames. ], batch size: 99, lr: 1.84e-02, grad_scale: 64.0 2023-12-21 18:05:55,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=163533.33333333334, ans=0.04949747468305833 2023-12-21 18:06:09,429 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.22 vs. limit=22.5 2023-12-21 18:06:18,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=163666.66666666666, ans=0.125 2023-12-21 18:06:33,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=163800.0, ans=0.0 2023-12-21 18:06:44,022 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=3.765e-01 2023-12-21 18:06:44,770 INFO [train.py:886] (3/4) Epoch 6, batch 750, loss[loss=0.01631, audio_tagging_loss=0.01631, over 25000.00 frames. ], tot_loss[loss=0.01646, audio_tagging_loss=0.01646, over 4839730.98 frames. ], batch size: 100, lr: 1.84e-02, grad_scale: 64.0 2023-12-21 18:06:54,722 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.72 vs. limit=22.5 2023-12-21 18:07:12,887 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0 2023-12-21 18:07:17,165 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.253e+01 2.591e+01 2.731e+01 2.911e+01 3.574e+01, threshold=5.461e+01, percent-clipped=0.0 2023-12-21 18:07:17,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=164066.66666666666, ans=0.1 2023-12-21 18:07:24,299 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.69 vs. limit=15.0 2023-12-21 18:07:35,940 INFO [train.py:886] (3/4) Epoch 6, batch 800, loss[loss=0.01537, audio_tagging_loss=0.01537, over 25000.00 frames. ], tot_loss[loss=0.01634, audio_tagging_loss=0.01634, over 4872764.29 frames. 
], batch size: 100, lr: 1.84e-02, grad_scale: 64.0 2023-12-21 18:07:47,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=164266.66666666666, ans=0.0 2023-12-21 18:07:47,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=164266.66666666666, ans=0.0 2023-12-21 18:08:10,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=164400.0, ans=0.125 2023-12-21 18:08:26,509 INFO [train.py:886] (3/4) Epoch 6, batch 850, loss[loss=0.01607, audio_tagging_loss=0.01607, over 25000.00 frames. ], tot_loss[loss=0.01642, audio_tagging_loss=0.01642, over 4890911.20 frames. ], batch size: 100, lr: 1.84e-02, grad_scale: 64.0 2023-12-21 18:08:57,859 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.614e+01 2.752e+01 3.030e+01 4.008e+01, threshold=5.503e+01, percent-clipped=0.0 2023-12-21 18:09:12,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=164800.0, ans=0.1 2023-12-21 18:09:14,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=164800.0, ans=0.05 2023-12-21 18:09:17,281 INFO [train.py:886] (3/4) Epoch 6, batch 900, loss[loss=0.01862, audio_tagging_loss=0.01862, over 25000.00 frames. ], tot_loss[loss=0.01643, audio_tagging_loss=0.01643, over 4910282.09 frames. ], batch size: 100, lr: 1.84e-02, grad_scale: 64.0 2023-12-21 18:09:17,688 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.73 vs. limit=22.5 2023-12-21 18:09:42,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=165000.0, ans=0.1 2023-12-21 18:09:47,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=165066.66666666666, ans=0.0 2023-12-21 18:09:51,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=165066.66666666666, ans=0.125 2023-12-21 18:10:08,987 INFO [train.py:886] (3/4) Epoch 6, batch 950, loss[loss=0.02095, audio_tagging_loss=0.02095, over 24750.00 frames. ], tot_loss[loss=0.01659, audio_tagging_loss=0.01659, over 4917181.17 frames. ], batch size: 99, lr: 1.83e-02, grad_scale: 64.0 2023-12-21 18:10:09,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=165200.0, ans=0.0 2023-12-21 18:10:10,406 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0 2023-12-21 18:10:12,538 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.13 vs. 
limit=5.0 2023-12-21 18:10:41,003 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.630e+01 2.802e+01 3.017e+01 4.236e+01, threshold=5.603e+01, percent-clipped=0.0 2023-12-21 18:10:41,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=165400.0, ans=0.125 2023-12-21 18:10:42,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=165400.0, ans=0.2 2023-12-21 18:10:44,457 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0 2023-12-21 18:10:52,441 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.72 vs. limit=22.5 2023-12-21 18:10:53,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=165466.66666666666, ans=0.05 2023-12-21 18:10:55,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=165466.66666666666, ans=0.0 2023-12-21 18:11:01,347 INFO [train.py:886] (3/4) Epoch 6, batch 1000, loss[loss=0.01359, audio_tagging_loss=0.01359, over 24750.00 frames. ], tot_loss[loss=0.01659, audio_tagging_loss=0.01659, over 4920118.70 frames. ], batch size: 99, lr: 1.83e-02, grad_scale: 64.0 2023-12-21 18:11:03,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=165533.33333333334, ans=10.0 2023-12-21 18:11:04,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=165533.33333333334, ans=0.125 2023-12-21 18:11:14,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=165600.0, ans=0.125 2023-12-21 18:11:22,029 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.25 vs. limit=22.5 2023-12-21 18:11:27,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=165666.66666666666, ans=0.05 2023-12-21 18:11:38,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=165733.33333333334, ans=0.125 2023-12-21 18:11:39,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=165733.33333333334, ans=0.2 2023-12-21 18:11:46,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=165800.0, ans=0.1 2023-12-21 18:11:52,569 INFO [train.py:886] (3/4) Epoch 6, batch 1050, loss[loss=0.01536, audio_tagging_loss=0.01536, over 25000.00 frames. ], tot_loss[loss=0.01644, audio_tagging_loss=0.01644, over 4927374.33 frames. 
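
The per-batch frame counts follow directly from the clip length. Assuming 10-second clips, 100 feature frames per second, and 4x temporal subsampling in the encoder (all three are stated assumptions rather than values read from the log), each cut contributes 250 frames, so batch size 100 yields the 25000.00 frames reported and batch size 99 yields 24750.00:

    # Back-of-envelope check of the logged frame counts (assumed constants).
    clip_seconds = 10
    feature_frames_per_second = 100
    subsampling = 4

    frames_per_cut = clip_seconds * feature_frames_per_second // subsampling
    print(frames_per_cut)        # 250
    print(frames_per_cut * 100)  # 25000 frames at batch size 100
    print(frames_per_cut * 99)   # 24750 frames at batch size 99
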
], batch size: 100, lr: 1.83e-02, grad_scale: 64.0 2023-12-21 18:11:55,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=165866.66666666666, ans=0.2 2023-12-21 18:12:04,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=165933.33333333334, ans=0.1 2023-12-21 18:12:06,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=165933.33333333334, ans=0.0 2023-12-21 18:12:25,142 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.084e+01 2.514e+01 2.651e+01 2.892e+01 3.448e+01, threshold=5.301e+01, percent-clipped=0.0 2023-12-21 18:12:30,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=166066.66666666666, ans=0.0 2023-12-21 18:12:40,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=166133.33333333334, ans=0.0 2023-12-21 18:12:45,367 INFO [train.py:886] (3/4) Epoch 6, batch 1100, loss[loss=0.01927, audio_tagging_loss=0.01927, over 25000.00 frames. ], tot_loss[loss=0.01647, audio_tagging_loss=0.01647, over 4921973.63 frames. ], batch size: 100, lr: 1.83e-02, grad_scale: 64.0 2023-12-21 18:12:48,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=166200.0, ans=0.1 2023-12-21 18:13:00,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=166266.66666666666, ans=0.0 2023-12-21 18:13:07,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=166333.33333333334, ans=15.0 2023-12-21 18:13:24,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=166400.0, ans=0.125 2023-12-21 18:13:27,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=166466.66666666666, ans=0.1 2023-12-21 18:13:32,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=166466.66666666666, ans=0.125 2023-12-21 18:13:32,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=166466.66666666666, ans=0.125 2023-12-21 18:13:35,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=166466.66666666666, ans=0.0 2023-12-21 18:13:37,814 INFO [train.py:886] (3/4) Epoch 6, batch 1150, loss[loss=0.01828, audio_tagging_loss=0.01828, over 24040.00 frames. ], tot_loss[loss=0.01644, audio_tagging_loss=0.01644, over 4927271.28 frames. ], batch size: 100, lr: 1.83e-02, grad_scale: 64.0 2023-12-21 18:14:10,170 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.210e+01 2.615e+01 2.716e+01 2.897e+01 3.579e+01, threshold=5.433e+01, percent-clipped=0.0 2023-12-21 18:14:25,614 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.68 vs. limit=22.5 2023-12-21 18:14:29,605 INFO [train.py:886] (3/4) Epoch 6, batch 1200, loss[loss=0.01632, audio_tagging_loss=0.01632, over 24750.00 frames. 
], tot_loss[loss=0.01642, audio_tagging_loss=0.01642, over 4937941.21 frames. ], batch size: 99, lr: 1.83e-02, grad_scale: 64.0 2023-12-21 18:14:36,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=166866.66666666666, ans=0.125 2023-12-21 18:14:46,488 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.37 vs. limit=22.5 2023-12-21 18:14:49,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.47 vs. limit=15.0 2023-12-21 18:15:03,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=167066.66666666666, ans=0.125 2023-12-21 18:15:11,200 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=6.262e-02 2023-12-21 18:15:12,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=167133.33333333334, ans=0.125 2023-12-21 18:15:21,719 INFO [train.py:886] (3/4) Epoch 6, batch 1250, loss[loss=0.01793, audio_tagging_loss=0.01793, over 24750.00 frames. ], tot_loss[loss=0.01654, audio_tagging_loss=0.01654, over 4935437.24 frames. ], batch size: 99, lr: 1.82e-02, grad_scale: 64.0 2023-12-21 18:15:50,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167333.33333333334, ans=0.1 2023-12-21 18:15:53,903 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.066e+01 2.561e+01 2.733e+01 2.928e+01 3.774e+01, threshold=5.467e+01, percent-clipped=0.0 2023-12-21 18:15:59,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=167400.0, ans=0.0 2023-12-21 18:16:12,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=167533.33333333334, ans=22.5 2023-12-21 18:16:13,500 INFO [train.py:886] (3/4) Epoch 6, batch 1300, loss[loss=0.01729, audio_tagging_loss=0.01729, over 25000.00 frames. ], tot_loss[loss=0.0166, audio_tagging_loss=0.0166, over 4935433.48 frames. ], batch size: 100, lr: 1.82e-02, grad_scale: 64.0 2023-12-21 18:16:41,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=167666.66666666666, ans=0.1 2023-12-21 18:16:46,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=167733.33333333334, ans=0.125 2023-12-21 18:16:56,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=167800.0, ans=0.125 2023-12-21 18:16:58,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=167800.0, ans=0.125 2023-12-21 18:17:05,873 INFO [train.py:886] (3/4) Epoch 6, batch 1350, loss[loss=0.01405, audio_tagging_loss=0.01405, over 24750.00 frames. ], tot_loss[loss=0.01647, audio_tagging_loss=0.01647, over 4937778.84 frames. 
], batch size: 99, lr: 1.82e-02, grad_scale: 64.0 2023-12-21 18:17:18,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=167933.33333333334, ans=0.125 2023-12-21 18:17:25,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=168000.0, ans=0.0 2023-12-21 18:17:30,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=168000.0, ans=0.125 2023-12-21 18:17:37,725 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+01 2.576e+01 2.766e+01 2.896e+01 3.623e+01, threshold=5.532e+01, percent-clipped=0.0 2023-12-21 18:17:41,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=168066.66666666666, ans=0.0 2023-12-21 18:17:57,274 INFO [train.py:886] (3/4) Epoch 6, batch 1400, loss[loss=0.01504, audio_tagging_loss=0.01504, over 25000.00 frames. ], tot_loss[loss=0.0163, audio_tagging_loss=0.0163, over 4941524.40 frames. ], batch size: 100, lr: 1.82e-02, grad_scale: 64.0 2023-12-21 18:18:13,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=168266.66666666666, ans=0.125 2023-12-21 18:18:45,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=168466.66666666666, ans=0.0 2023-12-21 18:18:48,997 INFO [train.py:886] (3/4) Epoch 6, batch 1450, loss[loss=0.02088, audio_tagging_loss=0.02088, over 21716.00 frames. ], tot_loss[loss=0.01627, audio_tagging_loss=0.01627, over 4943052.82 frames. ], batch size: 107, lr: 1.82e-02, grad_scale: 64.0 2023-12-21 18:18:58,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=168533.33333333334, ans=0.125 2023-12-21 18:19:00,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=168600.0, ans=0.125 2023-12-21 18:19:10,670 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.83 vs. limit=6.0 2023-12-21 18:19:18,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=168666.66666666666, ans=0.1 2023-12-21 18:19:21,220 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.205e+01 2.551e+01 2.724e+01 2.909e+01 3.642e+01, threshold=5.448e+01, percent-clipped=0.0 2023-12-21 18:19:40,608 INFO [train.py:886] (3/4) Epoch 6, batch 1500, loss[loss=0.01574, audio_tagging_loss=0.01574, over 25000.00 frames. ], tot_loss[loss=0.01632, audio_tagging_loss=0.01632, over 4947408.77 frames. ], batch size: 100, lr: 1.82e-02, grad_scale: 64.0 2023-12-21 18:19:53,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=168933.33333333334, ans=0.125 2023-12-21 18:20:06,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.23 vs. 
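
The balancer fields being scheduled in these lines (prob, min_positive, max_abs, min_abs) constrain per-channel activation statistics: for example, min_positive=0.05 asks each channel to be positive at least 5% of the time, and max_abs=10.0 caps the mean absolute value. A real balancer applies gradient corrections with probability prob; the sketch below only measures which channels would violate the constraints, with illustrative names throughout.

    # Measure the per-channel statistics that balancer constraints refer to.
    import torch

    def balancer_violations(x: torch.Tensor, min_positive=0.05, max_abs=10.0):
        # x: (num_frames, num_channels) activations
        frac_positive = (x > 0).float().mean(dim=0)
        mean_abs = x.abs().mean(dim=0)
        too_negative = (frac_positive < min_positive).sum().item()
        too_large = (mean_abs > max_abs).sum().item()
        return too_negative, too_large

    x = torch.randn(1000, 256)
    print(balancer_violations(x))  # (0, 0) for well-behaved activations
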
limit=15.0 2023-12-21 18:20:23,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=169133.33333333334, ans=0.125 2023-12-21 18:20:24,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=169133.33333333334, ans=0.1 2023-12-21 18:20:33,294 INFO [train.py:886] (3/4) Epoch 6, batch 1550, loss[loss=0.01827, audio_tagging_loss=0.01827, over 25000.00 frames. ], tot_loss[loss=0.01644, audio_tagging_loss=0.01644, over 4949216.68 frames. ], batch size: 100, lr: 1.81e-02, grad_scale: 64.0 2023-12-21 18:20:55,426 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.98 vs. limit=22.5 2023-12-21 18:20:57,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=169333.33333333334, ans=0.125 2023-12-21 18:21:04,693 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.623e+01 2.761e+01 2.988e+01 3.455e+01, threshold=5.522e+01, percent-clipped=0.0 2023-12-21 18:21:09,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=169400.0, ans=0.125 2023-12-21 18:21:23,966 INFO [train.py:886] (3/4) Epoch 6, batch 1600, loss[loss=0.01854, audio_tagging_loss=0.01854, over 24750.00 frames. ], tot_loss[loss=0.01651, audio_tagging_loss=0.01651, over 4947364.47 frames. ], batch size: 99, lr: 1.81e-02, grad_scale: 64.0 2023-12-21 18:21:25,474 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5 2023-12-21 18:21:28,266 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=13.96 vs. limit=12.0 2023-12-21 18:21:30,162 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.97 vs. limit=15.0 2023-12-21 18:21:30,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=169533.33333333334, ans=0.0 2023-12-21 18:21:35,050 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.608e-02 2023-12-21 18:21:39,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=169600.0, ans=0.05 2023-12-21 18:21:44,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=169666.66666666666, ans=0.0 2023-12-21 18:21:53,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.38 vs. limit=15.0 2023-12-21 18:22:13,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=169866.66666666666, ans=0.125 2023-12-21 18:22:14,569 INFO [train.py:886] (3/4) Epoch 6, batch 1650, loss[loss=0.01523, audio_tagging_loss=0.01523, over 24750.00 frames. ], tot_loss[loss=0.01647, audio_tagging_loss=0.01647, over 4946308.59 frames. 
], batch size: 99, lr: 1.81e-02, grad_scale: 64.0 2023-12-21 18:22:19,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.99 vs. limit=15.0 2023-12-21 18:22:38,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=170000.0, ans=0.125 2023-12-21 18:22:46,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=170066.66666666666, ans=0.1 2023-12-21 18:22:47,809 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.251e+01 2.597e+01 2.773e+01 2.993e+01 4.191e+01, threshold=5.546e+01, percent-clipped=0.0 2023-12-21 18:22:56,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=170133.33333333334, ans=0.2 2023-12-21 18:23:06,296 INFO [train.py:886] (3/4) Epoch 6, batch 1700, loss[loss=0.01577, audio_tagging_loss=0.01577, over 25000.00 frames. ], tot_loss[loss=0.01642, audio_tagging_loss=0.01642, over 4947410.34 frames. ], batch size: 100, lr: 1.81e-02, grad_scale: 64.0 2023-12-21 18:23:06,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=170200.0, ans=0.125 2023-12-21 18:23:20,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.22 vs. limit=15.0 2023-12-21 18:23:26,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=170333.33333333334, ans=0.125 2023-12-21 18:23:48,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=170466.66666666666, ans=0.2 2023-12-21 18:23:48,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.28 vs. limit=15.0 2023-12-21 18:23:51,964 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.10 vs. limit=15.0 2023-12-21 18:23:54,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=170466.66666666666, ans=0.125 2023-12-21 18:23:58,269 INFO [train.py:886] (3/4) Epoch 6, batch 1750, loss[loss=0.01607, audio_tagging_loss=0.01607, over 25000.00 frames. ], tot_loss[loss=0.01632, audio_tagging_loss=0.01632, over 4950168.34 frames. 
], batch size: 100, lr: 1.81e-02, grad_scale: 64.0 2023-12-21 18:24:01,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=170533.33333333334, ans=0.125 2023-12-21 18:24:14,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=170600.0, ans=0.125 2023-12-21 18:24:22,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=170666.66666666666, ans=0.1 2023-12-21 18:24:31,727 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.494e+01 2.677e+01 2.882e+01 3.741e+01, threshold=5.353e+01, percent-clipped=0.0 2023-12-21 18:24:41,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=170800.0, ans=0.5 2023-12-21 18:24:51,583 INFO [train.py:886] (3/4) Epoch 6, batch 1800, loss[loss=0.01592, audio_tagging_loss=0.01592, over 25000.00 frames. ], tot_loss[loss=0.01628, audio_tagging_loss=0.01628, over 4955602.81 frames. ], batch size: 100, lr: 1.81e-02, grad_scale: 64.0 2023-12-21 18:25:02,565 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=15.0 2023-12-21 18:25:09,220 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.79 vs. limit=6.0 2023-12-21 18:25:09,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=170933.33333333334, ans=0.0 2023-12-21 18:25:28,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.13 vs. limit=15.0 2023-12-21 18:25:36,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=171133.33333333334, ans=10.0 2023-12-21 18:25:42,683 INFO [train.py:886] (3/4) Epoch 6, batch 1850, loss[loss=0.01768, audio_tagging_loss=0.01768, over 25000.00 frames. ], tot_loss[loss=0.01637, audio_tagging_loss=0.01637, over 4954286.92 frames. ], batch size: 100, lr: 1.80e-02, grad_scale: 64.0 2023-12-21 18:25:45,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.79 vs. limit=15.0 2023-12-21 18:25:49,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=171200.0, ans=0.0 2023-12-21 18:25:59,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=171266.66666666666, ans=0.2 2023-12-21 18:26:15,598 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.223e+01 2.610e+01 2.790e+01 3.016e+01 3.716e+01, threshold=5.580e+01, percent-clipped=0.0 2023-12-21 18:26:24,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=171466.66666666666, ans=0.0 2023-12-21 18:26:24,908 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.27 vs. 
limit=15.0 2023-12-21 18:26:34,783 INFO [train.py:886] (3/4) Epoch 6, batch 1900, loss[loss=0.01641, audio_tagging_loss=0.01641, over 24750.00 frames. ], tot_loss[loss=0.01643, audio_tagging_loss=0.01643, over 4950751.60 frames. ], batch size: 99, lr: 1.80e-02, grad_scale: 64.0 2023-12-21 18:26:43,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=171533.33333333334, ans=0.0 2023-12-21 18:26:44,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=171600.0, ans=0.2 2023-12-21 18:26:45,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=171600.0, ans=0.125 2023-12-21 18:26:59,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=171666.66666666666, ans=0.2 2023-12-21 18:27:11,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=171733.33333333334, ans=0.0 2023-12-21 18:27:27,090 INFO [train.py:886] (3/4) Epoch 6, batch 1950, loss[loss=0.01875, audio_tagging_loss=0.01875, over 23973.00 frames. ], tot_loss[loss=0.01632, audio_tagging_loss=0.01632, over 4948614.66 frames. ], batch size: 100, lr: 1.80e-02, grad_scale: 64.0 2023-12-21 18:27:34,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.08 vs. limit=10.0 2023-12-21 18:27:37,491 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.059e+01 2023-12-21 18:27:51,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=172000.0, ans=0.2 2023-12-21 18:27:52,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=172000.0, ans=0.0 2023-12-21 18:27:54,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=172000.0, ans=0.2 2023-12-21 18:27:54,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=172000.0, ans=0.2 2023-12-21 18:27:59,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=172066.66666666666, ans=0.125 2023-12-21 18:28:00,979 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.565e+01 2.716e+01 2.900e+01 3.603e+01, threshold=5.432e+01, percent-clipped=0.0 2023-12-21 18:28:18,824 INFO [train.py:886] (3/4) Epoch 6, batch 2000, loss[loss=0.01703, audio_tagging_loss=0.01703, over 22328.00 frames. ], tot_loss[loss=0.01626, audio_tagging_loss=0.01626, over 4949448.63 frames. 
], batch size: 107, lr: 1.80e-02, grad_scale: 64.0 2023-12-21 18:28:20,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=172200.0, ans=0.1 2023-12-21 18:28:26,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=172200.0, ans=0.2 2023-12-21 18:28:38,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=172266.66666666666, ans=0.1 2023-12-21 18:28:47,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=172333.33333333334, ans=0.125 2023-12-21 18:28:48,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=172400.0, ans=0.0 2023-12-21 18:29:10,790 INFO [train.py:886] (3/4) Epoch 6, batch 2050, loss[loss=0.01346, audio_tagging_loss=0.01346, over 25000.00 frames. ], tot_loss[loss=0.01611, audio_tagging_loss=0.01611, over 4951942.44 frames. ], batch size: 100, lr: 1.80e-02, grad_scale: 64.0 2023-12-21 18:29:15,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.50 vs. limit=22.5 2023-12-21 18:29:16,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=172533.33333333334, ans=0.125 2023-12-21 18:29:30,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.63 vs. limit=15.0 2023-12-21 18:29:31,188 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.40 vs. limit=10.0 2023-12-21 18:29:43,367 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+01 2.562e+01 2.748e+01 2.968e+01 3.569e+01, threshold=5.496e+01, percent-clipped=0.0 2023-12-21 18:30:01,226 INFO [train.py:886] (3/4) Epoch 6, batch 2100, loss[loss=0.01827, audio_tagging_loss=0.01827, over 24750.00 frames. ], tot_loss[loss=0.01615, audio_tagging_loss=0.01615, over 4957638.17 frames. ], batch size: 99, lr: 1.80e-02, grad_scale: 64.0 2023-12-21 18:30:29,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=173000.0, ans=0.0 2023-12-21 18:30:43,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=173133.33333333334, ans=0.0 2023-12-21 18:30:49,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=173133.33333333334, ans=0.1 2023-12-21 18:30:49,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.91 vs. limit=15.0 2023-12-21 18:30:53,358 INFO [train.py:886] (3/4) Epoch 6, batch 2150, loss[loss=0.01587, audio_tagging_loss=0.01587, over 24750.00 frames. ], tot_loss[loss=0.01618, audio_tagging_loss=0.01618, over 4962688.22 frames. 
], batch size: 99, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:30:54,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=173200.0, ans=0.0 2023-12-21 18:31:07,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=173266.66666666666, ans=0.0 2023-12-21 18:31:08,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=173266.66666666666, ans=0.0 2023-12-21 18:31:08,931 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.13 vs. limit=22.5 2023-12-21 18:31:14,334 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.95 vs. limit=12.0 2023-12-21 18:31:26,792 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.590e+01 2.794e+01 3.040e+01 3.581e+01, threshold=5.589e+01, percent-clipped=0.0 2023-12-21 18:31:28,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=173400.0, ans=0.2 2023-12-21 18:31:30,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=173400.0, ans=0.07 2023-12-21 18:31:31,082 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.90 vs. limit=15.0 2023-12-21 18:31:37,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=173466.66666666666, ans=0.05 2023-12-21 18:31:46,050 INFO [train.py:886] (3/4) Epoch 6, batch 2200, loss[loss=0.02013, audio_tagging_loss=0.02013, over 24750.00 frames. ], tot_loss[loss=0.01627, audio_tagging_loss=0.01627, over 4960067.16 frames. 
], batch size: 99, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:31:46,313 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.547e-02 2023-12-21 18:31:48,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=173533.33333333334, ans=0.125 2023-12-21 18:31:49,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=173533.33333333334, ans=0.125 2023-12-21 18:31:57,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=173600.0, ans=0.125 2023-12-21 18:32:06,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=173666.66666666666, ans=0.125 2023-12-21 18:32:08,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=173666.66666666666, ans=0.125 2023-12-21 18:32:13,121 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.024e-01 2023-12-21 18:32:13,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=173666.66666666666, ans=0.2 2023-12-21 18:32:13,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=173666.66666666666, ans=0.05 2023-12-21 18:32:37,601 INFO [train.py:886] (3/4) Epoch 6, batch 2250, loss[loss=0.0166, audio_tagging_loss=0.0166, over 24750.00 frames. ], tot_loss[loss=0.01628, audio_tagging_loss=0.01628, over 4950610.07 frames. ], batch size: 99, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:32:43,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=173866.66666666666, ans=0.1 2023-12-21 18:32:50,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=173933.33333333334, ans=0.05 2023-12-21 18:33:02,846 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.38 vs. limit=15.0 2023-12-21 18:33:08,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=174066.66666666666, ans=0.2 2023-12-21 18:33:10,620 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.133e+01 2.579e+01 2.731e+01 2.928e+01 3.593e+01, threshold=5.463e+01, percent-clipped=0.0 2023-12-21 18:33:17,562 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.47 vs. limit=22.5 2023-12-21 18:33:30,100 INFO [train.py:886] (3/4) Epoch 6, batch 2300, loss[loss=0.01397, audio_tagging_loss=0.01397, over 22231.00 frames. ], tot_loss[loss=0.01613, audio_tagging_loss=0.01613, over 4950287.03 frames. 
], batch size: 107, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:33:51,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=174333.33333333334, ans=0.125 2023-12-21 18:34:13,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=174466.66666666666, ans=0.125 2023-12-21 18:34:21,986 INFO [train.py:886] (3/4) Epoch 6, batch 2350, loss[loss=0.01592, audio_tagging_loss=0.01592, over 24750.00 frames. ], tot_loss[loss=0.01613, audio_tagging_loss=0.01613, over 4945782.02 frames. ], batch size: 99, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:34:35,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=174600.0, ans=0.0 2023-12-21 18:34:37,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.60 vs. limit=22.5 2023-12-21 18:34:37,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=174600.0, ans=0.0 2023-12-21 18:34:38,776 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.507e+00 2023-12-21 18:34:45,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=174666.66666666666, ans=0.1 2023-12-21 18:34:45,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.11 vs. limit=15.0 2023-12-21 18:34:46,803 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.11 vs. limit=22.5 2023-12-21 18:34:48,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=174666.66666666666, ans=0.0 2023-12-21 18:34:55,210 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.528e+01 2.689e+01 2.848e+01 3.552e+01, threshold=5.378e+01, percent-clipped=0.0 2023-12-21 18:34:56,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=174733.33333333334, ans=0.125 2023-12-21 18:35:13,768 INFO [train.py:886] (3/4) Epoch 6, batch 2400, loss[loss=0.0155, audio_tagging_loss=0.0155, over 24750.00 frames. ], tot_loss[loss=0.01611, audio_tagging_loss=0.01611, over 4953014.98 frames. ], batch size: 99, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:35:20,582 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=17.06 vs. 
limit=15.0 2023-12-21 18:35:23,094 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.447e-01 2023-12-21 18:35:44,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=175066.66666666666, ans=0.125 2023-12-21 18:35:51,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=175066.66666666666, ans=0.125 2023-12-21 18:35:53,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=175066.66666666666, ans=15.0 2023-12-21 18:36:05,897 INFO [train.py:886] (3/4) Epoch 6, batch 2450, loss[loss=0.01776, audio_tagging_loss=0.01776, over 25000.00 frames. ], tot_loss[loss=0.01616, audio_tagging_loss=0.01616, over 4958269.57 frames. ], batch size: 100, lr: 1.79e-02, grad_scale: 64.0 2023-12-21 18:36:13,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=175200.0, ans=0.0 2023-12-21 18:36:18,378 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.06 vs. limit=22.5 2023-12-21 18:36:24,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=175266.66666666666, ans=0.0 2023-12-21 18:36:27,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=175333.33333333334, ans=0.125 2023-12-21 18:36:36,780 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.47 vs. limit=15.0 2023-12-21 18:36:38,853 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.641e+01 2.797e+01 2.976e+01 3.945e+01, threshold=5.593e+01, percent-clipped=0.0 2023-12-21 18:36:41,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=175400.0, ans=0.2 2023-12-21 18:36:57,342 INFO [train.py:886] (3/4) Epoch 6, batch 2500, loss[loss=0.01848, audio_tagging_loss=0.01848, over 24750.00 frames. ], tot_loss[loss=0.01632, audio_tagging_loss=0.01632, over 4950534.98 frames. 
], batch size: 99, lr: 1.78e-02, grad_scale: 64.0 2023-12-21 18:37:00,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=175533.33333333334, ans=0.2 2023-12-21 18:37:05,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=175533.33333333334, ans=0.125 2023-12-21 18:37:05,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=175533.33333333334, ans=0.125 2023-12-21 18:37:22,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=175666.66666666666, ans=0.125 2023-12-21 18:37:22,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=175666.66666666666, ans=0.0 2023-12-21 18:37:27,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=175666.66666666666, ans=0.0 2023-12-21 18:37:29,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.02 vs. limit=15.0 2023-12-21 18:37:49,686 INFO [train.py:886] (3/4) Epoch 6, batch 2550, loss[loss=0.01608, audio_tagging_loss=0.01608, over 24750.00 frames. ], tot_loss[loss=0.01635, audio_tagging_loss=0.01635, over 4946604.20 frames. ], batch size: 99, lr: 1.78e-02, grad_scale: 64.0 2023-12-21 18:37:57,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=175866.66666666666, ans=0.0 2023-12-21 18:38:00,085 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=14.37 vs. limit=12.0 2023-12-21 18:38:02,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=175933.33333333334, ans=0.0 2023-12-21 18:38:13,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=176000.0, ans=0.125 2023-12-21 18:38:20,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=176066.66666666666, ans=0.1 2023-12-21 18:38:22,984 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 2.592e+01 2.752e+01 3.040e+01 4.422e+01, threshold=5.504e+01, percent-clipped=0.0 2023-12-21 18:38:34,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=176133.33333333334, ans=0.125 2023-12-21 18:38:42,287 INFO [train.py:886] (3/4) Epoch 6, batch 2600, loss[loss=0.01505, audio_tagging_loss=0.01505, over 25000.00 frames. ], tot_loss[loss=0.01631, audio_tagging_loss=0.01631, over 4947484.92 frames. ], batch size: 100, lr: 1.78e-02, grad_scale: 64.0 2023-12-21 18:38:50,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0 2023-12-21 18:38:55,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.59 vs. 
limit=10.0 2023-12-21 18:39:07,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=176333.33333333334, ans=0.125 2023-12-21 18:39:09,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=176333.33333333334, ans=0.1 2023-12-21 18:39:30,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=176466.66666666666, ans=0.0 2023-12-21 18:39:33,956 INFO [train.py:886] (3/4) Epoch 6, batch 2650, loss[loss=0.0151, audio_tagging_loss=0.0151, over 23990.00 frames. ], tot_loss[loss=0.01623, audio_tagging_loss=0.01623, over 4952736.32 frames. ], batch size: 100, lr: 1.78e-02, grad_scale: 64.0 2023-12-21 18:39:35,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=176533.33333333334, ans=0.125 2023-12-21 18:39:44,254 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.07 vs. limit=12.0 2023-12-21 18:39:57,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=176666.66666666666, ans=0.2 2023-12-21 18:39:59,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=176666.66666666666, ans=0.125 2023-12-21 18:40:03,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=176733.33333333334, ans=0.2 2023-12-21 18:40:07,084 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.240e+01 2.558e+01 2.691e+01 2.831e+01 3.904e+01, threshold=5.381e+01, percent-clipped=0.0 2023-12-21 18:40:08,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=15.0 2023-12-21 18:40:13,214 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=15.0 2023-12-21 18:40:23,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=176800.0, ans=0.1 2023-12-21 18:40:26,264 INFO [train.py:886] (3/4) Epoch 6, batch 2700, loss[loss=0.01705, audio_tagging_loss=0.01705, over 24750.00 frames. ], tot_loss[loss=0.0162, audio_tagging_loss=0.0162, over 4956994.63 frames. ], batch size: 99, lr: 1.78e-02, grad_scale: 64.0 2023-12-21 18:40:36,176 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.48 vs. limit=10.0 2023-12-21 18:40:44,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=176933.33333333334, ans=0.05 2023-12-21 18:40:58,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=177066.66666666666, ans=0.025 2023-12-21 18:41:01,180 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.36 vs. 
limit=22.5 2023-12-21 18:41:14,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=177133.33333333334, ans=0.5 2023-12-21 18:41:16,673 INFO [train.py:886] (3/4) Epoch 6, batch 2750, loss[loss=0.0171, audio_tagging_loss=0.0171, over 25000.00 frames. ], tot_loss[loss=0.0162, audio_tagging_loss=0.0162, over 4957839.23 frames. ], batch size: 100, lr: 1.78e-02, grad_scale: 64.0 2023-12-21 18:41:18,175 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.14 vs. limit=15.0 2023-12-21 18:41:43,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=177333.33333333334, ans=0.125 2023-12-21 18:41:48,895 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.57 vs. limit=22.5 2023-12-21 18:41:49,371 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.263e+01 2.557e+01 2.736e+01 2.928e+01 3.710e+01, threshold=5.471e+01, percent-clipped=0.0 2023-12-21 18:42:04,409 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.53 vs. limit=6.0 2023-12-21 18:42:05,436 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.43 vs. limit=22.5 2023-12-21 18:42:07,723 INFO [train.py:886] (3/4) Epoch 6, batch 2800, loss[loss=0.01673, audio_tagging_loss=0.01673, over 25000.00 frames. ], tot_loss[loss=0.01624, audio_tagging_loss=0.01624, over 4953331.78 frames. ], batch size: 100, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:42:08,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=177533.33333333334, ans=0.1 2023-12-21 18:42:18,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=177600.0, ans=0.125 2023-12-21 18:42:42,176 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0 2023-12-21 18:42:52,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=177800.0, ans=0.0 2023-12-21 18:42:56,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=177800.0, ans=0.2 2023-12-21 18:42:59,859 INFO [train.py:886] (3/4) Epoch 6, batch 2850, loss[loss=0.01773, audio_tagging_loss=0.01773, over 24750.00 frames. ], tot_loss[loss=0.01633, audio_tagging_loss=0.01633, over 4945637.25 frames. ], batch size: 99, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:43:01,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=177866.66666666666, ans=0.125 2023-12-21 18:43:06,670 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.45 vs. limit=22.5 2023-12-21 18:43:11,386 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.86 vs. 
limit=15.0 2023-12-21 18:43:33,499 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.215e+01 2.570e+01 2.729e+01 2.961e+01 3.657e+01, threshold=5.459e+01, percent-clipped=0.0 2023-12-21 18:43:40,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=178133.33333333334, ans=0.5 2023-12-21 18:43:41,711 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0 2023-12-21 18:43:51,165 INFO [train.py:886] (3/4) Epoch 6, batch 2900, loss[loss=0.01938, audio_tagging_loss=0.01938, over 24750.00 frames. ], tot_loss[loss=0.01627, audio_tagging_loss=0.01627, over 4937207.64 frames. ], batch size: 99, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:43:52,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=178200.0, ans=0.0 2023-12-21 18:44:04,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=178266.66666666666, ans=0.1 2023-12-21 18:44:10,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=178266.66666666666, ans=0.0 2023-12-21 18:44:21,194 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.63 vs. limit=15.0 2023-12-21 18:44:24,006 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.18 vs. limit=8.0 2023-12-21 18:44:24,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=178400.0, ans=0.0 2023-12-21 18:44:29,297 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.38 vs. limit=12.0 2023-12-21 18:44:34,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=178466.66666666666, ans=0.2 2023-12-21 18:44:41,033 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0 2023-12-21 18:44:41,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=178466.66666666666, ans=0.0 2023-12-21 18:44:41,964 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.36 vs. limit=6.0 2023-12-21 18:44:43,479 INFO [train.py:886] (3/4) Epoch 6, batch 2950, loss[loss=0.01755, audio_tagging_loss=0.01755, over 25000.00 frames. ], tot_loss[loss=0.01618, audio_tagging_loss=0.01618, over 4943135.04 frames. ], batch size: 100, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:44:45,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=178533.33333333334, ans=0.2 2023-12-21 18:44:56,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=178600.0, ans=0.2 2023-12-21 18:45:05,106 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.49 vs. 
limit=15.0 2023-12-21 18:45:05,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=178666.66666666666, ans=0.0 2023-12-21 18:45:16,980 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.209e+01 2.523e+01 2.674e+01 2.981e+01 3.708e+01, threshold=5.347e+01, percent-clipped=0.0 2023-12-21 18:45:20,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=178733.33333333334, ans=0.125 2023-12-21 18:45:29,711 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=12.0 2023-12-21 18:45:33,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=178800.0, ans=0.0 2023-12-21 18:45:34,823 INFO [train.py:886] (3/4) Epoch 6, batch 3000, loss[loss=0.01564, audio_tagging_loss=0.01564, over 25000.00 frames. ], tot_loss[loss=0.01619, audio_tagging_loss=0.01619, over 4951430.78 frames. ], batch size: 100, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:45:34,823 INFO [train.py:909] (3/4) Computing validation loss 2023-12-21 18:45:47,468 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.8351, 2.5301, 2.6487, 2.2360, 2.2665, 1.6761, 1.2980, 2.4168], device='cuda:3') 2023-12-21 18:45:56,018 INFO [train.py:917] (3/4) Epoch 6, validation: loss=0.03776, audio_tagging_loss=0.03776, over 3737520.00 frames. 2023-12-21 18:45:56,019 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-21 18:46:01,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=178866.66666666666, ans=0.125 2023-12-21 18:46:05,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=178866.66666666666, ans=0.07 2023-12-21 18:46:09,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=178933.33333333334, ans=0.125 2023-12-21 18:46:12,965 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.61 vs. limit=15.0 2023-12-21 18:46:14,633 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.166e+00 2023-12-21 18:46:25,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=179066.66666666666, ans=0.1 2023-12-21 18:46:26,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=179066.66666666666, ans=0.125 2023-12-21 18:46:43,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=179133.33333333334, ans=0.2 2023-12-21 18:46:45,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=179133.33333333334, ans=0.125 2023-12-21 18:46:48,373 INFO [train.py:886] (3/4) Epoch 6, batch 3050, loss[loss=0.01811, audio_tagging_loss=0.01811, over 25000.00 frames. ], tot_loss[loss=0.01617, audio_tagging_loss=0.01617, over 4959228.77 frames. 
], batch size: 100, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:47:02,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=179266.66666666666, ans=22.5 2023-12-21 18:47:20,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=179400.0, ans=0.0 2023-12-21 18:47:20,966 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.41 vs. limit=10.0 2023-12-21 18:47:21,408 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.290e+01 2.570e+01 2.697e+01 2.943e+01 3.684e+01, threshold=5.394e+01, percent-clipped=0.0 2023-12-21 18:47:31,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=179466.66666666666, ans=0.0 2023-12-21 18:47:32,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=179466.66666666666, ans=0.125 2023-12-21 18:47:34,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.77 vs. limit=22.5 2023-12-21 18:47:38,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=179466.66666666666, ans=0.125 2023-12-21 18:47:40,101 INFO [train.py:886] (3/4) Epoch 6, batch 3100, loss[loss=0.01691, audio_tagging_loss=0.01691, over 24750.00 frames. ], tot_loss[loss=0.01624, audio_tagging_loss=0.01624, over 4956171.15 frames. ], batch size: 99, lr: 1.77e-02, grad_scale: 64.0 2023-12-21 18:47:47,048 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.02 vs. limit=15.0 2023-12-21 18:47:54,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=179600.0, ans=0.125 2023-12-21 18:48:11,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=179733.33333333334, ans=0.0 2023-12-21 18:48:22,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=179800.0, ans=0.025 2023-12-21 18:48:31,645 INFO [train.py:886] (3/4) Epoch 6, batch 3150, loss[loss=0.01641, audio_tagging_loss=0.01641, over 24750.00 frames. ], tot_loss[loss=0.01639, audio_tagging_loss=0.01639, over 4949328.23 frames. ], batch size: 99, lr: 1.76e-02, grad_scale: 64.0 2023-12-21 18:48:47,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=179933.33333333334, ans=0.2 2023-12-21 18:48:51,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=180000.0, ans=0.1 2023-12-21 18:48:56,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=180000.0, ans=0.125 2023-12-21 18:49:02,802 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.93 vs. 
limit=22.5 2023-12-21 18:49:04,095 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.326e+01 2.612e+01 2.785e+01 2.963e+01 3.956e+01, threshold=5.570e+01, percent-clipped=0.0 2023-12-21 18:49:20,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=180133.33333333334, ans=0.2 2023-12-21 18:49:23,227 INFO [train.py:886] (3/4) Epoch 6, batch 3200, loss[loss=0.01528, audio_tagging_loss=0.01528, over 25000.00 frames. ], tot_loss[loss=0.0164, audio_tagging_loss=0.0164, over 4945990.91 frames. ], batch size: 100, lr: 1.76e-02, grad_scale: 64.0 2023-12-21 18:49:29,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=180200.0, ans=0.2 2023-12-21 18:49:34,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=180266.66666666666, ans=0.125 2023-12-21 18:49:52,378 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.30 vs. limit=15.0 2023-12-21 18:49:55,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=180400.0, ans=0.125 2023-12-21 18:49:56,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.61 vs. limit=15.0 2023-12-21 18:49:56,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=180400.0, ans=0.125 2023-12-21 18:50:02,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=180400.0, ans=15.0 2023-12-21 18:50:12,913 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.07 vs. limit=15.0 2023-12-21 18:50:14,309 INFO [train.py:886] (3/4) Epoch 6, batch 3250, loss[loss=0.01464, audio_tagging_loss=0.01464, over 25000.00 frames. ], tot_loss[loss=0.01638, audio_tagging_loss=0.01638, over 4947826.59 frames. ], batch size: 100, lr: 1.76e-02, grad_scale: 64.0 2023-12-21 18:50:21,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=180533.33333333334, ans=0.07 2023-12-21 18:50:47,530 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.514e+01 2.742e+01 2.966e+01 4.089e+01, threshold=5.485e+01, percent-clipped=0.0 2023-12-21 18:51:03,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=180800.0, ans=0.125 2023-12-21 18:51:06,739 INFO [train.py:886] (3/4) Epoch 6, batch 3300, loss[loss=0.01665, audio_tagging_loss=0.01665, over 24750.00 frames. ], tot_loss[loss=0.01623, audio_tagging_loss=0.01623, over 4951306.77 frames. 
], batch size: 99, lr: 1.76e-02, grad_scale: 64.0 2023-12-21 18:51:06,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=180866.66666666666, ans=0.125 2023-12-21 18:51:13,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=180866.66666666666, ans=0.0 2023-12-21 18:51:16,812 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.10 vs. limit=15.0 2023-12-21 18:51:38,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.84 vs. limit=22.5 2023-12-21 18:51:54,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=181133.33333333334, ans=0.125 2023-12-21 18:51:59,249 INFO [train.py:886] (3/4) Epoch 6, batch 3350, loss[loss=0.01581, audio_tagging_loss=0.01581, over 25000.00 frames. ], tot_loss[loss=0.01621, audio_tagging_loss=0.01621, over 4947303.87 frames. ], batch size: 100, lr: 1.76e-02, grad_scale: 64.0 2023-12-21 18:52:08,742 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-21 18:52:09,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=181266.66666666666, ans=0.1 2023-12-21 18:52:14,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=181266.66666666666, ans=0.125 2023-12-21 18:52:22,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=181333.33333333334, ans=0.2 2023-12-21 18:52:31,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=181400.0, ans=0.1 2023-12-21 18:52:32,377 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.289e+01 2.583e+01 2.776e+01 2.913e+01 4.067e+01, threshold=5.551e+01, percent-clipped=0.0 2023-12-21 18:52:49,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=181533.33333333334, ans=0.125 2023-12-21 18:52:50,297 INFO [train.py:886] (3/4) Epoch 6, batch 3400, loss[loss=0.01604, audio_tagging_loss=0.01604, over 25000.00 frames. ], tot_loss[loss=0.01622, audio_tagging_loss=0.01622, over 4955647.59 frames. ], batch size: 100, lr: 1.76e-02, grad_scale: 64.0 2023-12-21 18:53:08,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.75 vs. limit=15.0 2023-12-21 18:53:09,975 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.41 vs. limit=15.0 2023-12-21 18:53:21,423 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.27 vs. 
limit=22.5 2023-12-21 18:53:25,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=181733.33333333334, ans=0.125 2023-12-21 18:53:42,552 INFO [train.py:886] (3/4) Epoch 6, batch 3450, loss[loss=0.01501, audio_tagging_loss=0.01501, over 24750.00 frames. ], tot_loss[loss=0.01628, audio_tagging_loss=0.01628, over 4953379.61 frames. ], batch size: 99, lr: 1.75e-02, grad_scale: 64.0 2023-12-21 18:53:53,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=181933.33333333334, ans=0.1 2023-12-21 18:54:06,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=182000.0, ans=0.1 2023-12-21 18:54:13,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=182066.66666666666, ans=0.0 2023-12-21 18:54:15,471 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.142e+01 2.567e+01 2.759e+01 2.912e+01 3.537e+01, threshold=5.518e+01, percent-clipped=0.0 2023-12-21 18:54:28,546 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.575e-03 2023-12-21 18:54:28,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=182133.33333333334, ans=0.125 2023-12-21 18:54:30,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=182133.33333333334, ans=0.125 2023-12-21 18:54:30,621 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.00 vs. limit=15.0 2023-12-21 18:54:33,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=182133.33333333334, ans=0.125 2023-12-21 18:54:34,844 INFO [train.py:886] (3/4) Epoch 6, batch 3500, loss[loss=0.01483, audio_tagging_loss=0.01483, over 24750.00 frames. ], tot_loss[loss=0.0163, audio_tagging_loss=0.0163, over 4947866.66 frames. ], batch size: 99, lr: 1.75e-02, grad_scale: 64.0 2023-12-21 18:54:36,376 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. limit=6.0 2023-12-21 18:54:42,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=182200.0, ans=0.0 2023-12-21 18:54:44,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.89 vs. limit=15.0 2023-12-21 18:54:55,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=182333.33333333334, ans=0.05 2023-12-21 18:55:02,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=182333.33333333334, ans=0.125 2023-12-21 18:55:02,802 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.54 vs. 
limit=15.0 2023-12-21 18:55:03,015 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.49 vs. limit=15.0 2023-12-21 18:55:04,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.95 vs. limit=15.0 2023-12-21 18:55:05,652 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.40 vs. limit=22.5 2023-12-21 18:55:08,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=182400.0, ans=0.0 2023-12-21 18:55:18,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.18 vs. limit=15.0 2023-12-21 18:55:26,233 INFO [train.py:886] (3/4) Epoch 6, batch 3550, loss[loss=0.01295, audio_tagging_loss=0.01295, over 24750.00 frames. ], tot_loss[loss=0.01603, audio_tagging_loss=0.01603, over 4952434.27 frames. ], batch size: 99, lr: 1.75e-02, grad_scale: 64.0 2023-12-21 18:55:28,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=182533.33333333334, ans=0.125 2023-12-21 18:55:29,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=182533.33333333334, ans=0.125 2023-12-21 18:55:39,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=182600.0, ans=0.015 2023-12-21 18:55:59,188 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.280e+01 2.568e+01 2.734e+01 3.047e+01 3.818e+01, threshold=5.468e+01, percent-clipped=0.0 2023-12-21 18:55:59,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=182733.33333333334, ans=0.0 2023-12-21 18:56:13,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=182800.0, ans=0.0 2023-12-21 18:56:18,364 INFO [train.py:886] (3/4) Epoch 6, batch 3600, loss[loss=0.01489, audio_tagging_loss=0.01489, over 24750.00 frames. ], tot_loss[loss=0.01587, audio_tagging_loss=0.01587, over 4956085.81 frames. ], batch size: 99, lr: 1.75e-02, grad_scale: 128.0 2023-12-21 18:56:18,882 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.39 vs. limit=22.5 2023-12-21 18:56:20,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=182866.66666666666, ans=0.0 2023-12-21 18:56:26,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=182866.66666666666, ans=0.0 2023-12-21 18:56:28,324 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.12 vs. 
limit=12.0 2023-12-21 18:56:29,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=182933.33333333334, ans=0.125 2023-12-21 18:56:45,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=183000.0, ans=0.05 2023-12-21 18:56:50,173 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.38 vs. limit=15.0 2023-12-21 18:56:55,491 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.141e-03 2023-12-21 18:56:59,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=183133.33333333334, ans=0.1 2023-12-21 18:57:01,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=183133.33333333334, ans=0.09899494936611666 2023-12-21 18:57:06,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=183133.33333333334, ans=0.2 2023-12-21 18:57:09,933 INFO [train.py:886] (3/4) Epoch 6, batch 3650, loss[loss=0.01239, audio_tagging_loss=0.01239, over 25000.00 frames. ], tot_loss[loss=0.01579, audio_tagging_loss=0.01579, over 4960192.91 frames. ], batch size: 100, lr: 1.75e-02, grad_scale: 128.0 2023-12-21 18:57:13,144 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.07 vs. limit=15.0 2023-12-21 18:57:30,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=183333.33333333334, ans=0.125 2023-12-21 18:57:40,092 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2023-12-21 18:57:43,150 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.299e+01 2.536e+01 2.775e+01 2.969e+01 4.342e+01, threshold=5.551e+01, percent-clipped=0.0 2023-12-21 18:57:45,790 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.92 vs. limit=12.0 2023-12-21 18:58:01,815 INFO [train.py:886] (3/4) Epoch 6, batch 3700, loss[loss=0.02029, audio_tagging_loss=0.02029, over 25000.00 frames. ], tot_loss[loss=0.01589, audio_tagging_loss=0.01589, over 4963383.82 frames. ], batch size: 100, lr: 1.75e-02, grad_scale: 128.0 2023-12-21 18:58:02,361 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.04 vs. limit=12.0 2023-12-21 18:58:17,054 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=15.0 2023-12-21 18:58:29,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.75 vs. 
limit=22.5 2023-12-21 18:58:30,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=183666.66666666666, ans=0.0 2023-12-21 18:58:37,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=183733.33333333334, ans=0.95 2023-12-21 18:58:48,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=183800.0, ans=0.125 2023-12-21 18:58:54,272 INFO [train.py:886] (3/4) Epoch 6, batch 3750, loss[loss=0.02212, audio_tagging_loss=0.02212, over 24946.00 frames. ], tot_loss[loss=0.01603, audio_tagging_loss=0.01603, over 4965765.75 frames. ], batch size: 100, lr: 1.75e-02, grad_scale: 128.0 2023-12-21 18:58:58,895 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=3.787e-01 2023-12-21 18:59:12,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=183933.33333333334, ans=0.0 2023-12-21 18:59:28,151 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 2.576e+01 2.747e+01 2.976e+01 3.504e+01, threshold=5.494e+01, percent-clipped=0.0 2023-12-21 18:59:45,100 INFO [train.py:886] (3/4) Epoch 6, batch 3800, loss[loss=0.01587, audio_tagging_loss=0.01587, over 24750.00 frames. ], tot_loss[loss=0.01615, audio_tagging_loss=0.01615, over 4957409.59 frames. ], batch size: 99, lr: 1.74e-02, grad_scale: 128.0 2023-12-21 18:59:47,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=184200.0, ans=0.125 2023-12-21 19:00:02,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=184266.66666666666, ans=0.0 2023-12-21 19:00:22,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=184400.0, ans=0.125 2023-12-21 19:00:36,976 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.35 vs. limit=10.0 2023-12-21 19:00:37,456 INFO [train.py:886] (3/4) Epoch 6, batch 3850, loss[loss=0.017, audio_tagging_loss=0.017, over 24750.00 frames. ], tot_loss[loss=0.01601, audio_tagging_loss=0.01601, over 4945941.38 frames. ], batch size: 99, lr: 1.74e-02, grad_scale: 128.0 2023-12-21 19:00:39,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=184533.33333333334, ans=0.0 2023-12-21 19:00:42,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=184533.33333333334, ans=0.125 2023-12-21 19:00:46,555 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.10 vs. limit=15.0 2023-12-21 19:00:47,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=184600.0, ans=0.2 2023-12-21 19:00:49,446 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.75 vs. 
limit=15.0 2023-12-21 19:00:53,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=184600.0, ans=0.125 2023-12-21 19:00:56,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.36 vs. limit=15.0 2023-12-21 19:01:11,866 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 2.661e+01 2.816e+01 3.118e+01 3.976e+01, threshold=5.631e+01, percent-clipped=0.0 2023-12-21 19:01:12,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=184733.33333333334, ans=0.125 2023-12-21 19:01:29,358 INFO [train.py:886] (3/4) Epoch 6, batch 3900, loss[loss=0.01458, audio_tagging_loss=0.01458, over 25000.00 frames. ], tot_loss[loss=0.01601, audio_tagging_loss=0.01601, over 4947873.90 frames. ], batch size: 100, lr: 1.74e-02, grad_scale: 64.0 2023-12-21 19:01:40,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=184933.33333333334, ans=0.125 2023-12-21 19:01:41,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=184933.33333333334, ans=0.125 2023-12-21 19:01:44,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=184933.33333333334, ans=0.125 2023-12-21 19:01:56,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=185000.0, ans=0.125 2023-12-21 19:01:59,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=185066.66666666666, ans=0.125 2023-12-21 19:02:17,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=185133.33333333334, ans=0.125 2023-12-21 19:02:20,938 INFO [train.py:886] (3/4) Epoch 6, batch 3950, loss[loss=0.01681, audio_tagging_loss=0.01681, over 25000.00 frames. ], tot_loss[loss=0.01609, audio_tagging_loss=0.01609, over 4952524.97 frames. ], batch size: 100, lr: 1.74e-02, grad_scale: 64.0 2023-12-21 19:02:49,960 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.90 vs. limit=15.0 2023-12-21 19:02:52,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=185400.0, ans=0.1 2023-12-21 19:02:55,025 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.244e+01 2.574e+01 2.731e+01 2.913e+01 3.749e+01, threshold=5.463e+01, percent-clipped=0.0 2023-12-21 19:03:01,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=185466.66666666666, ans=0.125 2023-12-21 19:03:01,812 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.012e+00 2023-12-21 19:03:13,948 INFO [train.py:886] (3/4) Epoch 6, batch 4000, loss[loss=0.01505, audio_tagging_loss=0.01505, over 25000.00 frames. ], tot_loss[loss=0.01618, audio_tagging_loss=0.01618, over 4952241.90 frames. 
], batch size: 100, lr: 1.74e-02, grad_scale: 64.0 2023-12-21 19:03:16,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=185533.33333333334, ans=0.0 2023-12-21 19:03:20,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=185533.33333333334, ans=0.1 2023-12-21 19:03:20,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=185533.33333333334, ans=0.1 2023-12-21 19:04:00,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.16 vs. limit=22.5 2023-12-21 19:04:01,570 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.425e-01 2023-12-21 19:04:04,178 INFO [train.py:886] (3/4) Epoch 6, batch 4050, loss[loss=0.01513, audio_tagging_loss=0.01513, over 24750.00 frames. ], tot_loss[loss=0.01627, audio_tagging_loss=0.01627, over 4955833.45 frames. ], batch size: 99, lr: 1.74e-02, grad_scale: 64.0 2023-12-21 19:04:10,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=185866.66666666666, ans=0.1 2023-12-21 19:04:15,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=185933.33333333334, ans=0.125 2023-12-21 19:04:17,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=185933.33333333334, ans=0.0 2023-12-21 19:04:18,176 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0 2023-12-21 19:04:23,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=185933.33333333334, ans=0.2 2023-12-21 19:04:23,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.38 vs. limit=15.0 2023-12-21 19:04:32,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=186000.0, ans=0.0 2023-12-21 19:04:38,165 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.265e+01 2.667e+01 2.852e+01 3.052e+01 4.692e+01, threshold=5.704e+01, percent-clipped=0.0 2023-12-21 19:04:49,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=186133.33333333334, ans=0.2 2023-12-21 19:04:56,391 INFO [train.py:886] (3/4) Epoch 6, batch 4100, loss[loss=0.01846, audio_tagging_loss=0.01846, over 24750.00 frames. ], tot_loss[loss=0.01637, audio_tagging_loss=0.01637, over 4951366.32 frames. 
], batch size: 99, lr: 1.74e-02, grad_scale: 64.0 2023-12-21 19:05:02,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=186200.0, ans=0.125 2023-12-21 19:05:11,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=186266.66666666666, ans=0.125 2023-12-21 19:05:21,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=186333.33333333334, ans=0.04949747468305833 2023-12-21 19:05:47,587 INFO [train.py:886] (3/4) Epoch 6, batch 4150, loss[loss=0.01312, audio_tagging_loss=0.01312, over 25000.00 frames. ], tot_loss[loss=0.0163, audio_tagging_loss=0.0163, over 4945319.80 frames. ], batch size: 100, lr: 1.73e-02, grad_scale: 64.0 2023-12-21 19:05:49,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=186533.33333333334, ans=0.1 2023-12-21 19:06:01,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=186600.0, ans=0.0 2023-12-21 19:06:01,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=186600.0, ans=0.125 2023-12-21 19:06:23,376 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.572e+01 2.768e+01 2.919e+01 3.427e+01, threshold=5.536e+01, percent-clipped=0.0 2023-12-21 19:06:23,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=186733.33333333334, ans=0.125 2023-12-21 19:06:32,111 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.79 vs. limit=22.5 2023-12-21 19:06:41,001 INFO [train.py:886] (3/4) Epoch 6, batch 4200, loss[loss=0.01682, audio_tagging_loss=0.01682, over 25000.00 frames. ], tot_loss[loss=0.01627, audio_tagging_loss=0.01627, over 4945338.46 frames. ], batch size: 100, lr: 1.73e-02, grad_scale: 64.0 2023-12-21 19:06:43,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=186866.66666666666, ans=0.05 2023-12-21 19:07:06,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=187000.0, ans=0.1 2023-12-21 19:07:24,969 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.520e-03 2023-12-21 19:07:33,771 INFO [train.py:886] (3/4) Epoch 6, batch 4250, loss[loss=0.02021, audio_tagging_loss=0.02021, over 25000.00 frames. ], tot_loss[loss=0.0162, audio_tagging_loss=0.0162, over 4947780.46 frames. ], batch size: 100, lr: 1.73e-02, grad_scale: 64.0 2023-12-21 19:07:51,380 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.13 vs. limit=22.5 2023-12-21 19:08:03,019 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.12 vs. 
limit=15.0 2023-12-21 19:08:08,001 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.128e+01 2.574e+01 2.753e+01 2.984e+01 3.993e+01, threshold=5.507e+01, percent-clipped=0.0 2023-12-21 19:08:08,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=187400.0, ans=0.0 2023-12-21 19:08:09,433 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0 2023-12-21 19:08:14,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=187466.66666666666, ans=0.1 2023-12-21 19:08:16,979 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.88 vs. limit=22.5 2023-12-21 19:08:17,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=187466.66666666666, ans=0.025 2023-12-21 19:08:22,319 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.60 vs. limit=22.5 2023-12-21 19:08:24,689 INFO [train.py:886] (3/4) Epoch 6, batch 4300, loss[loss=0.01737, audio_tagging_loss=0.01737, over 25000.00 frames. ], tot_loss[loss=0.01627, audio_tagging_loss=0.01627, over 4950730.75 frames. ], batch size: 100, lr: 1.73e-02, grad_scale: 64.0 2023-12-21 19:08:26,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=187533.33333333334, ans=0.0 2023-12-21 19:08:34,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=187600.0, ans=0.0 2023-12-21 19:09:15,575 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.32 vs. limit=22.5 2023-12-21 19:09:17,055 INFO [train.py:886] (3/4) Epoch 6, batch 4350, loss[loss=0.01474, audio_tagging_loss=0.01474, over 24750.00 frames. ], tot_loss[loss=0.01628, audio_tagging_loss=0.01628, over 4956991.06 frames. ], batch size: 99, lr: 1.73e-02, grad_scale: 64.0 2023-12-21 19:09:23,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=187866.66666666666, ans=0.125 2023-12-21 19:09:28,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=187933.33333333334, ans=0.125 2023-12-21 19:09:32,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=187933.33333333334, ans=0.1 2023-12-21 19:09:45,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=188000.0, ans=0.1 2023-12-21 19:09:51,260 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.707e+01 2.868e+01 3.047e+01 3.925e+01, threshold=5.736e+01, percent-clipped=0.0 2023-12-21 19:10:08,771 INFO [train.py:886] (3/4) Epoch 6, batch 4400, loss[loss=0.01682, audio_tagging_loss=0.01682, over 24750.00 frames. ], tot_loss[loss=0.01651, audio_tagging_loss=0.01651, over 4955193.71 frames. 
], batch size: 99, lr: 1.73e-02, grad_scale: 64.0 2023-12-21 19:10:15,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=188200.0, ans=0.05 2023-12-21 19:10:31,393 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=15.0 2023-12-21 19:10:50,372 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.44 vs. limit=15.0 2023-12-21 19:11:00,370 INFO [train.py:886] (3/4) Epoch 6, batch 4450, loss[loss=0.01549, audio_tagging_loss=0.01549, over 24750.00 frames. ], tot_loss[loss=0.0165, audio_tagging_loss=0.0165, over 4950242.63 frames. ], batch size: 99, lr: 1.73e-02, grad_scale: 64.0 2023-12-21 19:11:05,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.09 vs. limit=12.0 2023-12-21 19:11:12,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=188600.0, ans=0.2 2023-12-21 19:11:15,202 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.99 vs. limit=6.0 2023-12-21 19:11:35,071 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+01 2.682e+01 2.838e+01 3.055e+01 3.746e+01, threshold=5.675e+01, percent-clipped=0.0 2023-12-21 19:11:35,610 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.26 vs. limit=12.0 2023-12-21 19:11:52,466 INFO [train.py:886] (3/4) Epoch 6, batch 4500, loss[loss=0.01651, audio_tagging_loss=0.01651, over 22155.00 frames. ], tot_loss[loss=0.01628, audio_tagging_loss=0.01628, over 4947875.19 frames. ], batch size: 107, lr: 1.72e-02, grad_scale: 64.0 2023-12-21 19:11:52,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=188866.66666666666, ans=0.0 2023-12-21 19:11:57,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=188866.66666666666, ans=0.05 2023-12-21 19:11:58,604 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.65 vs. limit=15.0 2023-12-21 19:12:10,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=188933.33333333334, ans=0.125 2023-12-21 19:12:13,485 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 19:12:17,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=189000.0, ans=0.0 2023-12-21 19:12:34,079 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=4.603e+00 2023-12-21 19:12:39,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=189133.33333333334, ans=0.125 2023-12-21 19:12:44,039 INFO [train.py:886] (3/4) Epoch 6, batch 4550, loss[loss=0.01672, audio_tagging_loss=0.01672, over 25000.00 frames. 
], tot_loss[loss=0.01616, audio_tagging_loss=0.01616, over 4948062.08 frames. ], batch size: 100, lr: 1.72e-02, grad_scale: 64.0 2023-12-21 19:12:57,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=189266.66666666666, ans=0.125 2023-12-21 19:12:58,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=189266.66666666666, ans=0.1 2023-12-21 19:13:10,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=189333.33333333334, ans=0.0 2023-12-21 19:13:18,807 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+01 2.583e+01 2.791e+01 2.970e+01 3.966e+01, threshold=5.581e+01, percent-clipped=0.0 2023-12-21 19:13:20,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=189400.0, ans=0.125 2023-12-21 19:13:36,222 INFO [train.py:886] (3/4) Epoch 6, batch 4600, loss[loss=0.01429, audio_tagging_loss=0.01429, over 25000.00 frames. ], tot_loss[loss=0.01613, audio_tagging_loss=0.01613, over 4953362.87 frames. ], batch size: 100, lr: 1.72e-02, grad_scale: 64.0 2023-12-21 19:13:49,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=189600.0, ans=0.2 2023-12-21 19:13:52,223 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2023-12-21 19:13:52,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=189600.0, ans=0.1 2023-12-21 19:13:54,021 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.20 vs. limit=15.0 2023-12-21 19:13:55,066 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.07 vs. limit=15.0 2023-12-21 19:14:05,296 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.92 vs. limit=10.0 2023-12-21 19:14:26,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=189866.66666666666, ans=0.125 2023-12-21 19:14:27,544 INFO [train.py:886] (3/4) Epoch 6, batch 4650, loss[loss=0.01943, audio_tagging_loss=0.01943, over 25000.00 frames. ], tot_loss[loss=0.01617, audio_tagging_loss=0.01617, over 4952365.58 frames. ], batch size: 100, lr: 1.72e-02, grad_scale: 64.0 2023-12-21 19:14:44,884 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.92 vs. limit=15.0 2023-12-21 19:14:48,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=190000.0, ans=0.1 2023-12-21 19:15:02,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.75 vs. 
limit=15.0 2023-12-21 19:15:02,397 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.265e+01 2.615e+01 2.807e+01 2.981e+01 3.491e+01, threshold=5.613e+01, percent-clipped=0.0 2023-12-21 19:15:08,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=190133.33333333334, ans=0.1 2023-12-21 19:15:18,001 INFO [train.py:886] (3/4) Epoch 6, batch 4700, loss[loss=0.01657, audio_tagging_loss=0.01657, over 24750.00 frames. ], tot_loss[loss=0.01624, audio_tagging_loss=0.01624, over 4952571.94 frames. ], batch size: 99, lr: 1.72e-02, grad_scale: 64.0 2023-12-21 19:15:18,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=190200.0, ans=0.0 2023-12-21 19:15:30,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=190266.66666666666, ans=0.125 2023-12-21 19:15:36,104 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.32 vs. limit=22.5 2023-12-21 19:15:52,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=190400.0, ans=0.0 2023-12-21 19:15:53,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=190400.0, ans=0.2 2023-12-21 19:15:56,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=190466.66666666666, ans=0.0 2023-12-21 19:16:00,008 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.51 vs. limit=22.5 2023-12-21 19:16:05,837 INFO [train.py:886] (3/4) Epoch 6, batch 4750, loss[loss=0.01499, audio_tagging_loss=0.01499, over 24750.00 frames. ], tot_loss[loss=0.01638, audio_tagging_loss=0.01638, over 4947366.48 frames. ], batch size: 99, lr: 1.72e-02, grad_scale: 64.0 2023-12-21 19:16:13,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=190533.33333333334, ans=0.125 2023-12-21 19:16:43,666 INFO [train.py:886] (3/4) Epoch 7, batch 0, loss[loss=0.03758, audio_tagging_loss=0.03758, over 24050.00 frames. ], tot_loss[loss=0.03758, audio_tagging_loss=0.03758, over 24050.00 frames. ], batch size: 100, lr: 1.61e-02, grad_scale: 64.0 2023-12-21 19:16:43,666 INFO [train.py:909] (3/4) Computing validation loss 2023-12-21 19:17:05,424 INFO [train.py:917] (3/4) Epoch 7, validation: loss=0.03667, audio_tagging_loss=0.03667, over 3737520.00 frames. 2023-12-21 19:17:05,424 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-21 19:17:17,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=190706.66666666666, ans=0.0 2023-12-21 19:17:23,577 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.08 vs. 
limit=10.0 2023-12-21 19:17:23,800 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+01 2.615e+01 2.821e+01 3.087e+01 1.022e+02, threshold=5.642e+01, percent-clipped=4.0 2023-12-21 19:17:28,166 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.01 vs. limit=15.0 2023-12-21 19:17:33,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=190773.33333333334, ans=0.125 2023-12-21 19:17:41,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.72 vs. limit=15.0 2023-12-21 19:17:46,120 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.437e+00 2023-12-21 19:17:56,687 INFO [train.py:886] (3/4) Epoch 7, batch 50, loss[loss=0.0222, audio_tagging_loss=0.0222, over 25000.00 frames. ], tot_loss[loss=0.02591, audio_tagging_loss=0.02591, over 1124471.14 frames. ], batch size: 100, lr: 1.61e-02, grad_scale: 32.0 2023-12-21 19:18:00,168 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.61 vs. limit=15.0 2023-12-21 19:18:01,864 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.67 vs. limit=22.5 2023-12-21 19:18:03,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=190973.33333333334, ans=0.125 2023-12-21 19:18:13,527 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.33 vs. limit=15.0 2023-12-21 19:18:29,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=191173.33333333334, ans=0.0 2023-12-21 19:18:47,573 INFO [train.py:886] (3/4) Epoch 7, batch 100, loss[loss=0.01708, audio_tagging_loss=0.01708, over 25000.00 frames. ], tot_loss[loss=0.02203, audio_tagging_loss=0.02203, over 1979154.23 frames. ], batch size: 100, lr: 1.61e-02, grad_scale: 32.0 2023-12-21 19:18:56,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=191306.66666666666, ans=0.1 2023-12-21 19:19:05,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=191373.33333333334, ans=0.1 2023-12-21 19:19:05,816 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+01 2.945e+01 3.158e+01 3.404e+01 4.637e+01, threshold=6.317e+01, percent-clipped=0.0 2023-12-21 19:19:18,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=191506.66666666666, ans=0.2 2023-12-21 19:19:21,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=191506.66666666666, ans=0.0 2023-12-21 19:19:22,096 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.84 vs. 
limit=6.0 2023-12-21 19:19:35,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=191573.33333333334, ans=0.125 2023-12-21 19:19:38,898 INFO [train.py:886] (3/4) Epoch 7, batch 150, loss[loss=0.01678, audio_tagging_loss=0.01678, over 25000.00 frames. ], tot_loss[loss=0.01993, audio_tagging_loss=0.01993, over 2645665.81 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0 2023-12-21 19:19:40,312 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.77 vs. limit=15.0 2023-12-21 19:19:58,228 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.625e-02 2023-12-21 19:19:58,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=191773.33333333334, ans=0.07 2023-12-21 19:20:02,143 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.19 vs. limit=10.0 2023-12-21 19:20:27,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=191906.66666666666, ans=0.2 2023-12-21 19:20:29,284 INFO [train.py:886] (3/4) Epoch 7, batch 200, loss[loss=0.01695, audio_tagging_loss=0.01695, over 25000.00 frames. ], tot_loss[loss=0.01872, audio_tagging_loss=0.01872, over 3159420.42 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0 2023-12-21 19:20:34,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=191973.33333333334, ans=0.05 2023-12-21 19:20:44,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=192040.0, ans=0.2 2023-12-21 19:20:48,041 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 19:20:48,765 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+01 2.567e+01 2.755e+01 2.935e+01 3.522e+01, threshold=5.511e+01, percent-clipped=0.0 2023-12-21 19:21:02,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=192173.33333333334, ans=0.1 2023-12-21 19:21:11,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=192240.0, ans=0.1 2023-12-21 19:21:16,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=192240.0, ans=0.0 2023-12-21 19:21:22,176 INFO [train.py:886] (3/4) Epoch 7, batch 250, loss[loss=0.01414, audio_tagging_loss=0.01414, over 22321.00 frames. ], tot_loss[loss=0.01808, audio_tagging_loss=0.01808, over 3560826.18 frames. ], batch size: 107, lr: 1.60e-02, grad_scale: 32.0 2023-12-21 19:21:35,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=192373.33333333334, ans=0.2 2023-12-21 19:21:38,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=192373.33333333334, ans=0.125 2023-12-21 19:21:59,373 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.20 vs. 
limit=15.0 2023-12-21 19:22:00,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=192506.66666666666, ans=0.1 2023-12-21 19:22:05,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=192573.33333333334, ans=0.2 2023-12-21 19:22:13,476 INFO [train.py:886] (3/4) Epoch 7, batch 300, loss[loss=0.01268, audio_tagging_loss=0.01268, over 24750.00 frames. ], tot_loss[loss=0.01762, audio_tagging_loss=0.01762, over 3866235.68 frames. ], batch size: 99, lr: 1.60e-02, grad_scale: 32.0 2023-12-21 19:22:30,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=192706.66666666666, ans=0.0 2023-12-21 19:22:31,652 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 2.537e+01 2.670e+01 2.875e+01 3.479e+01, threshold=5.340e+01, percent-clipped=0.0 2023-12-21 19:22:31,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=192706.66666666666, ans=0.125 2023-12-21 19:22:48,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=192840.0, ans=0.125 2023-12-21 19:23:04,693 INFO [train.py:886] (3/4) Epoch 7, batch 350, loss[loss=0.01591, audio_tagging_loss=0.01591, over 25000.00 frames. ], tot_loss[loss=0.01731, audio_tagging_loss=0.01731, over 4097730.57 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0 2023-12-21 19:23:07,690 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 19:23:08,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=192973.33333333334, ans=0.1 2023-12-21 19:23:08,870 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.23 vs. limit=15.0 2023-12-21 19:23:20,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=193040.0, ans=0.2 2023-12-21 19:23:56,033 INFO [train.py:886] (3/4) Epoch 7, batch 400, loss[loss=0.0172, audio_tagging_loss=0.0172, over 25000.00 frames. ], tot_loss[loss=0.01695, audio_tagging_loss=0.01695, over 4284953.24 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0 2023-12-21 19:24:09,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193373.33333333334, ans=0.1 2023-12-21 19:24:15,296 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.543e+01 2.742e+01 2.935e+01 3.819e+01, threshold=5.484e+01, percent-clipped=0.0 2023-12-21 19:24:24,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=193440.0, ans=0.125 2023-12-21 19:24:28,641 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.32 vs. 
limit=22.5 2023-12-21 19:24:33,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=193506.66666666666, ans=0.0 2023-12-21 19:24:34,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=193506.66666666666, ans=0.0 2023-12-21 19:24:42,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=193573.33333333334, ans=0.025 2023-12-21 19:24:48,529 INFO [train.py:886] (3/4) Epoch 7, batch 450, loss[loss=0.01254, audio_tagging_loss=0.01254, over 25000.00 frames. ], tot_loss[loss=0.01661, audio_tagging_loss=0.01661, over 4428220.96 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0 2023-12-21 19:24:48,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=193640.0, ans=0.2 2023-12-21 19:24:50,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=193640.0, ans=0.0 2023-12-21 19:24:54,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=193640.0, ans=0.2 2023-12-21 19:24:57,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0 2023-12-21 19:24:59,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=193706.66666666666, ans=0.125 2023-12-21 19:25:12,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=193773.33333333334, ans=0.0 2023-12-21 19:25:18,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=193840.0, ans=0.0 2023-12-21 19:25:25,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=193840.0, ans=0.125 2023-12-21 19:25:26,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=193840.0, ans=0.1 2023-12-21 19:25:40,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=193973.33333333334, ans=0.125 2023-12-21 19:25:40,739 INFO [train.py:886] (3/4) Epoch 7, batch 500, loss[loss=0.01749, audio_tagging_loss=0.01749, over 25000.00 frames. ], tot_loss[loss=0.01634, audio_tagging_loss=0.01634, over 4544704.56 frames. 
], batch size: 100, lr: 1.60e-02, grad_scale: 32.0 2023-12-21 19:25:41,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=193973.33333333334, ans=0.04949747468305833 2023-12-21 19:25:57,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=194040.0, ans=0.0 2023-12-21 19:25:58,610 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.495e+01 2.691e+01 2.855e+01 3.742e+01, threshold=5.381e+01, percent-clipped=0.0 2023-12-21 19:26:02,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=194106.66666666666, ans=0.1 2023-12-21 19:26:13,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=194173.33333333334, ans=0.1 2023-12-21 19:26:13,408 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.84 vs. limit=22.5 2023-12-21 19:26:27,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=194240.0, ans=0.1 2023-12-21 19:26:30,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.64 vs. limit=10.0 2023-12-21 19:26:31,528 INFO [train.py:886] (3/4) Epoch 7, batch 550, loss[loss=0.01515, audio_tagging_loss=0.01515, over 25000.00 frames. ], tot_loss[loss=0.0163, audio_tagging_loss=0.0163, over 4638753.83 frames. ], batch size: 100, lr: 1.59e-02, grad_scale: 32.0 2023-12-21 19:26:37,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=194306.66666666666, ans=0.125 2023-12-21 19:26:43,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=194373.33333333334, ans=0.0 2023-12-21 19:26:55,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=194440.0, ans=0.125 2023-12-21 19:27:01,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=194506.66666666666, ans=0.0 2023-12-21 19:27:11,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=194506.66666666666, ans=0.1 2023-12-21 19:27:23,606 INFO [train.py:886] (3/4) Epoch 7, batch 600, loss[loss=0.01533, audio_tagging_loss=0.01533, over 23974.00 frames. ], tot_loss[loss=0.01628, audio_tagging_loss=0.01628, over 4711266.95 frames. 
], batch size: 100, lr: 1.59e-02, grad_scale: 32.0 2023-12-21 19:27:23,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=194640.0, ans=0.0 2023-12-21 19:27:42,301 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+01 2.620e+01 2.779e+01 2.985e+01 3.932e+01, threshold=5.559e+01, percent-clipped=0.0 2023-12-21 19:27:44,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=194773.33333333334, ans=0.2 2023-12-21 19:27:46,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=194773.33333333334, ans=0.0 2023-12-21 19:28:14,658 INFO [train.py:886] (3/4) Epoch 7, batch 650, loss[loss=0.01685, audio_tagging_loss=0.01685, over 24750.00 frames. ], tot_loss[loss=0.01637, audio_tagging_loss=0.01637, over 4760656.08 frames. ], batch size: 99, lr: 1.59e-02, grad_scale: 32.0 2023-12-21 19:28:48,359 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=31.65 vs. limit=22.5 2023-12-21 19:28:51,057 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=7.753e-03 2023-12-21 19:29:00,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=195240.0, ans=0.125 2023-12-21 19:29:05,861 INFO [train.py:886] (3/4) Epoch 7, batch 700, loss[loss=0.01524, audio_tagging_loss=0.01524, over 24750.00 frames. ], tot_loss[loss=0.01629, audio_tagging_loss=0.01629, over 4798361.59 frames. ], batch size: 99, lr: 1.59e-02, grad_scale: 32.0 2023-12-21 19:29:10,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=195306.66666666666, ans=0.5 2023-12-21 19:29:24,331 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.205e+01 2.528e+01 2.672e+01 2.888e+01 3.469e+01, threshold=5.344e+01, percent-clipped=0.0 2023-12-21 19:29:36,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=195506.66666666666, ans=0.125 2023-12-21 19:29:39,778 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2023-12-21 19:29:56,832 INFO [train.py:886] (3/4) Epoch 7, batch 750, loss[loss=0.01643, audio_tagging_loss=0.01643, over 22385.00 frames. ], tot_loss[loss=0.01628, audio_tagging_loss=0.01628, over 4826970.71 frames. ], batch size: 107, lr: 1.59e-02, grad_scale: 32.0 2023-12-21 19:30:10,279 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.97 vs. limit=15.0 2023-12-21 19:30:22,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=195773.33333333334, ans=0.125 2023-12-21 19:30:27,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=195840.0, ans=0.0 2023-12-21 19:30:32,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=195840.0, ans=0.2 2023-12-21 19:30:46,829 INFO [train.py:886] (3/4) Epoch 7, batch 800, loss[loss=0.01596, audio_tagging_loss=0.01596, over 24044.00 frames. 
], tot_loss[loss=0.01618, audio_tagging_loss=0.01618, over 4861871.68 frames. ], batch size: 100, lr: 1.59e-02, grad_scale: 32.0 2023-12-21 19:30:58,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=196040.0, ans=0.125 2023-12-21 19:31:02,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=196040.0, ans=0.125 2023-12-21 19:31:05,898 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+01 2.570e+01 2.779e+01 3.006e+01 3.610e+01, threshold=5.558e+01, percent-clipped=0.0 2023-12-21 19:31:07,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=196106.66666666666, ans=0.0 2023-12-21 19:31:12,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=196106.66666666666, ans=0.0 2023-12-21 19:31:19,510 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.63 vs. limit=15.0 2023-12-21 19:31:36,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=196240.0, ans=0.035 2023-12-21 19:31:39,187 INFO [train.py:886] (3/4) Epoch 7, batch 850, loss[loss=0.0131, audio_tagging_loss=0.0131, over 25000.00 frames. ], tot_loss[loss=0.01609, audio_tagging_loss=0.01609, over 4881051.04 frames. ], batch size: 100, lr: 1.59e-02, grad_scale: 32.0 2023-12-21 19:31:40,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=196306.66666666666, ans=0.0 2023-12-21 19:31:49,831 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.58 vs. limit=15.0 2023-12-21 19:31:55,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=196373.33333333334, ans=0.0 2023-12-21 19:31:58,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=196373.33333333334, ans=0.0 2023-12-21 19:31:58,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=196373.33333333334, ans=0.125 2023-12-21 19:32:03,545 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.92 vs. limit=22.5 2023-12-21 19:32:06,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=196440.0, ans=0.0 2023-12-21 19:32:23,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=196573.33333333334, ans=0.1 2023-12-21 19:32:31,636 INFO [train.py:886] (3/4) Epoch 7, batch 900, loss[loss=0.01601, audio_tagging_loss=0.01601, over 24750.00 frames. ], tot_loss[loss=0.0162, audio_tagging_loss=0.0162, over 4897282.41 frames. 
], batch size: 99, lr: 1.59e-02, grad_scale: 32.0 2023-12-21 19:32:32,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=196640.0, ans=0.125 2023-12-21 19:32:46,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=196706.66666666666, ans=0.1 2023-12-21 19:32:50,058 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+01 2.564e+01 2.733e+01 2.886e+01 3.706e+01, threshold=5.467e+01, percent-clipped=0.0 2023-12-21 19:32:55,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=196773.33333333334, ans=0.125 2023-12-21 19:33:01,466 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.04 vs. limit=15.0 2023-12-21 19:33:13,662 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.47 vs. limit=15.0 2023-12-21 19:33:22,390 INFO [train.py:886] (3/4) Epoch 7, batch 950, loss[loss=0.01566, audio_tagging_loss=0.01566, over 24750.00 frames. ], tot_loss[loss=0.01623, audio_tagging_loss=0.01623, over 4910192.86 frames. ], batch size: 99, lr: 1.58e-02, grad_scale: 32.0 2023-12-21 19:33:43,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=197106.66666666666, ans=0.1 2023-12-21 19:34:14,445 INFO [train.py:886] (3/4) Epoch 7, batch 1000, loss[loss=0.01372, audio_tagging_loss=0.01372, over 25000.00 frames. ], tot_loss[loss=0.01608, audio_tagging_loss=0.01608, over 4905064.41 frames. ], batch size: 100, lr: 1.58e-02, grad_scale: 32.0 2023-12-21 19:34:15,589 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.441e+00 2023-12-21 19:34:24,180 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.54 vs. limit=22.5 2023-12-21 19:34:32,233 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.519e+01 2.705e+01 2.941e+01 3.391e+01, threshold=5.409e+01, percent-clipped=0.0 2023-12-21 19:34:32,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=197373.33333333334, ans=0.125 2023-12-21 19:34:50,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=197506.66666666666, ans=0.125 2023-12-21 19:35:01,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=197573.33333333334, ans=0.1 2023-12-21 19:35:05,194 INFO [train.py:886] (3/4) Epoch 7, batch 1050, loss[loss=0.01302, audio_tagging_loss=0.01302, over 25000.00 frames. ], tot_loss[loss=0.01607, audio_tagging_loss=0.01607, over 4918640.80 frames. ], batch size: 100, lr: 1.58e-02, grad_scale: 32.0 2023-12-21 19:35:24,353 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.90 vs. 
limit=12.0 2023-12-21 19:35:27,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=197773.33333333334, ans=0.125 2023-12-21 19:35:28,238 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.83 vs. limit=15.0 2023-12-21 19:35:38,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=197840.0, ans=0.125 2023-12-21 19:35:57,691 INFO [train.py:886] (3/4) Epoch 7, batch 1100, loss[loss=0.01849, audio_tagging_loss=0.01849, over 24750.00 frames. ], tot_loss[loss=0.01598, audio_tagging_loss=0.01598, over 4924694.17 frames. ], batch size: 99, lr: 1.58e-02, grad_scale: 32.0 2023-12-21 19:36:11,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=198040.0, ans=0.125 2023-12-21 19:36:12,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=198040.0, ans=0.1 2023-12-21 19:36:14,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=198040.0, ans=0.0 2023-12-21 19:36:15,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=198040.0, ans=0.125 2023-12-21 19:36:16,005 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.563e+01 2.707e+01 2.877e+01 3.432e+01, threshold=5.414e+01, percent-clipped=0.0 2023-12-21 19:36:23,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=198106.66666666666, ans=0.125 2023-12-21 19:36:28,445 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.95 vs. limit=15.0 2023-12-21 19:36:34,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=198173.33333333334, ans=0.125 2023-12-21 19:36:39,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=198240.0, ans=0.125 2023-12-21 19:36:44,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=198240.0, ans=0.0 2023-12-21 19:36:46,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=198240.0, ans=0.2 2023-12-21 19:36:49,299 INFO [train.py:886] (3/4) Epoch 7, batch 1150, loss[loss=0.01799, audio_tagging_loss=0.01799, over 25000.00 frames. ], tot_loss[loss=0.01603, audio_tagging_loss=0.01603, over 4935498.71 frames. ], batch size: 100, lr: 1.58e-02, grad_scale: 32.0 2023-12-21 19:36:58,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=198373.33333333334, ans=0.0 2023-12-21 19:36:58,677 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.01 vs. 
limit=22.5 2023-12-21 19:37:11,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=198440.0, ans=0.0 2023-12-21 19:37:20,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=198506.66666666666, ans=0.125 2023-12-21 19:37:36,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=198573.33333333334, ans=0.125 2023-12-21 19:37:39,460 INFO [train.py:886] (3/4) Epoch 7, batch 1200, loss[loss=0.0155, audio_tagging_loss=0.0155, over 25000.00 frames. ], tot_loss[loss=0.01602, audio_tagging_loss=0.01602, over 4944265.47 frames. ], batch size: 100, lr: 1.58e-02, grad_scale: 32.0 2023-12-21 19:37:41,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=198640.0, ans=0.025 2023-12-21 19:37:58,026 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.173e+01 2.536e+01 2.723e+01 2.860e+01 3.472e+01, threshold=5.446e+01, percent-clipped=0.0 2023-12-21 19:38:10,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=198840.0, ans=0.0 2023-12-21 19:38:16,850 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 19:38:27,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=198906.66666666666, ans=0.125 2023-12-21 19:38:31,182 INFO [train.py:886] (3/4) Epoch 7, batch 1250, loss[loss=0.0162, audio_tagging_loss=0.0162, over 24750.00 frames. ], tot_loss[loss=0.01623, audio_tagging_loss=0.01623, over 4947671.03 frames. ], batch size: 99, lr: 1.58e-02, grad_scale: 32.0 2023-12-21 19:38:40,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=198973.33333333334, ans=0.0 2023-12-21 19:39:18,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=199240.0, ans=0.125 2023-12-21 19:39:22,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=199240.0, ans=0.125 2023-12-21 19:39:22,616 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.93 vs. limit=6.0 2023-12-21 19:39:23,913 INFO [train.py:886] (3/4) Epoch 7, batch 1300, loss[loss=0.01711, audio_tagging_loss=0.01711, over 24750.00 frames. ], tot_loss[loss=0.0163, audio_tagging_loss=0.0163, over 4940823.42 frames. ], batch size: 99, lr: 1.58e-02, grad_scale: 32.0 2023-12-21 19:39:42,181 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+01 2.626e+01 2.798e+01 3.036e+01 3.776e+01, threshold=5.596e+01, percent-clipped=0.0 2023-12-21 19:39:48,244 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.40 vs. 
limit=15.0 2023-12-21 19:39:49,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=199440.0, ans=0.5 2023-12-21 19:39:49,319 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.24 vs. limit=15.0 2023-12-21 19:40:09,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=199573.33333333334, ans=0.0 2023-12-21 19:40:09,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=199573.33333333334, ans=0.0 2023-12-21 19:40:15,017 INFO [train.py:886] (3/4) Epoch 7, batch 1350, loss[loss=0.01702, audio_tagging_loss=0.01702, over 24750.00 frames. ], tot_loss[loss=0.01623, audio_tagging_loss=0.01623, over 4945079.84 frames. ], batch size: 99, lr: 1.57e-02, grad_scale: 32.0 2023-12-21 19:40:20,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=199640.0, ans=0.125 2023-12-21 19:40:35,085 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.70 vs. limit=6.0 2023-12-21 19:40:53,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.94 vs. limit=22.5 2023-12-21 19:41:06,913 INFO [train.py:886] (3/4) Epoch 7, batch 1400, loss[loss=0.01529, audio_tagging_loss=0.01529, over 25000.00 frames. ], tot_loss[loss=0.01605, audio_tagging_loss=0.01605, over 4944674.86 frames. ], batch size: 100, lr: 1.57e-02, grad_scale: 32.0 2023-12-21 19:41:11,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=199973.33333333334, ans=0.125 2023-12-21 19:41:12,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=199973.33333333334, ans=0.0 2023-12-21 19:41:21,463 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0 2023-12-21 19:41:25,721 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.540e+01 2.768e+01 3.021e+01 3.899e+01, threshold=5.536e+01, percent-clipped=0.0 2023-12-21 19:41:26,249 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.53 vs. limit=10.0 2023-12-21 19:41:41,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=200173.33333333334, ans=0.1 2023-12-21 19:41:52,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=200240.0, ans=0.125 2023-12-21 19:41:58,863 INFO [train.py:886] (3/4) Epoch 7, batch 1450, loss[loss=0.0157, audio_tagging_loss=0.0157, over 24750.00 frames. ], tot_loss[loss=0.01611, audio_tagging_loss=0.01611, over 4947399.51 frames. ], batch size: 99, lr: 1.57e-02, grad_scale: 32.0 2023-12-21 19:42:07,420 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.97 vs. 
limit=15.0 2023-12-21 19:42:28,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=200506.66666666666, ans=0.04949747468305833 2023-12-21 19:42:44,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=200573.33333333334, ans=0.0 2023-12-21 19:42:44,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=200573.33333333334, ans=0.125 2023-12-21 19:42:48,878 INFO [train.py:886] (3/4) Epoch 7, batch 1500, loss[loss=0.01567, audio_tagging_loss=0.01567, over 25000.00 frames. ], tot_loss[loss=0.01623, audio_tagging_loss=0.01623, over 4951632.10 frames. ], batch size: 100, lr: 1.57e-02, grad_scale: 32.0 2023-12-21 19:43:06,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=200706.66666666666, ans=0.0 2023-12-21 19:43:07,817 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.223e+01 2.525e+01 2.780e+01 2.987e+01 4.498e+01, threshold=5.560e+01, percent-clipped=0.0 2023-12-21 19:43:13,248 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.71 vs. limit=15.0 2023-12-21 19:43:14,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=200773.33333333334, ans=0.1 2023-12-21 19:43:29,642 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.83 vs. limit=12.0 2023-12-21 19:43:40,012 INFO [train.py:886] (3/4) Epoch 7, batch 1550, loss[loss=0.016, audio_tagging_loss=0.016, over 24750.00 frames. ], tot_loss[loss=0.01627, audio_tagging_loss=0.01627, over 4948915.53 frames. ], batch size: 99, lr: 1.57e-02, grad_scale: 32.0 2023-12-21 19:43:59,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=201106.66666666666, ans=0.035 2023-12-21 19:44:12,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=201173.33333333334, ans=0.04949747468305833 2023-12-21 19:44:20,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=201240.0, ans=0.125 2023-12-21 19:44:29,919 INFO [train.py:886] (3/4) Epoch 7, batch 1600, loss[loss=0.014, audio_tagging_loss=0.014, over 24750.00 frames. ], tot_loss[loss=0.01626, audio_tagging_loss=0.01626, over 4947633.71 frames. 
], batch size: 99, lr: 1.57e-02, grad_scale: 32.0 2023-12-21 19:44:45,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=201373.33333333334, ans=0.1 2023-12-21 19:44:46,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=201373.33333333334, ans=0.125 2023-12-21 19:44:49,143 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 2.627e+01 2.765e+01 2.991e+01 3.550e+01, threshold=5.529e+01, percent-clipped=0.0 2023-12-21 19:44:50,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=201440.0, ans=0.0 2023-12-21 19:45:04,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=201506.66666666666, ans=0.125 2023-12-21 19:45:04,765 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.35 vs. limit=22.5 2023-12-21 19:45:10,433 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 19:45:15,372 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.96 vs. limit=22.5 2023-12-21 19:45:20,193 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.79 vs. limit=6.0 2023-12-21 19:45:21,527 INFO [train.py:886] (3/4) Epoch 7, batch 1650, loss[loss=0.01691, audio_tagging_loss=0.01691, over 24750.00 frames. ], tot_loss[loss=0.01622, audio_tagging_loss=0.01622, over 4949673.36 frames. ], batch size: 99, lr: 1.57e-02, grad_scale: 32.0 2023-12-21 19:45:29,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=201640.0, ans=0.125 2023-12-21 19:45:35,955 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0 2023-12-21 19:45:49,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=201773.33333333334, ans=0.125 2023-12-21 19:45:50,764 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0 2023-12-21 19:45:58,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=201840.0, ans=0.1 2023-12-21 19:46:01,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=201906.66666666666, ans=0.125 2023-12-21 19:46:04,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=201906.66666666666, ans=0.2 2023-12-21 19:46:12,791 INFO [train.py:886] (3/4) Epoch 7, batch 1700, loss[loss=0.01441, audio_tagging_loss=0.01441, over 25000.00 frames. ], tot_loss[loss=0.01618, audio_tagging_loss=0.01618, over 4955766.97 frames. 
], batch size: 100, lr: 1.57e-02, grad_scale: 32.0 2023-12-21 19:46:30,530 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.180e+01 2.547e+01 2.700e+01 2.895e+01 3.451e+01, threshold=5.401e+01, percent-clipped=0.0 2023-12-21 19:46:31,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=202106.66666666666, ans=0.0 2023-12-21 19:46:39,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=202106.66666666666, ans=0.07 2023-12-21 19:46:54,389 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.45 vs. limit=10.0 2023-12-21 19:47:03,149 INFO [train.py:886] (3/4) Epoch 7, batch 1750, loss[loss=0.01515, audio_tagging_loss=0.01515, over 25000.00 frames. ], tot_loss[loss=0.01612, audio_tagging_loss=0.01612, over 4958739.40 frames. ], batch size: 100, lr: 1.56e-02, grad_scale: 32.0 2023-12-21 19:47:04,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=202306.66666666666, ans=0.125 2023-12-21 19:47:32,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=202506.66666666666, ans=0.0 2023-12-21 19:47:39,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=202506.66666666666, ans=0.0 2023-12-21 19:47:53,951 INFO [train.py:886] (3/4) Epoch 7, batch 1800, loss[loss=0.01515, audio_tagging_loss=0.01515, over 25000.00 frames. ], tot_loss[loss=0.01609, audio_tagging_loss=0.01609, over 4957642.75 frames. ], batch size: 100, lr: 1.56e-02, grad_scale: 32.0 2023-12-21 19:47:54,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=202640.0, ans=0.125 2023-12-21 19:48:03,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=202706.66666666666, ans=0.125 2023-12-21 19:48:08,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=202706.66666666666, ans=0.125 2023-12-21 19:48:11,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=202706.66666666666, ans=0.125 2023-12-21 19:48:12,069 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.560e+01 2.744e+01 2.950e+01 3.454e+01, threshold=5.487e+01, percent-clipped=0.0 2023-12-21 19:48:25,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=202840.0, ans=0.0 2023-12-21 19:48:27,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=202840.0, ans=0.0 2023-12-21 19:48:45,172 INFO [train.py:886] (3/4) Epoch 7, batch 1850, loss[loss=0.01896, audio_tagging_loss=0.01896, over 24750.00 frames. ], tot_loss[loss=0.01625, audio_tagging_loss=0.01625, over 4957487.13 frames. 
], batch size: 99, lr: 1.56e-02, grad_scale: 32.0 2023-12-21 19:48:46,340 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.092e-01 2023-12-21 19:48:47,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=202973.33333333334, ans=10.0 2023-12-21 19:48:48,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=202973.33333333334, ans=0.125 2023-12-21 19:49:02,179 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.82 vs. limit=15.0 2023-12-21 19:49:16,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=203173.33333333334, ans=0.0 2023-12-21 19:49:17,935 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.80 vs. limit=15.0 2023-12-21 19:49:18,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=203173.33333333334, ans=0.1 2023-12-21 19:49:20,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=203173.33333333334, ans=0.125 2023-12-21 19:49:37,309 INFO [train.py:886] (3/4) Epoch 7, batch 1900, loss[loss=0.01444, audio_tagging_loss=0.01444, over 22008.00 frames. ], tot_loss[loss=0.01625, audio_tagging_loss=0.01625, over 4943634.09 frames. ], batch size: 107, lr: 1.56e-02, grad_scale: 32.0 2023-12-21 19:49:56,099 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.649e+01 2.834e+01 2.981e+01 3.501e+01, threshold=5.668e+01, percent-clipped=0.0 2023-12-21 19:50:06,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=203440.0, ans=0.1 2023-12-21 19:50:11,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=203506.66666666666, ans=0.05 2023-12-21 19:50:19,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=203573.33333333334, ans=0.2 2023-12-21 19:50:27,733 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.76 vs. limit=22.5 2023-12-21 19:50:29,004 INFO [train.py:886] (3/4) Epoch 7, batch 1950, loss[loss=0.01536, audio_tagging_loss=0.01536, over 25000.00 frames. ], tot_loss[loss=0.01627, audio_tagging_loss=0.01627, over 4945706.37 frames. ], batch size: 100, lr: 1.56e-02, grad_scale: 32.0 2023-12-21 19:50:34,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=203640.0, ans=0.0 2023-12-21 19:50:35,974 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.83 vs. 
limit=10.0 2023-12-21 19:50:37,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=203640.0, ans=0.1 2023-12-21 19:50:38,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=203706.66666666666, ans=0.0 2023-12-21 19:51:03,583 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.11 vs. limit=12.0 2023-12-21 19:51:11,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=203906.66666666666, ans=0.125 2023-12-21 19:51:18,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=203906.66666666666, ans=0.2 2023-12-21 19:51:20,723 INFO [train.py:886] (3/4) Epoch 7, batch 2000, loss[loss=0.01698, audio_tagging_loss=0.01698, over 24750.00 frames. ], tot_loss[loss=0.0161, audio_tagging_loss=0.0161, over 4948259.65 frames. ], batch size: 99, lr: 1.56e-02, grad_scale: 32.0 2023-12-21 19:51:40,083 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.281e+01 2.631e+01 2.743e+01 2.994e+01 3.542e+01, threshold=5.486e+01, percent-clipped=0.0 2023-12-21 19:51:47,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=204106.66666666666, ans=0.125 2023-12-21 19:51:49,799 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.53 vs. limit=22.5 2023-12-21 19:51:54,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.76 vs. limit=22.5 2023-12-21 19:52:12,970 INFO [train.py:886] (3/4) Epoch 7, batch 2050, loss[loss=0.01376, audio_tagging_loss=0.01376, over 25000.00 frames. ], tot_loss[loss=0.01596, audio_tagging_loss=0.01596, over 4953467.90 frames. ], batch size: 100, lr: 1.56e-02, grad_scale: 64.0 2023-12-21 19:52:15,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=204306.66666666666, ans=0.1 2023-12-21 19:52:15,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=204306.66666666666, ans=0.1 2023-12-21 19:52:46,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=204506.66666666666, ans=0.1 2023-12-21 19:52:49,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=204506.66666666666, ans=0.0 2023-12-21 19:52:58,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=204573.33333333334, ans=0.025 2023-12-21 19:53:00,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=204573.33333333334, ans=0.0 2023-12-21 19:53:03,695 INFO [train.py:886] (3/4) Epoch 7, batch 2100, loss[loss=0.01609, audio_tagging_loss=0.01609, over 24750.00 frames. ], tot_loss[loss=0.01597, audio_tagging_loss=0.01597, over 4951976.32 frames. 
], batch size: 99, lr: 1.56e-02, grad_scale: 64.0 2023-12-21 19:53:23,306 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 2.557e+01 2.714e+01 2.947e+01 3.593e+01, threshold=5.429e+01, percent-clipped=0.0 2023-12-21 19:53:29,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=204773.33333333334, ans=0.0 2023-12-21 19:53:45,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=204906.66666666666, ans=0.0 2023-12-21 19:53:53,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=204906.66666666666, ans=0.2 2023-12-21 19:53:56,455 INFO [train.py:886] (3/4) Epoch 7, batch 2150, loss[loss=0.01688, audio_tagging_loss=0.01688, over 24750.00 frames. ], tot_loss[loss=0.01597, audio_tagging_loss=0.01597, over 4949173.76 frames. ], batch size: 99, lr: 1.55e-02, grad_scale: 64.0 2023-12-21 19:53:57,934 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0 2023-12-21 19:54:05,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=205040.0, ans=0.0 2023-12-21 19:54:14,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=205040.0, ans=0.0 2023-12-21 19:54:30,793 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.84 vs. limit=15.0 2023-12-21 19:54:38,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=205240.0, ans=0.0 2023-12-21 19:54:39,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=205240.0, ans=0.07 2023-12-21 19:54:47,851 INFO [train.py:886] (3/4) Epoch 7, batch 2200, loss[loss=0.01409, audio_tagging_loss=0.01409, over 24750.00 frames. ], tot_loss[loss=0.01604, audio_tagging_loss=0.01604, over 4941996.66 frames. ], batch size: 99, lr: 1.55e-02, grad_scale: 64.0 2023-12-21 19:54:50,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=205306.66666666666, ans=0.09899494936611666 2023-12-21 19:54:52,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=205306.66666666666, ans=0.1 2023-12-21 19:55:04,909 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.58 vs. 
2023-12-21 19:55:06,280 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 2.584e+01 2.750e+01 3.020e+01 3.487e+01, threshold=5.500e+01, percent-clipped=0.0 2023-12-21 19:55:14,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=205440.0, ans=0.0 2023-12-21 19:55:20,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=205506.66666666666, ans=0.04949747468305833 2023-12-21 19:55:29,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.73 vs. limit=15.0 2023-12-21 19:55:37,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=205640.0, ans=0.0 2023-12-21 19:55:38,661 INFO [train.py:886] (3/4) Epoch 7, batch 2250, loss[loss=0.01574, audio_tagging_loss=0.01574, over 23950.00 frames. ], tot_loss[loss=0.01607, audio_tagging_loss=0.01607, over 4942104.15 frames. ], batch size: 100, lr: 1.55e-02, grad_scale: 64.0 2023-12-21 19:56:11,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=205840.0, ans=0.1 2023-12-21 19:56:13,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=205840.0, ans=0.0 2023-12-21 19:56:29,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=205973.33333333334, ans=0.125 2023-12-21 19:56:29,776 INFO [train.py:886] (3/4) Epoch 7, batch 2300, loss[loss=0.01612, audio_tagging_loss=0.01612, over 25000.00 frames. ], tot_loss[loss=0.01594, audio_tagging_loss=0.01594, over 4943747.73 frames. ], batch size: 100, lr: 1.55e-02, grad_scale: 64.0 2023-12-21 19:56:30,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=205973.33333333334, ans=0.0 2023-12-21 19:56:48,195 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.578e+01 2.756e+01 2.990e+01 3.667e+01, threshold=5.511e+01, percent-clipped=0.0 2023-12-21 19:56:50,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=206106.66666666666, ans=0.125 2023-12-21 19:56:56,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=206106.66666666666, ans=0.125 2023-12-21 19:57:09,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=206173.33333333334, ans=0.2 2023-12-21 19:57:21,956 INFO [train.py:886] (3/4) Epoch 7, batch 2350, loss[loss=0.01501, audio_tagging_loss=0.01501, over 24750.00 frames. ], tot_loss[loss=0.01584, audio_tagging_loss=0.01584, over 4948982.43 frames. ], batch size: 99, lr: 1.55e-02, grad_scale: 64.0 2023-12-21 19:57:31,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.52 vs. limit=15.0
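The recurring [optim.py:484] warnings summarise the optimizer's gradient-clipping statistics: the five numbers after "grad-norm quartiles" read as the min/25%/median/75%/max of recently observed gradient norms, and the logged threshold consistently equals Clipping_scale times the median (2.0 x 2.750e+01 = 5.500e+01 in the line above). A minimal sketch of that relation, using hypothetical helpers; the real logic in icefall's optim.py is more involved:

    import torch

    def clipping_stats(grad_norms, clipping_scale=2.0):
        # grad_norms: 1-D tensor of recently observed per-step gradient norms.
        probs = torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0], dtype=grad_norms.dtype)
        quartiles = torch.quantile(grad_norms, probs)
        threshold = clipping_scale * quartiles[2]  # matches the logged thresholds
        percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
        return quartiles, threshold, percent_clipped

    def clip(grad, threshold):
        norm = grad.norm()
        # rescale any gradient whose norm exceeds the threshold
        return grad * (threshold / norm) if norm > threshold else grad

percent-clipped=0.0 in these entries simply means that no gradient norm in the current window exceeded the threshold.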
2023-12-21 19:57:38,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=206373.33333333334, ans=0.125 2023-12-21 19:57:55,358 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.92 vs. limit=15.0 2023-12-21 19:58:13,409 INFO [train.py:886] (3/4) Epoch 7, batch 2400, loss[loss=0.0162, audio_tagging_loss=0.0162, over 24750.00 frames. ], tot_loss[loss=0.0158, audio_tagging_loss=0.0158, over 4955957.99 frames. ], batch size: 99, lr: 1.55e-02, grad_scale: 64.0 2023-12-21 19:58:20,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=206640.0, ans=0.1 2023-12-21 19:58:31,990 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+01 2.547e+01 2.719e+01 2.913e+01 3.717e+01, threshold=5.437e+01, percent-clipped=0.0 2023-12-21 19:58:36,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=206773.33333333334, ans=0.125 2023-12-21 19:58:39,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=206773.33333333334, ans=0.125 2023-12-21 19:58:39,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=206773.33333333334, ans=0.0 2023-12-21 19:58:49,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=206840.0, ans=0.02 2023-12-21 19:58:51,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=206840.0, ans=10.0 2023-12-21 19:59:05,238 INFO [train.py:886] (3/4) Epoch 7, batch 2450, loss[loss=0.01538, audio_tagging_loss=0.01538, over 25000.00 frames. ], tot_loss[loss=0.01593, audio_tagging_loss=0.01593, over 4961821.35 frames. ], batch size: 100, lr: 1.55e-02, grad_scale: 64.0 2023-12-21 19:59:12,176 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.85 vs. limit=15.0 2023-12-21 19:59:42,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=207173.33333333334, ans=0.125 2023-12-21 19:59:56,864 INFO [train.py:886] (3/4) Epoch 7, batch 2500, loss[loss=0.01636, audio_tagging_loss=0.01636, over 22420.00 frames. ], tot_loss[loss=0.01598, audio_tagging_loss=0.01598, over 4960855.46 frames. ], batch size: 107, lr: 1.55e-02, grad_scale: 64.0
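The [scaling.py:1022] Whitening entries periodically report a per-module whitening metric against a limit (the limit itself can be scheduled, as the *.whitening_limit ScheduledFloat entries elsewhere in this log show). The Whiten modules in zipformer's scaling.py discourage activations whose covariance drifts too far from white; a plausible reading of the metric, assumed here to be mean(lambda^2)/mean(lambda)^2 over the eigenvalues of the feature covariance, which is 1.0 for perfectly decorrelated, equal-power channels and grows as energy concentrates in fewer directions:

    import torch

    def whitening_metric(x, num_groups=1):
        # x: (num_frames, num_channels) activations; channels are split into
        # contiguous groups and the metric is averaged over groups, as the
        # num_groups field in the log suggests. Hypothetical helper.
        num_channels = x.shape[1]
        x = x.reshape(-1, num_groups, num_channels // num_groups).transpose(0, 1)
        metrics = []
        for feats in x:  # feats: (num_frames, channels_per_group)
            cov = (feats.T @ feats) / feats.shape[0]  # feature covariance
            lam = torch.linalg.eigvalsh(cov)          # its eigenvalues
            metrics.append((lam ** 2).mean() / lam.mean() ** 2)
        return torch.stack(metrics).mean()

Entries like metric=14.85 vs. limit=15.0 above then show covariances running close to, but still inside, their limits.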
2023-12-21 19:59:59,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=207306.66666666666, ans=0.125 2023-12-21 20:00:04,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=207306.66666666666, ans=0.125 2023-12-21 20:00:15,327 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.250e+01 2.623e+01 2.777e+01 2.967e+01 3.606e+01, threshold=5.553e+01, percent-clipped=0.0 2023-12-21 20:00:20,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=207440.0, ans=0.125 2023-12-21 20:00:33,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=207506.66666666666, ans=0.0 2023-12-21 20:00:36,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=207506.66666666666, ans=0.125 2023-12-21 20:00:48,966 INFO [train.py:886] (3/4) Epoch 7, batch 2550, loss[loss=0.01486, audio_tagging_loss=0.01486, over 24750.00 frames. ], tot_loss[loss=0.01601, audio_tagging_loss=0.01601, over 4953010.89 frames. ], batch size: 99, lr: 1.55e-02, grad_scale: 64.0 2023-12-21 20:01:21,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=207840.0, ans=0.125 2023-12-21 20:01:21,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=207840.0, ans=0.125 2023-12-21 20:01:23,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=207840.0, ans=0.2 2023-12-21 20:01:41,311 INFO [train.py:886] (3/4) Epoch 7, batch 2600, loss[loss=0.01461, audio_tagging_loss=0.01461, over 24750.00 frames. ], tot_loss[loss=0.01594, audio_tagging_loss=0.01594, over 4953989.75 frames. ], batch size: 99, lr: 1.54e-02, grad_scale: 64.0 2023-12-21 20:01:47,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=207973.33333333334, ans=0.5 2023-12-21 20:01:57,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=208040.0, ans=15.0 2023-12-21 20:01:59,603 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.206e+01 2.546e+01 2.741e+01 2.943e+01 4.018e+01, threshold=5.482e+01, percent-clipped=0.0 2023-12-21 20:02:24,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=208240.0, ans=0.125 2023-12-21 20:02:30,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=208240.0, ans=0.125 2023-12-21 20:02:32,844 INFO [train.py:886] (3/4) Epoch 7, batch 2650, loss[loss=0.01785, audio_tagging_loss=0.01785, over 25000.00 frames. ], tot_loss[loss=0.01587, audio_tagging_loss=0.01587, over 4955904.60 frames.
], batch size: 100, lr: 1.54e-02, grad_scale: 64.0 2023-12-21 20:02:44,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=208373.33333333334, ans=0.0 2023-12-21 20:02:47,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=208373.33333333334, ans=0.0 2023-12-21 20:03:01,883 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=15.0 2023-12-21 20:03:21,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=208573.33333333334, ans=0.0 2023-12-21 20:03:24,907 INFO [train.py:886] (3/4) Epoch 7, batch 2700, loss[loss=0.01408, audio_tagging_loss=0.01408, over 25000.00 frames. ], tot_loss[loss=0.0158, audio_tagging_loss=0.0158, over 4958432.54 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 64.0 2023-12-21 20:03:43,480 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+01 2.523e+01 2.665e+01 2.872e+01 3.649e+01, threshold=5.330e+01, percent-clipped=0.0 2023-12-21 20:03:48,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=208773.33333333334, ans=0.1 2023-12-21 20:03:56,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=208840.0, ans=0.125 2023-12-21 20:04:02,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=208840.0, ans=0.2 2023-12-21 20:04:09,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=208906.66666666666, ans=0.2 2023-12-21 20:04:16,736 INFO [train.py:886] (3/4) Epoch 7, batch 2750, loss[loss=0.01823, audio_tagging_loss=0.01823, over 24913.00 frames. ], tot_loss[loss=0.01587, audio_tagging_loss=0.01587, over 4962821.34 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 64.0 2023-12-21 20:04:27,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=209040.0, ans=15.0 2023-12-21 20:04:28,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.33 vs. limit=15.0 2023-12-21 20:04:53,465 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.53 vs. limit=22.5 2023-12-21 20:05:01,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=209240.0, ans=0.125 2023-12-21 20:05:08,636 INFO [train.py:886] (3/4) Epoch 7, batch 2800, loss[loss=0.01574, audio_tagging_loss=0.01574, over 24750.00 frames. ], tot_loss[loss=0.01594, audio_tagging_loss=0.01594, over 4962704.60 frames. 
], batch size: 99, lr: 1.54e-02, grad_scale: 64.0 2023-12-21 20:05:14,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=209306.66666666666, ans=0.0 2023-12-21 20:05:28,165 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.269e+01 2.667e+01 2.782e+01 2.958e+01 3.854e+01, threshold=5.564e+01, percent-clipped=0.0 2023-12-21 20:05:34,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=209440.0, ans=0.0 2023-12-21 20:05:43,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=209506.66666666666, ans=0.125 2023-12-21 20:05:43,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=209506.66666666666, ans=0.0 2023-12-21 20:05:49,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=209573.33333333334, ans=15.0 2023-12-21 20:06:00,778 INFO [train.py:886] (3/4) Epoch 7, batch 2850, loss[loss=0.0177, audio_tagging_loss=0.0177, over 24750.00 frames. ], tot_loss[loss=0.01603, audio_tagging_loss=0.01603, over 4952120.77 frames. ], batch size: 99, lr: 1.54e-02, grad_scale: 64.0 2023-12-21 20:06:00,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=209640.0, ans=10.0 2023-12-21 20:06:33,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0 2023-12-21 20:06:51,792 INFO [train.py:886] (3/4) Epoch 7, batch 2900, loss[loss=0.01586, audio_tagging_loss=0.01586, over 25000.00 frames. ], tot_loss[loss=0.01588, audio_tagging_loss=0.01588, over 4945902.22 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 64.0 2023-12-21 20:07:01,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=209973.33333333334, ans=0.09899494936611666 2023-12-21 20:07:05,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=210040.0, ans=0.0 2023-12-21 20:07:10,770 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.261e+01 2.581e+01 2.783e+01 3.000e+01 3.656e+01, threshold=5.566e+01, percent-clipped=0.0 2023-12-21 20:07:39,631 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=12.0 2023-12-21 20:07:40,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=210240.0, ans=0.2 2023-12-21 20:07:43,855 INFO [train.py:886] (3/4) Epoch 7, batch 2950, loss[loss=0.01312, audio_tagging_loss=0.01312, over 24750.00 frames. ], tot_loss[loss=0.0158, audio_tagging_loss=0.0158, over 4952181.92 frames. ], batch size: 99, lr: 1.54e-02, grad_scale: 64.0 2023-12-21 20:07:44,244 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.73 vs. 
limit=15.0 2023-12-21 20:07:55,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=210373.33333333334, ans=0.125 2023-12-21 20:08:19,225 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.81 vs. limit=6.0 2023-12-21 20:08:36,028 INFO [train.py:886] (3/4) Epoch 7, batch 3000, loss[loss=0.01415, audio_tagging_loss=0.01415, over 25000.00 frames. ], tot_loss[loss=0.01573, audio_tagging_loss=0.01573, over 4953080.39 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 64.0 2023-12-21 20:08:36,029 INFO [train.py:909] (3/4) Computing validation loss 2023-12-21 20:08:57,463 INFO [train.py:917] (3/4) Epoch 7, validation: loss=0.03818, audio_tagging_loss=0.03818, over 3737520.00 frames. 2023-12-21 20:08:57,464 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-21 20:09:00,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.95 vs. limit=15.0 2023-12-21 20:09:07,211 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.43 vs. limit=22.5 2023-12-21 20:09:15,950 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+01 2.521e+01 2.655e+01 2.830e+01 3.730e+01, threshold=5.311e+01, percent-clipped=0.0 2023-12-21 20:09:26,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=210773.33333333334, ans=0.125 2023-12-21 20:09:27,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=210840.0, ans=0.0 2023-12-21 20:09:28,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=210840.0, ans=0.125 2023-12-21 20:09:36,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.61 vs. limit=15.0 2023-12-21 20:09:43,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=210906.66666666666, ans=0.125 2023-12-21 20:09:46,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2023-12-21 20:09:48,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=210906.66666666666, ans=0.2 2023-12-21 20:09:48,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=210973.33333333334, ans=0.125 2023-12-21 20:09:49,766 INFO [train.py:886] (3/4) Epoch 7, batch 3050, loss[loss=0.01444, audio_tagging_loss=0.01444, over 24750.00 frames. ], tot_loss[loss=0.01565, audio_tagging_loss=0.01565, over 4960836.58 frames. ], batch size: 99, lr: 1.53e-02, grad_scale: 64.0 2023-12-21 20:09:51,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=210973.33333333334, ans=0.125 2023-12-21 20:10:12,086 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.04 vs. 
limit=10.0 2023-12-21 20:10:16,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=211106.66666666666, ans=0.125 2023-12-21 20:10:16,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=211106.66666666666, ans=10.0 2023-12-21 20:10:18,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=211106.66666666666, ans=0.09899494936611666 2023-12-21 20:10:21,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=211173.33333333334, ans=0.025 2023-12-21 20:10:25,055 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=3.582e-01 2023-12-21 20:10:42,207 INFO [train.py:886] (3/4) Epoch 7, batch 3100, loss[loss=0.01759, audio_tagging_loss=0.01759, over 25000.00 frames. ], tot_loss[loss=0.01568, audio_tagging_loss=0.01568, over 4963637.95 frames. ], batch size: 100, lr: 1.53e-02, grad_scale: 64.0 2023-12-21 20:11:00,331 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.401e+01 2.611e+01 2.756e+01 2.922e+01 4.082e+01, threshold=5.512e+01, percent-clipped=0.0 2023-12-21 20:11:33,207 INFO [train.py:886] (3/4) Epoch 7, batch 3150, loss[loss=0.01723, audio_tagging_loss=0.01723, over 25000.00 frames. ], tot_loss[loss=0.01579, audio_tagging_loss=0.01579, over 4962972.05 frames. ], batch size: 100, lr: 1.53e-02, grad_scale: 64.0 2023-12-21 20:11:42,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=211640.0, ans=0.1 2023-12-21 20:11:45,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2023-12-21 20:11:52,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=211706.66666666666, ans=15.0 2023-12-21 20:11:56,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=211773.33333333334, ans=0.0 2023-12-21 20:12:06,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.87 vs. limit=6.0 2023-12-21 20:12:08,091 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 20:12:25,086 INFO [train.py:886] (3/4) Epoch 7, batch 3200, loss[loss=0.01445, audio_tagging_loss=0.01445, over 24750.00 frames. ], tot_loss[loss=0.0159, audio_tagging_loss=0.0159, over 4958018.02 frames. ], batch size: 99, lr: 1.53e-02, grad_scale: 64.0 2023-12-21 20:12:25,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=211973.33333333334, ans=0.2 2023-12-21 20:12:28,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.50 vs. 
limit=22.5 2023-12-21 20:12:43,123 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.209e+01 2.609e+01 2.784e+01 2.998e+01 3.894e+01, threshold=5.569e+01, percent-clipped=0.0 2023-12-21 20:12:48,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=212106.66666666666, ans=0.0 2023-12-21 20:12:50,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=212106.66666666666, ans=0.0 2023-12-21 20:13:09,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=212240.0, ans=0.0 2023-12-21 20:13:12,194 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.54 vs. limit=15.0 2023-12-21 20:13:17,064 INFO [train.py:886] (3/4) Epoch 7, batch 3250, loss[loss=0.01724, audio_tagging_loss=0.01724, over 24750.00 frames. ], tot_loss[loss=0.01587, audio_tagging_loss=0.01587, over 4954151.58 frames. ], batch size: 99, lr: 1.53e-02, grad_scale: 64.0 2023-12-21 20:13:17,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=212306.66666666666, ans=0.125 2023-12-21 20:13:23,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=212306.66666666666, ans=0.0 2023-12-21 20:13:42,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=212440.0, ans=0.0 2023-12-21 20:13:50,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=212506.66666666666, ans=0.125 2023-12-21 20:13:59,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=212573.33333333334, ans=0.125 2023-12-21 20:14:02,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=212573.33333333334, ans=0.125 2023-12-21 20:14:08,615 INFO [train.py:886] (3/4) Epoch 7, batch 3300, loss[loss=0.01257, audio_tagging_loss=0.01257, over 25000.00 frames. ], tot_loss[loss=0.01579, audio_tagging_loss=0.01579, over 4952820.16 frames. ], batch size: 100, lr: 1.53e-02, grad_scale: 64.0 2023-12-21 20:14:14,208 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.098e-01 2023-12-21 20:14:14,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=212640.0, ans=0.0 2023-12-21 20:14:27,795 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.135e+01 2.590e+01 2.789e+01 2.998e+01 3.537e+01, threshold=5.578e+01, percent-clipped=0.0 2023-12-21 20:14:27,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=212706.66666666666, ans=0.1 2023-12-21 20:14:30,273 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.12 vs. 
limit=22.5 2023-12-21 20:14:30,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=212773.33333333334, ans=0.125 2023-12-21 20:14:36,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=212773.33333333334, ans=0.125 2023-12-21 20:14:39,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=212840.0, ans=0.125 2023-12-21 20:14:55,414 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.10 vs. limit=22.5 2023-12-21 20:14:57,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=212906.66666666666, ans=0.1 2023-12-21 20:15:01,340 INFO [train.py:886] (3/4) Epoch 7, batch 3350, loss[loss=0.01399, audio_tagging_loss=0.01399, over 25000.00 frames. ], tot_loss[loss=0.01576, audio_tagging_loss=0.01576, over 4946374.95 frames. ], batch size: 100, lr: 1.53e-02, grad_scale: 64.0 2023-12-21 20:15:05,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=212973.33333333334, ans=0.2 2023-12-21 20:15:31,359 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=15.0 2023-12-21 20:15:34,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=213173.33333333334, ans=0.125 2023-12-21 20:15:37,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=213173.33333333334, ans=0.0 2023-12-21 20:15:42,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=213240.0, ans=0.0 2023-12-21 20:15:53,143 INFO [train.py:886] (3/4) Epoch 7, batch 3400, loss[loss=0.0163, audio_tagging_loss=0.0163, over 25000.00 frames. ], tot_loss[loss=0.0158, audio_tagging_loss=0.0158, over 4953417.74 frames. ], batch size: 100, lr: 1.53e-02, grad_scale: 64.0 2023-12-21 20:16:05,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=213373.33333333334, ans=0.125 2023-12-21 20:16:05,568 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2023-12-21 20:16:13,539 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+01 2.594e+01 2.749e+01 2.971e+01 3.801e+01, threshold=5.499e+01, percent-clipped=0.0 2023-12-21 20:16:16,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=213440.0, ans=0.125 2023-12-21 20:16:24,634 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-21 20:16:25,104 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.70 vs. limit=6.0 2023-12-21 20:16:34,657 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.48 vs. 
limit=15.0 2023-12-21 20:16:46,477 INFO [train.py:886] (3/4) Epoch 7, batch 3450, loss[loss=0.015, audio_tagging_loss=0.015, over 24750.00 frames. ], tot_loss[loss=0.01585, audio_tagging_loss=0.01585, over 4947259.22 frames. ], batch size: 99, lr: 1.52e-02, grad_scale: 64.0 2023-12-21 20:16:48,590 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.233e-01 2023-12-21 20:16:54,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.80 vs. limit=15.0 2023-12-21 20:17:15,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=213773.33333333334, ans=0.2 2023-12-21 20:17:25,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=213840.0, ans=0.125 2023-12-21 20:17:28,505 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.25 vs. limit=10.0 2023-12-21 20:17:32,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=213906.66666666666, ans=0.05 2023-12-21 20:17:38,621 INFO [train.py:886] (3/4) Epoch 7, batch 3500, loss[loss=0.01466, audio_tagging_loss=0.01466, over 24750.00 frames. ], tot_loss[loss=0.01583, audio_tagging_loss=0.01583, over 4945349.71 frames. ], batch size: 99, lr: 1.52e-02, grad_scale: 64.0 2023-12-21 20:17:46,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=213973.33333333334, ans=0.125 2023-12-21 20:17:56,290 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.629e+01 2.792e+01 2.990e+01 3.562e+01, threshold=5.583e+01, percent-clipped=0.0 2023-12-21 20:17:57,733 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.18 vs. limit=15.0 2023-12-21 20:18:00,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=214106.66666666666, ans=10.0 2023-12-21 20:18:08,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=214173.33333333334, ans=0.2 2023-12-21 20:18:12,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=214173.33333333334, ans=0.125 2023-12-21 20:18:15,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=214173.33333333334, ans=0.2 2023-12-21 20:18:15,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=214173.33333333334, ans=0.125 2023-12-21 20:18:18,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=214240.0, ans=0.0 2023-12-21 20:18:29,576 INFO [train.py:886] (3/4) Epoch 7, batch 3550, loss[loss=0.01576, audio_tagging_loss=0.01576, over 25000.00 frames. ], tot_loss[loss=0.01575, audio_tagging_loss=0.01575, over 4944074.33 frames. 
], batch size: 100, lr: 1.52e-02, grad_scale: 64.0 2023-12-21 20:18:29,995 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.47 vs. limit=22.5 2023-12-21 20:18:33,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=214306.66666666666, ans=0.1 2023-12-21 20:19:06,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=214506.66666666666, ans=0.2 2023-12-21 20:19:17,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=214573.33333333334, ans=0.125 2023-12-21 20:19:20,723 INFO [train.py:886] (3/4) Epoch 7, batch 3600, loss[loss=0.01855, audio_tagging_loss=0.01855, over 25000.00 frames. ], tot_loss[loss=0.0157, audio_tagging_loss=0.0157, over 4950009.67 frames. ], batch size: 100, lr: 1.52e-02, grad_scale: 64.0 2023-12-21 20:19:39,456 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.510e+01 2.669e+01 2.897e+01 3.609e+01, threshold=5.338e+01, percent-clipped=0.0 2023-12-21 20:19:42,666 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.12 vs. limit=15.0 2023-12-21 20:19:50,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=214773.33333333334, ans=0.2 2023-12-21 20:19:56,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=214840.0, ans=0.5 2023-12-21 20:19:57,624 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.43 vs. limit=22.5 2023-12-21 20:20:09,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=214906.66666666666, ans=0.0 2023-12-21 20:20:12,777 INFO [train.py:886] (3/4) Epoch 7, batch 3650, loss[loss=0.01587, audio_tagging_loss=0.01587, over 25000.00 frames. ], tot_loss[loss=0.01568, audio_tagging_loss=0.01568, over 4955963.68 frames. ], batch size: 100, lr: 1.52e-02, grad_scale: 64.0 2023-12-21 20:20:19,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=214973.33333333334, ans=0.125 2023-12-21 20:20:20,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=214973.33333333334, ans=0.035 2023-12-21 20:20:24,142 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.90 vs. limit=6.0 2023-12-21 20:20:33,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=215106.66666666666, ans=0.125 2023-12-21 20:21:03,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=215306.66666666666, ans=0.1 2023-12-21 20:21:04,224 INFO [train.py:886] (3/4) Epoch 7, batch 3700, loss[loss=0.01444, audio_tagging_loss=0.01444, over 25000.00 frames. ], tot_loss[loss=0.01569, audio_tagging_loss=0.01569, over 4952624.07 frames. 
], batch size: 100, lr: 1.52e-02, grad_scale: 64.0 2023-12-21 20:21:07,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=215306.66666666666, ans=0.125 2023-12-21 20:21:10,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=215306.66666666666, ans=0.125 2023-12-21 20:21:10,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=215306.66666666666, ans=0.1 2023-12-21 20:21:19,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=215373.33333333334, ans=0.125 2023-12-21 20:21:23,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=215373.33333333334, ans=0.125 2023-12-21 20:21:24,015 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.185e+01 2.509e+01 2.714e+01 2.873e+01 3.497e+01, threshold=5.429e+01, percent-clipped=0.0 2023-12-21 20:21:42,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=215506.66666666666, ans=0.125 2023-12-21 20:21:47,041 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.42 vs. limit=15.0 2023-12-21 20:21:56,905 INFO [train.py:886] (3/4) Epoch 7, batch 3750, loss[loss=0.01511, audio_tagging_loss=0.01511, over 24750.00 frames. ], tot_loss[loss=0.01585, audio_tagging_loss=0.01585, over 4952835.94 frames. ], batch size: 99, lr: 1.52e-02, grad_scale: 64.0 2023-12-21 20:22:05,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=215706.66666666666, ans=0.125 2023-12-21 20:22:05,787 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.85 vs. limit=15.0 2023-12-21 20:22:17,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2023-12-21 20:22:24,024 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.04 vs. limit=6.0 2023-12-21 20:22:41,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=215906.66666666666, ans=0.125 2023-12-21 20:22:47,579 INFO [train.py:886] (3/4) Epoch 7, batch 3800, loss[loss=0.01498, audio_tagging_loss=0.01498, over 24750.00 frames. ], tot_loss[loss=0.01594, audio_tagging_loss=0.01594, over 4944158.72 frames. ], batch size: 99, lr: 1.52e-02, grad_scale: 64.0 2023-12-21 20:22:47,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=215973.33333333334, ans=0.0 2023-12-21 20:23:06,291 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.355e+01 2.628e+01 2.789e+01 3.025e+01 3.707e+01, threshold=5.578e+01, percent-clipped=0.0 2023-12-21 20:23:22,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.26 vs. 
limit=15.0 2023-12-21 20:23:29,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.59 vs. limit=15.0 2023-12-21 20:23:33,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=216240.0, ans=0.1 2023-12-21 20:23:39,552 INFO [train.py:886] (3/4) Epoch 7, batch 3850, loss[loss=0.01613, audio_tagging_loss=0.01613, over 25000.00 frames. ], tot_loss[loss=0.0159, audio_tagging_loss=0.0159, over 4948852.76 frames. ], batch size: 100, lr: 1.52e-02, grad_scale: 64.0 2023-12-21 20:23:40,048 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.35 vs. limit=15.0 2023-12-21 20:23:42,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=216306.66666666666, ans=0.125 2023-12-21 20:23:57,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=216373.33333333334, ans=0.125 2023-12-21 20:24:30,985 INFO [train.py:886] (3/4) Epoch 7, batch 3900, loss[loss=0.01731, audio_tagging_loss=0.01731, over 24750.00 frames. ], tot_loss[loss=0.01582, audio_tagging_loss=0.01582, over 4948683.25 frames. ], batch size: 99, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:24:46,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=216706.66666666666, ans=0.2 2023-12-21 20:24:48,636 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.252e+01 2.626e+01 2.763e+01 2.975e+01 3.870e+01, threshold=5.525e+01, percent-clipped=0.0 2023-12-21 20:25:18,198 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 20:25:21,833 INFO [train.py:886] (3/4) Epoch 7, batch 3950, loss[loss=0.02111, audio_tagging_loss=0.02111, over 25000.00 frames. ], tot_loss[loss=0.01591, audio_tagging_loss=0.01591, over 4950644.56 frames. ], batch size: 100, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:25:27,678 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.88 vs. limit=22.5 2023-12-21 20:25:29,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=216973.33333333334, ans=0.2 2023-12-21 20:25:46,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=217106.66666666666, ans=0.0 2023-12-21 20:26:01,623 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.44 vs. limit=15.0 2023-12-21 20:26:02,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=217240.0, ans=0.125 2023-12-21 20:26:14,570 INFO [train.py:886] (3/4) Epoch 7, batch 4000, loss[loss=0.01838, audio_tagging_loss=0.01838, over 25000.00 frames. ], tot_loss[loss=0.01583, audio_tagging_loss=0.01583, over 4954123.57 frames. 
], batch size: 100, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:26:18,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=217306.66666666666, ans=0.0 2023-12-21 20:26:25,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=217373.33333333334, ans=0.125 2023-12-21 20:26:30,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=217373.33333333334, ans=0.1 2023-12-21 20:26:32,674 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.228e+01 2.642e+01 2.775e+01 3.007e+01 3.915e+01, threshold=5.549e+01, percent-clipped=0.0 2023-12-21 20:26:42,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=217440.0, ans=0.0 2023-12-21 20:26:43,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=217440.0, ans=0.0 2023-12-21 20:27:05,533 INFO [train.py:886] (3/4) Epoch 7, batch 4050, loss[loss=0.01636, audio_tagging_loss=0.01636, over 24750.00 frames. ], tot_loss[loss=0.01587, audio_tagging_loss=0.01587, over 4955164.95 frames. ], batch size: 99, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:27:06,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=217640.0, ans=0.07 2023-12-21 20:27:08,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=217640.0, ans=0.125 2023-12-21 20:27:20,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217706.66666666666, ans=0.1 2023-12-21 20:27:22,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=217706.66666666666, ans=0.0 2023-12-21 20:27:22,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217706.66666666666, ans=0.1 2023-12-21 20:27:32,192 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.53 vs. limit=15.0 2023-12-21 20:27:34,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=217773.33333333334, ans=0.125 2023-12-21 20:27:42,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=217840.0, ans=0.2 2023-12-21 20:27:47,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=217906.66666666666, ans=0.09899494936611666 2023-12-21 20:27:58,019 INFO [train.py:886] (3/4) Epoch 7, batch 4100, loss[loss=0.01938, audio_tagging_loss=0.01938, over 24750.00 frames. ], tot_loss[loss=0.01592, audio_tagging_loss=0.01592, over 4947837.63 frames. ], batch size: 99, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:28:09,645 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.84 vs. 
limit=12.0 2023-12-21 20:28:18,076 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.263e+01 2.585e+01 2.755e+01 2.921e+01 3.340e+01, threshold=5.510e+01, percent-clipped=0.0 2023-12-21 20:28:18,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.20 vs. limit=15.0 2023-12-21 20:28:50,129 INFO [train.py:886] (3/4) Epoch 7, batch 4150, loss[loss=0.01512, audio_tagging_loss=0.01512, over 24750.00 frames. ], tot_loss[loss=0.01588, audio_tagging_loss=0.01588, over 4943595.09 frames. ], batch size: 99, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:28:50,557 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.47 vs. limit=15.0 2023-12-21 20:28:50,681 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0 2023-12-21 20:29:00,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=218373.33333333334, ans=0.125 2023-12-21 20:29:13,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=218440.0, ans=0.07 2023-12-21 20:29:14,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=218440.0, ans=0.035 2023-12-21 20:29:34,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=218573.33333333334, ans=0.1 2023-12-21 20:29:41,473 INFO [train.py:886] (3/4) Epoch 7, batch 4200, loss[loss=0.01681, audio_tagging_loss=0.01681, over 25000.00 frames. ], tot_loss[loss=0.01567, audio_tagging_loss=0.01567, over 4946878.15 frames. ], batch size: 100, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:29:41,840 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.46 vs. limit=6.0 2023-12-21 20:29:45,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=218640.0, ans=0.125 2023-12-21 20:30:01,126 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 2.567e+01 2.744e+01 2.993e+01 3.872e+01, threshold=5.489e+01, percent-clipped=0.0 2023-12-21 20:30:01,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=218773.33333333334, ans=0.07 2023-12-21 20:30:08,603 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 20:30:15,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=218840.0, ans=0.0 2023-12-21 20:30:18,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=218840.0, ans=0.5 2023-12-21 20:30:33,455 INFO [train.py:886] (3/4) Epoch 7, batch 4250, loss[loss=0.01373, audio_tagging_loss=0.01373, over 25000.00 frames. ], tot_loss[loss=0.01567, audio_tagging_loss=0.01567, over 4949098.68 frames. 
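], batch size: 100, lr: 1.51e-02, grad_scale: 64.0

Each scaling.py:213 line reports the current value (ans) of one ScheduledFloat, a module constant scheduled against the global batch counter: by batch_count ~218k the skip rates here have settled at 0.0, dropout probabilities at 0.1, and balancer probabilities at 0.125. A sketch of the piecewise-linear interpolation such a schedule implies; the breakpoint numbers in the example are illustrative assumptions, not values from this recipe:

```python
class PiecewiseLinearSchedule:
    """Sketch of a ScheduledFloat-style value: a float that follows a
    piecewise-linear schedule over the global batch count, clamped at
    the first and last breakpoints."""

    def __init__(self, *points):
        # points are (batch_count, value) pairs, e.g. (0.0, 0.3), (20000.0, 0.0)
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.points[0][0]:
            return self.points[0][1]
        if batch_count >= self.points[-1][0]:
            return self.points[-1][1]
        for (x0, y0), (x1, y1) in zip(self.points, self.points[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        raise AssertionError("unreachable")

# Illustrative breakpoints: a skip-rate that anneals 0.3 -> 0.0 and stays there.
skip_rate = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.0))
print(skip_rate(218440.0))  # well past the last breakpoint: ans=0.0
```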
2023-12-21 20:30:52,292 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=12.0 2023-12-21 20:31:05,951 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.69 vs. limit=10.0 2023-12-21 20:31:25,683 INFO [train.py:886] (3/4) Epoch 7, batch 4300, loss[loss=0.01527, audio_tagging_loss=0.01527, over 25000.00 frames. ], tot_loss[loss=0.0157, audio_tagging_loss=0.0157, over 4951934.81 frames. ], batch size: 100, lr: 1.51e-02, grad_scale: 64.0 2023-12-21 20:31:38,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=219373.33333333334, ans=0.0 2023-12-21 20:31:45,085 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.344e+01 2.678e+01 2.884e+01 3.041e+01 3.867e+01, threshold=5.769e+01, percent-clipped=0.0 2023-12-21 20:31:51,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=219440.0, ans=0.0 2023-12-21 20:32:00,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=219506.66666666666, ans=0.0 2023-12-21 20:32:03,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=219506.66666666666, ans=0.1 2023-12-21 20:32:10,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=219573.33333333334, ans=0.1 2023-12-21 20:32:16,686 INFO [train.py:886] (3/4) Epoch 7, batch 4350, loss[loss=0.01636, audio_tagging_loss=0.01636, over 24750.00 frames. ], tot_loss[loss=0.01575, audio_tagging_loss=0.01575, over 4951396.31 frames. ], batch size: 99, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:32:16,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=219640.0, ans=0.125 2023-12-21 20:32:37,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=219773.33333333334, ans=0.125 2023-12-21 20:32:48,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=219840.0, ans=0.125 2023-12-21 20:32:54,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=219840.0, ans=0.1 2023-12-21 20:32:55,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=219840.0, ans=0.0 2023-12-21 20:32:55,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=219840.0, ans=0.0 2023-12-21 20:33:09,197 INFO [train.py:886] (3/4) Epoch 7, batch 4400, loss[loss=0.01572, audio_tagging_loss=0.01572, over 24106.00 frames. ], tot_loss[loss=0.01582, audio_tagging_loss=0.01582, over 4947477.17 frames.
], batch size: 100, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:33:15,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=219973.33333333334, ans=0.125 2023-12-21 20:33:15,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=219973.33333333334, ans=0.1 2023-12-21 20:33:19,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.44 vs. limit=15.0 2023-12-21 20:33:24,246 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.65 vs. limit=22.5 2023-12-21 20:33:28,579 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.285e+01 2.625e+01 2.823e+01 3.033e+01 3.448e+01, threshold=5.646e+01, percent-clipped=0.0 2023-12-21 20:33:30,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=220106.66666666666, ans=0.0 2023-12-21 20:33:43,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=220173.33333333334, ans=0.035 2023-12-21 20:33:50,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=220240.0, ans=0.1 2023-12-21 20:34:00,661 INFO [train.py:886] (3/4) Epoch 7, batch 4450, loss[loss=0.01488, audio_tagging_loss=0.01488, over 24750.00 frames. ], tot_loss[loss=0.01585, audio_tagging_loss=0.01585, over 4943835.39 frames. ], batch size: 99, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:34:04,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.08 vs. limit=10.0 2023-12-21 20:34:22,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=220440.0, ans=0.125 2023-12-21 20:34:24,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=220440.0, ans=0.2 2023-12-21 20:34:47,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=220573.33333333334, ans=0.035 2023-12-21 20:34:52,055 INFO [train.py:886] (3/4) Epoch 7, batch 4500, loss[loss=0.01515, audio_tagging_loss=0.01515, over 24750.00 frames. ], tot_loss[loss=0.01578, audio_tagging_loss=0.01578, over 4941303.78 frames. ], batch size: 99, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:34:53,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.66 vs. 
limit=15.0 2023-12-21 20:35:03,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=220706.66666666666, ans=0.125 2023-12-21 20:35:12,166 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.651e+01 2.803e+01 3.008e+01 3.714e+01, threshold=5.606e+01, percent-clipped=0.0 2023-12-21 20:35:28,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=220840.0, ans=0.05 2023-12-21 20:35:34,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=220906.66666666666, ans=0.125 2023-12-21 20:35:38,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.66 vs. limit=22.5 2023-12-21 20:35:44,490 INFO [train.py:886] (3/4) Epoch 7, batch 4550, loss[loss=0.0148, audio_tagging_loss=0.0148, over 25000.00 frames. ], tot_loss[loss=0.01572, audio_tagging_loss=0.01572, over 4948443.89 frames. ], batch size: 100, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:36:04,615 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.64 vs. limit=15.0 2023-12-21 20:36:06,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=221106.66666666666, ans=0.125 2023-12-21 20:36:24,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=221173.33333333334, ans=0.125 2023-12-21 20:36:28,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=221240.0, ans=0.2 2023-12-21 20:36:36,196 INFO [train.py:886] (3/4) Epoch 7, batch 4600, loss[loss=0.01663, audio_tagging_loss=0.01663, over 25000.00 frames. ], tot_loss[loss=0.0157, audio_tagging_loss=0.0157, over 4952671.44 frames. ], batch size: 100, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:36:37,564 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0 2023-12-21 20:36:44,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=221306.66666666666, ans=0.0 2023-12-21 20:36:45,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=221373.33333333334, ans=0.125 2023-12-21 20:36:56,445 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 2.617e+01 2.767e+01 2.974e+01 3.555e+01, threshold=5.535e+01, percent-clipped=0.0 2023-12-21 20:37:17,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=221573.33333333334, ans=0.125 2023-12-21 20:37:18,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.52 vs. 
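limit=15.0

The scaling.py:1022 lines compare a whitening metric against a limit (metric=14.66 vs. limit=15.0 just above, and metric=4.95 vs. limit=6.0 for a whiten_keys module further below). The metric measures how far the feature covariance is from a multiple of the identity, so perfectly "white" features score 1.0 and strongly correlated ones score much higher; when the limit is exceeded, the module pushes back through the backward pass. A rough sketch of such a statistic, an approximation rather than the exact implementation in scaling.py:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Covariance-flatness statistic: ~1.0 when features are white.

    x has shape (num_frames, num_channels); channels are split into
    num_groups contiguous groups, mirroring the num_groups field in the
    log lines, and the per-group statistics are averaged.  Sketch only.
    """
    num_frames, num_channels = x.shape
    assert num_channels % num_groups == 0
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)  # zero-mean per channel
    metrics = []
    for g in range(num_groups):
        xg = x[:, g, :]
        cov = (xg.T @ xg) / num_frames
        c = cov.shape[0]
        # normalised so that cov = a * I yields exactly 1.0
        metrics.append(c * (cov ** 2).mean() / cov.diag().mean() ** 2)
    return torch.stack(metrics).mean().item()

# White noise scores near 1.0; heavily correlated features score far higher,
# which is when a limit such as 15.0 would trigger the penalty.
print(whitening_metric(torch.randn(10000, 384)))
```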
2023-12-21 20:37:27,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=221573.33333333334, ans=0.2 2023-12-21 20:37:28,708 INFO [train.py:886] (3/4) Epoch 7, batch 4650, loss[loss=0.01606, audio_tagging_loss=0.01606, over 24750.00 frames. ], tot_loss[loss=0.01571, audio_tagging_loss=0.01571, over 4955691.64 frames. ], batch size: 99, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:37:54,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=221773.33333333334, ans=0.0 2023-12-21 20:38:02,220 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=23.68 vs. limit=15.0 2023-12-21 20:38:04,233 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.47 vs. limit=22.5 2023-12-21 20:38:09,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=221906.66666666666, ans=0.1 2023-12-21 20:38:09,911 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.95 vs. limit=6.0 2023-12-21 20:38:11,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=221906.66666666666, ans=0.125 2023-12-21 20:38:12,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=221906.66666666666, ans=0.125 2023-12-21 20:38:15,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=221906.66666666666, ans=0.125 2023-12-21 20:38:19,034 INFO [train.py:886] (3/4) Epoch 7, batch 4700, loss[loss=0.0197, audio_tagging_loss=0.0197, over 24750.00 frames. ], tot_loss[loss=0.01568, audio_tagging_loss=0.01568, over 4946839.31 frames. ], batch size: 99, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:38:24,103 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.52 vs. limit=15.0 2023-12-21 20:38:24,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=221973.33333333334, ans=0.2 2023-12-21 20:38:31,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=222040.0, ans=0.125 2023-12-21 20:38:36,954 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.697e+01 2.843e+01 3.036e+01 3.752e+01, threshold=5.686e+01, percent-clipped=0.0 2023-12-21 20:38:46,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=222173.33333333334, ans=0.125 2023-12-21 20:39:06,409 INFO [train.py:886] (3/4) Epoch 7, batch 4750, loss[loss=0.01685, audio_tagging_loss=0.01685, over 24750.00 frames. ], tot_loss[loss=0.01595, audio_tagging_loss=0.01595, over 4940893.90 frames.
], batch size: 99, lr: 1.50e-02, grad_scale: 64.0 2023-12-21 20:39:11,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=222306.66666666666, ans=0.1 2023-12-21 20:39:11,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=222306.66666666666, ans=0.0 2023-12-21 20:39:14,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=222373.33333333334, ans=0.125 2023-12-21 20:39:18,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=222373.33333333334, ans=0.125 2023-12-21 20:39:42,236 INFO [train.py:886] (3/4) Epoch 8, batch 0, loss[loss=0.03737, audio_tagging_loss=0.03737, over 25000.00 frames. ], tot_loss[loss=0.03737, audio_tagging_loss=0.03737, over 25000.00 frames. ], batch size: 100, lr: 1.41e-02, grad_scale: 64.0 2023-12-21 20:39:42,237 INFO [train.py:909] (3/4) Computing validation loss 2023-12-21 20:40:00,243 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.8179, 2.8033, 3.5920, 3.9322], device='cuda:3') 2023-12-21 20:40:00,925 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.8110, 2.8105, 3.5980, 3.9685], device='cuda:3') 2023-12-21 20:40:03,497 INFO [train.py:917] (3/4) Epoch 8, validation: loss=0.0357, audio_tagging_loss=0.0357, over 3737520.00 frames. 2023-12-21 20:40:03,498 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-21 20:40:14,952 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.88 vs. limit=15.0 2023-12-21 20:40:42,641 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2023-12-21 20:40:44,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=222680.0, ans=0.2 2023-12-21 20:40:55,248 INFO [train.py:886] (3/4) Epoch 8, batch 50, loss[loss=0.02168, audio_tagging_loss=0.02168, over 25000.00 frames. ], tot_loss[loss=0.02504, audio_tagging_loss=0.02504, over 1123355.07 frames. ], batch size: 100, lr: 1.41e-02, grad_scale: 32.0 2023-12-21 20:40:59,553 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+01 2.852e+01 3.338e+01 3.973e+01 1.217e+02, threshold=6.676e+01, percent-clipped=6.0 2023-12-21 20:41:01,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=222746.66666666666, ans=0.0 2023-12-21 20:41:10,083 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.62 vs. limit=22.5 2023-12-21 20:41:12,710 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.70 vs. limit=22.5 2023-12-21 20:41:30,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=222946.66666666666, ans=0.0 2023-12-21 20:41:47,035 INFO [train.py:886] (3/4) Epoch 8, batch 100, loss[loss=0.01796, audio_tagging_loss=0.01796, over 25000.00 frames. 
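], tot_loss[loss=0.02157, audio_tagging_loss=0.02157, over 1968657.25 frames. ], batch size: 100, lr: 1.41e-02, grad_scale: 32.0

At the start of the epoch-8 validation pass above, zipformer.py:1858 dumps per-head attention diagnostics such as attn_weights_entropy = tensor([3.8179, 2.8033, 3.5920, 3.9322]); the two consecutive dumps for the same module agree to within a few hundredths, which is what makes the statistic useful for spotting collapsed or diverging heads. A hedged sketch of one way such a per-head entropy can be computed from softmaxed attention weights, not the module's actual code:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    """Mean entropy of each head's attention distribution.

    attn: (num_heads, num_queries, num_keys), rows already softmaxed.
    Returns one value per head, shaped like the logged tensors.  Sketch
    under those assumptions only.
    """
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (heads, queries)
    return ent.mean(dim=-1)

# Uniform attention over 50 keys gives entropy log(50) ~ 3.91 per head,
# the same ballpark as the values dumped above.
uniform = torch.full((4, 100, 50), 1.0 / 50)
print(attn_weights_entropy(uniform))
```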
2023-12-21 20:41:56,102 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.92 vs. limit=10.0 2023-12-21 20:42:04,787 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.831e+00 2023-12-21 20:42:11,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=223213.33333333334, ans=0.1 2023-12-21 20:42:29,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=223346.66666666666, ans=0.125 2023-12-21 20:42:32,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=223346.66666666666, ans=0.125 2023-12-21 20:42:37,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=223413.33333333334, ans=0.1 2023-12-21 20:42:37,954 INFO [train.py:886] (3/4) Epoch 8, batch 150, loss[loss=0.01857, audio_tagging_loss=0.01857, over 25000.00 frames. ], tot_loss[loss=0.01952, audio_tagging_loss=0.01952, over 2633668.86 frames. ], batch size: 100, lr: 1.41e-02, grad_scale: 32.0 2023-12-21 20:42:41,687 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.729e+01 2.893e+01 3.075e+01 3.731e+01, threshold=5.785e+01, percent-clipped=0.0 2023-12-21 20:42:55,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=223480.0, ans=0.0 2023-12-21 20:42:56,769 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.77 vs. limit=22.5 2023-12-21 20:42:59,936 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.56 vs. limit=15.0 2023-12-21 20:43:03,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=223546.66666666666, ans=0.125 2023-12-21 20:43:15,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=223613.33333333334, ans=0.125 2023-12-21 20:43:27,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=223680.0, ans=0.125 2023-12-21 20:43:28,864 INFO [train.py:886] (3/4) Epoch 8, batch 200, loss[loss=0.01471, audio_tagging_loss=0.01471, over 25000.00 frames. ], tot_loss[loss=0.01844, audio_tagging_loss=0.01844, over 3144371.45 frames. ], batch size: 100, lr: 1.41e-02, grad_scale: 32.0 2023-12-21 20:44:00,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.07 vs. limit=22.5 2023-12-21 20:44:04,495 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.05 vs.
limit=15.0 2023-12-21 20:44:05,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=223946.66666666666, ans=0.0 2023-12-21 20:44:05,470 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0 2023-12-21 20:44:10,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=224013.33333333334, ans=0.125 2023-12-21 20:44:21,444 INFO [train.py:886] (3/4) Epoch 8, batch 250, loss[loss=0.01507, audio_tagging_loss=0.01507, over 25000.00 frames. ], tot_loss[loss=0.01766, audio_tagging_loss=0.01766, over 3545013.55 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:44:25,226 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.594e+01 2.804e+01 3.008e+01 3.563e+01, threshold=5.609e+01, percent-clipped=0.0 2023-12-21 20:44:36,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=224146.66666666666, ans=0.125 2023-12-21 20:44:45,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=224213.33333333334, ans=0.125 2023-12-21 20:44:58,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=224280.0, ans=0.2 2023-12-21 20:45:11,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=224413.33333333334, ans=0.125 2023-12-21 20:45:11,813 INFO [train.py:886] (3/4) Epoch 8, batch 300, loss[loss=0.01649, audio_tagging_loss=0.01649, over 24750.00 frames. ], tot_loss[loss=0.01721, audio_tagging_loss=0.01721, over 3857236.19 frames. ], batch size: 99, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:45:20,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=224413.33333333334, ans=0.1 2023-12-21 20:45:41,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=224546.66666666666, ans=0.0 2023-12-21 20:45:47,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=224613.33333333334, ans=0.2 2023-12-21 20:45:49,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=224613.33333333334, ans=0.125 2023-12-21 20:45:53,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=224680.0, ans=0.125 2023-12-21 20:46:04,040 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.12 vs. limit=15.0 2023-12-21 20:46:04,444 INFO [train.py:886] (3/4) Epoch 8, batch 350, loss[loss=0.01395, audio_tagging_loss=0.01395, over 24750.00 frames. ], tot_loss[loss=0.01685, audio_tagging_loss=0.01685, over 4094176.89 frames. 
], batch size: 99, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:46:08,267 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.559e+01 2.704e+01 2.867e+01 3.346e+01, threshold=5.408e+01, percent-clipped=0.0 2023-12-21 20:46:08,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=224746.66666666666, ans=0.0 2023-12-21 20:46:34,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=224880.0, ans=8.0 2023-12-21 20:46:39,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=224946.66666666666, ans=0.0 2023-12-21 20:46:47,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=225013.33333333334, ans=0.125 2023-12-21 20:46:56,023 INFO [train.py:886] (3/4) Epoch 8, batch 400, loss[loss=0.01385, audio_tagging_loss=0.01385, over 25000.00 frames. ], tot_loss[loss=0.01643, audio_tagging_loss=0.01643, over 4280340.28 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:47:16,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=225213.33333333334, ans=0.0 2023-12-21 20:47:19,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=225213.33333333334, ans=0.125 2023-12-21 20:47:34,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=225280.0, ans=0.07 2023-12-21 20:47:48,113 INFO [train.py:886] (3/4) Epoch 8, batch 450, loss[loss=0.0143, audio_tagging_loss=0.0143, over 25000.00 frames. ], tot_loss[loss=0.01609, audio_tagging_loss=0.01609, over 4434225.36 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:47:49,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.51 vs. limit=15.0 2023-12-21 20:47:51,843 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.539e+01 2.733e+01 2.950e+01 3.447e+01, threshold=5.466e+01, percent-clipped=0.0 2023-12-21 20:48:07,780 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.57 vs. limit=15.0 2023-12-21 20:48:12,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=225546.66666666666, ans=0.2 2023-12-21 20:48:12,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=225546.66666666666, ans=0.125 2023-12-21 20:48:13,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=225546.66666666666, ans=0.125 2023-12-21 20:48:19,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=225613.33333333334, ans=0.125 2023-12-21 20:48:40,737 INFO [train.py:886] (3/4) Epoch 8, batch 500, loss[loss=0.0136, audio_tagging_loss=0.0136, over 25000.00 frames. ], tot_loss[loss=0.01595, audio_tagging_loss=0.01595, over 4551147.71 frames. 
], batch size: 100, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:48:45,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=225746.66666666666, ans=0.125 2023-12-21 20:48:45,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.88 vs. limit=15.0 2023-12-21 20:49:09,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=225880.0, ans=0.2 2023-12-21 20:49:17,042 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 20:49:20,101 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.49 vs. limit=15.0 2023-12-21 20:49:22,350 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.12 vs. limit=6.0 2023-12-21 20:49:30,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=226080.0, ans=0.0 2023-12-21 20:49:31,725 INFO [train.py:886] (3/4) Epoch 8, batch 550, loss[loss=0.01867, audio_tagging_loss=0.01867, over 25000.00 frames. ], tot_loss[loss=0.01592, audio_tagging_loss=0.01592, over 4634578.46 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:49:35,528 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.537e+01 2.667e+01 2.842e+01 3.334e+01, threshold=5.333e+01, percent-clipped=0.0 2023-12-21 20:49:36,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=226080.0, ans=0.1 2023-12-21 20:49:55,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=226213.33333333334, ans=0.0 2023-12-21 20:49:56,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=226213.33333333334, ans=0.5 2023-12-21 20:50:05,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=226280.0, ans=0.125 2023-12-21 20:50:12,112 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=15.0 2023-12-21 20:50:24,161 INFO [train.py:886] (3/4) Epoch 8, batch 600, loss[loss=0.01815, audio_tagging_loss=0.01815, over 24750.00 frames. ], tot_loss[loss=0.01597, audio_tagging_loss=0.01597, over 4700734.66 frames. ], batch size: 99, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:50:28,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=226413.33333333334, ans=0.1 2023-12-21 20:50:52,178 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.64 vs. limit=22.5 2023-12-21 20:51:09,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=226680.0, ans=0.0 2023-12-21 20:51:15,689 INFO [train.py:886] (3/4) Epoch 8, batch 650, loss[loss=0.01719, audio_tagging_loss=0.01719, over 24750.00 frames. 
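], tot_loss[loss=0.01608, audio_tagging_loss=0.01608, over 4744749.43 frames. ], batch size: 99, lr: 1.40e-02, grad_scale: 32.0

Across epoch 8 the tot_loss frame count climbs from about 1.12M at batch 50 through 1.97M, 2.63M, and 3.14M to the 4.74M shown here, then hovers just under 5M for the rest of the epoch. That ramp is the signature of an exponentially-decayed running sum: each step the accumulated loss and frame totals are multiplied by a decay factor before the new batch is added, so the effective window saturates at batch_frames / (1 - decay). A sketch of that accumulator; the decay constant is an assumption chosen to reproduce the ~5M-frame plateau, not a value read from train.py:

```python
class DecayedLossTracker:
    """Exponentially-decayed sums of loss and frames.

    The reported tot_loss behaves like decayed_loss_sum / decayed_frames.
    With ~25k frames per batch and decay = 1 - 1/200 (an assumed horizon),
    the frame window saturates near 25000 / (1 - decay) = 5.0e6, matching
    the plateau in the log.  Sketch only.
    """

    def __init__(self, decay: float = 1.0 - 1.0 / 200):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        # batch_loss is the mean per-frame loss over batch_frames frames
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.loss_sum / self.frames  # the tot_loss that gets printed

tracker = DecayedLossTracker()
for _ in range(650):
    tracker.update(0.016, 25000.0)
print(tracker.frames)  # ~4.8e6 after 650 batches, approaching 5.0e6
```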
2023-12-21 20:51:20,085 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 2.706e+01 2.885e+01 3.077e+01 3.988e+01, threshold=5.770e+01, percent-clipped=0.0 2023-12-21 20:51:22,238 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.24 vs. limit=12.0 2023-12-21 20:51:29,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=226813.33333333334, ans=0.0 2023-12-21 20:51:55,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=226946.66666666666, ans=0.0 2023-12-21 20:52:06,620 INFO [train.py:886] (3/4) Epoch 8, batch 700, loss[loss=0.01686, audio_tagging_loss=0.01686, over 25000.00 frames. ], tot_loss[loss=0.016, audio_tagging_loss=0.016, over 4788159.64 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0 2023-12-21 20:52:09,107 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.78 vs. limit=15.0 2023-12-21 20:52:26,540 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.399e-01 2023-12-21 20:52:44,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=227280.0, ans=0.2 2023-12-21 20:52:59,423 INFO [train.py:886] (3/4) Epoch 8, batch 750, loss[loss=0.01398, audio_tagging_loss=0.01398, over 25000.00 frames. ], tot_loss[loss=0.01598, audio_tagging_loss=0.01598, over 4818333.73 frames. ], batch size: 100, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:53:01,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=227413.33333333334, ans=0.125 2023-12-21 20:53:03,144 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.168e+01 2.577e+01 2.736e+01 2.943e+01 3.509e+01, threshold=5.472e+01, percent-clipped=0.0 2023-12-21 20:53:22,403 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.23 vs. limit=15.0 2023-12-21 20:53:31,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.42 vs. limit=15.0 2023-12-21 20:53:50,468 INFO [train.py:886] (3/4) Epoch 8, batch 800, loss[loss=0.01363, audio_tagging_loss=0.01363, over 21952.00 frames. ], tot_loss[loss=0.01595, audio_tagging_loss=0.01595, over 4850505.42 frames. ], batch size: 107, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:53:51,047 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.37 vs. limit=6.0 2023-12-21 20:53:58,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=227746.66666666666, ans=0.125 2023-12-21 20:54:04,680 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.69 vs.
limit=15.0 2023-12-21 20:54:09,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=15.0 2023-12-21 20:54:43,036 INFO [train.py:886] (3/4) Epoch 8, batch 850, loss[loss=0.0161, audio_tagging_loss=0.0161, over 25000.00 frames. ], tot_loss[loss=0.01592, audio_tagging_loss=0.01592, over 4879743.18 frames. ], batch size: 100, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:54:46,741 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+01 2.587e+01 2.731e+01 2.941e+01 3.321e+01, threshold=5.462e+01, percent-clipped=0.0 2023-12-21 20:54:59,178 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.37 vs. limit=12.0 2023-12-21 20:55:02,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=228213.33333333334, ans=0.2 2023-12-21 20:55:04,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=228213.33333333334, ans=0.0 2023-12-21 20:55:06,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=228213.33333333334, ans=0.125 2023-12-21 20:55:06,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=228213.33333333334, ans=0.125 2023-12-21 20:55:08,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=228213.33333333334, ans=0.0 2023-12-21 20:55:21,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=228280.0, ans=0.1 2023-12-21 20:55:21,314 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.87 vs. limit=15.0 2023-12-21 20:55:27,843 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.86 vs. limit=22.5 2023-12-21 20:55:30,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=228346.66666666666, ans=0.125 2023-12-21 20:55:32,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=228346.66666666666, ans=0.1 2023-12-21 20:55:32,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=228346.66666666666, ans=0.125 2023-12-21 20:55:34,594 INFO [train.py:886] (3/4) Epoch 8, batch 900, loss[loss=0.01906, audio_tagging_loss=0.01906, over 24941.00 frames. ], tot_loss[loss=0.01592, audio_tagging_loss=0.01592, over 4894298.54 frames. 
], batch size: 100, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:55:49,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=228480.0, ans=0.0 2023-12-21 20:55:53,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=228480.0, ans=0.125 2023-12-21 20:56:04,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=228546.66666666666, ans=0.125 2023-12-21 20:56:04,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=228546.66666666666, ans=0.125 2023-12-21 20:56:10,185 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.60 vs. limit=10.0 2023-12-21 20:56:26,479 INFO [train.py:886] (3/4) Epoch 8, batch 950, loss[loss=0.0168, audio_tagging_loss=0.0168, over 25000.00 frames. ], tot_loss[loss=0.01603, audio_tagging_loss=0.01603, over 4904525.65 frames. ], batch size: 100, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:56:29,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=228746.66666666666, ans=0.125 2023-12-21 20:56:30,921 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.340e+01 2.642e+01 2.776e+01 2.984e+01 3.977e+01, threshold=5.552e+01, percent-clipped=0.0 2023-12-21 20:56:45,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=228813.33333333334, ans=0.125 2023-12-21 20:56:45,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=228813.33333333334, ans=0.0 2023-12-21 20:56:53,485 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.01 vs. limit=6.0 2023-12-21 20:56:59,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=228946.66666666666, ans=0.2 2023-12-21 20:57:16,474 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.71 vs. limit=22.5 2023-12-21 20:57:16,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=229013.33333333334, ans=0.125 2023-12-21 20:57:17,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=229080.0, ans=0.0 2023-12-21 20:57:18,749 INFO [train.py:886] (3/4) Epoch 8, batch 1000, loss[loss=0.01629, audio_tagging_loss=0.01629, over 25000.00 frames. ], tot_loss[loss=0.01593, audio_tagging_loss=0.01593, over 4907793.74 frames. ], batch size: 100, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:57:21,030 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=28.49 vs. limit=22.5 2023-12-21 20:57:24,773 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.95 vs. 
limit=22.5 2023-12-21 20:57:27,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=229080.0, ans=0.0 2023-12-21 20:57:50,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=229280.0, ans=0.0 2023-12-21 20:57:51,353 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2023-12-21 20:57:55,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=229280.0, ans=0.2 2023-12-21 20:57:59,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=229346.66666666666, ans=0.125 2023-12-21 20:58:06,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2023-12-21 20:58:10,990 INFO [train.py:886] (3/4) Epoch 8, batch 1050, loss[loss=0.01355, audio_tagging_loss=0.01355, over 25000.00 frames. ], tot_loss[loss=0.01582, audio_tagging_loss=0.01582, over 4916088.27 frames. ], batch size: 100, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:58:11,710 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.78 vs. limit=10.0 2023-12-21 20:58:14,762 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.565e+01 2.750e+01 2.956e+01 3.659e+01, threshold=5.501e+01, percent-clipped=0.0 2023-12-21 20:58:24,506 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=15.0 2023-12-21 20:58:26,596 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.54 vs. limit=15.0 2023-12-21 20:58:31,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=229546.66666666666, ans=0.025 2023-12-21 20:58:42,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=229613.33333333334, ans=0.2 2023-12-21 20:58:47,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=229613.33333333334, ans=0.5 2023-12-21 20:58:53,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=229680.0, ans=0.125 2023-12-21 20:59:02,592 INFO [train.py:886] (3/4) Epoch 8, batch 1100, loss[loss=0.01632, audio_tagging_loss=0.01632, over 25000.00 frames. ], tot_loss[loss=0.01578, audio_tagging_loss=0.01578, over 4924300.65 frames. ], batch size: 100, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:59:03,281 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.67 vs. 
limit=10.0 2023-12-21 20:59:06,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=229746.66666666666, ans=0.125 2023-12-21 20:59:09,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=229746.66666666666, ans=22.5 2023-12-21 20:59:12,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=229813.33333333334, ans=0.0 2023-12-21 20:59:16,612 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-21 20:59:17,847 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.25 vs. limit=22.5 2023-12-21 20:59:22,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=229880.0, ans=0.125 2023-12-21 20:59:27,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=229880.0, ans=0.0 2023-12-21 20:59:38,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=229946.66666666666, ans=0.0 2023-12-21 20:59:48,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=230013.33333333334, ans=0.125 2023-12-21 20:59:48,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=230013.33333333334, ans=0.0 2023-12-21 20:59:54,254 INFO [train.py:886] (3/4) Epoch 8, batch 1150, loss[loss=0.01489, audio_tagging_loss=0.01489, over 25000.00 frames. ], tot_loss[loss=0.01567, audio_tagging_loss=0.01567, over 4938404.68 frames. ], batch size: 100, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 20:59:58,773 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.593e+01 2.758e+01 2.932e+01 3.936e+01, threshold=5.517e+01, percent-clipped=0.0 2023-12-21 20:59:59,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=230080.0, ans=0.125 2023-12-21 21:00:03,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=230146.66666666666, ans=0.2 2023-12-21 21:00:23,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=230213.33333333334, ans=0.1 2023-12-21 21:00:46,102 INFO [train.py:886] (3/4) Epoch 8, batch 1200, loss[loss=0.01888, audio_tagging_loss=0.01888, over 24750.00 frames. ], tot_loss[loss=0.01576, audio_tagging_loss=0.01576, over 4939818.21 frames. 
], batch size: 99, lr: 1.39e-02, grad_scale: 32.0 2023-12-21 21:01:02,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=230480.0, ans=0.125 2023-12-21 21:01:06,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=230546.66666666666, ans=0.2 2023-12-21 21:01:25,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=230613.33333333334, ans=0.05 2023-12-21 21:01:38,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=12.0 2023-12-21 21:01:38,501 INFO [train.py:886] (3/4) Epoch 8, batch 1250, loss[loss=0.01624, audio_tagging_loss=0.01624, over 24750.00 frames. ], tot_loss[loss=0.01592, audio_tagging_loss=0.01592, over 4943308.87 frames. ], batch size: 99, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:01:42,236 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.655e+01 2.780e+01 2.996e+01 3.618e+01, threshold=5.560e+01, percent-clipped=0.0 2023-12-21 21:01:45,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=230746.66666666666, ans=0.0 2023-12-21 21:02:04,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=230880.0, ans=0.2 2023-12-21 21:02:23,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=231013.33333333334, ans=0.125 2023-12-21 21:02:24,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=231013.33333333334, ans=0.125 2023-12-21 21:02:30,626 INFO [train.py:886] (3/4) Epoch 8, batch 1300, loss[loss=0.01884, audio_tagging_loss=0.01884, over 24750.00 frames. ], tot_loss[loss=0.01604, audio_tagging_loss=0.01604, over 4940324.39 frames. ], batch size: 99, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:02:32,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=231080.0, ans=0.125 2023-12-21 21:02:46,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=231146.66666666666, ans=0.0 2023-12-21 21:03:00,873 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.41 vs. limit=15.0 2023-12-21 21:03:08,705 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.88 vs. limit=15.0 2023-12-21 21:03:20,601 INFO [train.py:886] (3/4) Epoch 8, batch 1350, loss[loss=0.01435, audio_tagging_loss=0.01435, over 24750.00 frames. ], tot_loss[loss=0.01587, audio_tagging_loss=0.01587, over 4944264.26 frames. 
], batch size: 99, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:03:25,733 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+01 2.556e+01 2.752e+01 2.967e+01 4.307e+01, threshold=5.505e+01, percent-clipped=0.0 2023-12-21 21:03:46,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=231546.66666666666, ans=0.1 2023-12-21 21:03:48,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=231546.66666666666, ans=0.1 2023-12-21 21:04:05,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=231680.0, ans=0.125 2023-12-21 21:04:13,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.22 vs. limit=22.5 2023-12-21 21:04:13,935 INFO [train.py:886] (3/4) Epoch 8, batch 1400, loss[loss=0.01386, audio_tagging_loss=0.01386, over 25000.00 frames. ], tot_loss[loss=0.01573, audio_tagging_loss=0.01573, over 4944740.53 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:04:17,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=231746.66666666666, ans=0.0 2023-12-21 21:04:19,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=231746.66666666666, ans=0.2 2023-12-21 21:04:20,994 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.57 vs. limit=15.0 2023-12-21 21:04:23,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=231813.33333333334, ans=0.2 2023-12-21 21:04:26,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=231813.33333333334, ans=0.0 2023-12-21 21:05:05,546 INFO [train.py:886] (3/4) Epoch 8, batch 1450, loss[loss=0.0162, audio_tagging_loss=0.0162, over 25000.00 frames. ], tot_loss[loss=0.01567, audio_tagging_loss=0.01567, over 4947132.31 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:05:09,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.20 vs. limit=15.0 2023-12-21 21:05:09,934 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.540e+01 2.719e+01 2.872e+01 3.689e+01, threshold=5.439e+01, percent-clipped=0.0 2023-12-21 21:05:10,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=232080.0, ans=0.0 2023-12-21 21:05:10,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=232080.0, ans=0.125 2023-12-21 21:05:11,373 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.80 vs. 
limit=22.5 2023-12-21 21:05:14,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=232080.0, ans=0.0 2023-12-21 21:05:33,597 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 21:05:42,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=232280.0, ans=0.125 2023-12-21 21:05:57,191 INFO [train.py:886] (3/4) Epoch 8, batch 1500, loss[loss=0.01767, audio_tagging_loss=0.01767, over 25000.00 frames. ], tot_loss[loss=0.01571, audio_tagging_loss=0.01571, over 4950668.58 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:05:59,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=232413.33333333334, ans=0.2 2023-12-21 21:06:02,120 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 21:06:03,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=232413.33333333334, ans=0.125 2023-12-21 21:06:34,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=232613.33333333334, ans=0.125 2023-12-21 21:06:48,538 INFO [train.py:886] (3/4) Epoch 8, batch 1550, loss[loss=0.02216, audio_tagging_loss=0.02216, over 24935.00 frames. ], tot_loss[loss=0.01579, audio_tagging_loss=0.01579, over 4947103.60 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:06:48,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=232746.66666666666, ans=0.0 2023-12-21 21:06:52,987 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.317e+01 2.641e+01 2.774e+01 2.948e+01 3.544e+01, threshold=5.547e+01, percent-clipped=0.0 2023-12-21 21:07:18,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=232946.66666666666, ans=0.0 2023-12-21 21:07:24,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=232946.66666666666, ans=0.0 2023-12-21 21:07:28,129 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.38 vs. limit=22.5 2023-12-21 21:07:30,133 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.11 vs. limit=22.5 2023-12-21 21:07:32,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=233013.33333333334, ans=0.0 2023-12-21 21:07:39,861 INFO [train.py:886] (3/4) Epoch 8, batch 1600, loss[loss=0.01485, audio_tagging_loss=0.01485, over 24750.00 frames. ], tot_loss[loss=0.01578, audio_tagging_loss=0.01578, over 4939034.71 frames. 
], batch size: 99, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:07:45,594 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.163e+00 2023-12-21 21:07:48,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=233080.0, ans=0.0 2023-12-21 21:08:04,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.30 vs. limit=15.0 2023-12-21 21:08:08,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=233213.33333333334, ans=0.1 2023-12-21 21:08:17,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=233280.0, ans=0.1 2023-12-21 21:08:24,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=233346.66666666666, ans=0.125 2023-12-21 21:08:32,017 INFO [train.py:886] (3/4) Epoch 8, batch 1650, loss[loss=0.01687, audio_tagging_loss=0.01687, over 25000.00 frames. ], tot_loss[loss=0.01582, audio_tagging_loss=0.01582, over 4937062.71 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:08:35,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=233413.33333333334, ans=0.125 2023-12-21 21:08:35,746 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.269e+01 2.707e+01 2.843e+01 2.988e+01 3.739e+01, threshold=5.687e+01, percent-clipped=0.0 2023-12-21 21:08:36,951 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.669e-02 2023-12-21 21:08:36,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=233413.33333333334, ans=0.2 2023-12-21 21:08:45,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=233480.0, ans=0.125 2023-12-21 21:08:48,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=233480.0, ans=0.2 2023-12-21 21:08:56,472 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.53 vs. limit=15.0 2023-12-21 21:09:06,820 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.57 vs. limit=22.5 2023-12-21 21:09:09,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=233613.33333333334, ans=0.04949747468305833 2023-12-21 21:09:16,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=233680.0, ans=0.125 2023-12-21 21:09:23,663 INFO [train.py:886] (3/4) Epoch 8, batch 1700, loss[loss=0.01738, audio_tagging_loss=0.01738, over 25000.00 frames. ], tot_loss[loss=0.01571, audio_tagging_loss=0.01571, over 4941363.27 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:09:35,179 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.53 vs. 
limit=15.0 2023-12-21 21:09:50,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=233880.0, ans=0.0 2023-12-21 21:09:50,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=233880.0, ans=0.125 2023-12-21 21:09:55,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=233946.66666666666, ans=0.1 2023-12-21 21:09:59,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=233946.66666666666, ans=0.1 2023-12-21 21:10:08,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=234013.33333333334, ans=0.125 2023-12-21 21:10:09,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=234013.33333333334, ans=0.07 2023-12-21 21:10:10,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=234013.33333333334, ans=0.125 2023-12-21 21:10:15,210 INFO [train.py:886] (3/4) Epoch 8, batch 1750, loss[loss=0.01269, audio_tagging_loss=0.01269, over 25000.00 frames. ], tot_loss[loss=0.01559, audio_tagging_loss=0.01559, over 4949535.59 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0 2023-12-21 21:10:19,034 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.285e+01 2.526e+01 2.688e+01 2.921e+01 3.740e+01, threshold=5.376e+01, percent-clipped=0.0 2023-12-21 21:11:08,758 INFO [train.py:886] (3/4) Epoch 8, batch 1800, loss[loss=0.01612, audio_tagging_loss=0.01612, over 22011.00 frames. ], tot_loss[loss=0.01554, audio_tagging_loss=0.01554, over 4953827.35 frames. ], batch size: 107, lr: 1.37e-02, grad_scale: 32.0 2023-12-21 21:11:20,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=234480.0, ans=15.0 2023-12-21 21:11:37,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=234546.66666666666, ans=0.0 2023-12-21 21:11:45,398 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0 2023-12-21 21:11:48,404 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.57 vs. limit=12.0 2023-12-21 21:11:49,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=234680.0, ans=0.2 2023-12-21 21:11:59,300 INFO [train.py:886] (3/4) Epoch 8, batch 1850, loss[loss=0.01204, audio_tagging_loss=0.01204, over 21331.00 frames. ], tot_loss[loss=0.01561, audio_tagging_loss=0.01561, over 4949145.11 frames. ], batch size: 107, lr: 1.37e-02, grad_scale: 32.0 2023-12-21 21:11:59,508 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.235e+00 2023-12-21 21:12:03,979 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.32 vs. 
limit=10.0 2023-12-21 21:12:04,459 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.298e+01 2.642e+01 2.796e+01 3.049e+01 4.216e+01, threshold=5.592e+01, percent-clipped=0.0 2023-12-21 21:12:19,622 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-21 21:12:26,615 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.85 vs. limit=6.0 2023-12-21 21:12:31,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=234946.66666666666, ans=0.125 2023-12-21 21:12:32,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=234946.66666666666, ans=10.0 2023-12-21 21:12:35,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=234946.66666666666, ans=0.0 2023-12-21 21:12:40,178 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.40 vs. limit=15.0 2023-12-21 21:12:40,316 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.44 vs. limit=15.0 2023-12-21 21:12:50,825 INFO [train.py:886] (3/4) Epoch 8, batch 1900, loss[loss=0.01512, audio_tagging_loss=0.01512, over 24750.00 frames. ], tot_loss[loss=0.01573, audio_tagging_loss=0.01573, over 4944304.71 frames. ], batch size: 99, lr: 1.37e-02, grad_scale: 32.0 2023-12-21 21:12:58,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=235080.0, ans=0.0 2023-12-21 21:13:00,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.02 vs. limit=22.5 2023-12-21 21:13:29,669 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.33 vs. limit=15.0 2023-12-21 21:13:42,945 INFO [train.py:886] (3/4) Epoch 8, batch 1950, loss[loss=0.01814, audio_tagging_loss=0.01814, over 24750.00 frames. ], tot_loss[loss=0.01571, audio_tagging_loss=0.01571, over 4942129.22 frames. ], batch size: 99, lr: 1.37e-02, grad_scale: 32.0 2023-12-21 21:13:43,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=235413.33333333334, ans=0.2 2023-12-21 21:13:44,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.90 vs. 
limit=15.0 2023-12-21 21:13:47,358 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+01 2.599e+01 2.763e+01 2.892e+01 3.548e+01, threshold=5.526e+01, percent-clipped=0.0 2023-12-21 21:13:58,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=235480.0, ans=0.0 2023-12-21 21:13:59,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=235480.0, ans=0.125 2023-12-21 21:14:00,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=235480.0, ans=0.125 2023-12-21 21:14:15,413 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.81 vs. limit=15.0 2023-12-21 21:14:25,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=235680.0, ans=0.125 2023-12-21 21:14:31,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=235680.0, ans=0.125 2023-12-21 21:14:34,282 INFO [train.py:886] (3/4) Epoch 8, batch 2000, loss[loss=0.01288, audio_tagging_loss=0.01288, over 25000.00 frames. ], tot_loss[loss=0.01569, audio_tagging_loss=0.01569, over 4948449.93 frames. ], batch size: 100, lr: 1.37e-02, grad_scale: 32.0 2023-12-21 21:14:48,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=235813.33333333334, ans=0.1 2023-12-21 21:14:49,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.71 vs. limit=22.5 2023-12-21 21:14:53,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=235813.33333333334, ans=0.125 2023-12-21 21:15:02,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=235880.0, ans=0.125 2023-12-21 21:15:05,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.23 vs. limit=10.0 2023-12-21 21:15:16,941 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=7.523e-01 2023-12-21 21:15:26,862 INFO [train.py:886] (3/4) Epoch 8, batch 2050, loss[loss=0.01319, audio_tagging_loss=0.01319, over 25000.00 frames. ], tot_loss[loss=0.01561, audio_tagging_loss=0.01561, over 4953208.57 frames. ], batch size: 100, lr: 1.37e-02, grad_scale: 64.0 2023-12-21 21:15:31,344 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.542e+01 2.684e+01 2.827e+01 3.551e+01, threshold=5.367e+01, percent-clipped=0.0 2023-12-21 21:15:50,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=236213.33333333334, ans=0.125 2023-12-21 21:16:03,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.67 vs. 
limit=10.0 2023-12-21 21:16:05,680 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.386e-01 2023-12-21 21:16:06,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=15.0 2023-12-21 21:16:06,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=236280.0, ans=0.0 2023-12-21 21:16:07,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=236346.66666666666, ans=0.125 2023-12-21 21:16:16,317 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.25 vs. limit=10.0 2023-12-21 21:16:18,466 INFO [train.py:886] (3/4) Epoch 8, batch 2100, loss[loss=0.01621, audio_tagging_loss=0.01621, over 25000.00 frames. ], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 4958982.73 frames. ], batch size: 100, lr: 1.37e-02, grad_scale: 64.0 2023-12-21 21:16:29,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=236480.0, ans=0.125 2023-12-21 21:16:46,274 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.744e-01 2023-12-21 21:16:49,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=236613.33333333334, ans=0.0 2023-12-21 21:16:50,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.36 vs. limit=15.0 2023-12-21 21:16:56,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=236613.33333333334, ans=0.1 2023-12-21 21:17:10,431 INFO [train.py:886] (3/4) Epoch 8, batch 2150, loss[loss=0.01575, audio_tagging_loss=0.01575, over 24750.00 frames. ], tot_loss[loss=0.01563, audio_tagging_loss=0.01563, over 4962137.67 frames. ], batch size: 99, lr: 1.37e-02, grad_scale: 64.0 2023-12-21 21:17:14,091 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.260e+01 2.652e+01 2.733e+01 2.907e+01 3.375e+01, threshold=5.466e+01, percent-clipped=0.0 2023-12-21 21:17:41,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=236946.66666666666, ans=0.0 2023-12-21 21:17:43,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=236946.66666666666, ans=0.0 2023-12-21 21:17:58,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=237013.33333333334, ans=0.125 2023-12-21 21:18:01,944 INFO [train.py:886] (3/4) Epoch 8, batch 2200, loss[loss=0.01846, audio_tagging_loss=0.01846, over 24750.00 frames. ], tot_loss[loss=0.01572, audio_tagging_loss=0.01572, over 4956474.73 frames. 
], batch size: 99, lr: 1.37e-02, grad_scale: 64.0 2023-12-21 21:18:09,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=237080.0, ans=0.125 2023-12-21 21:18:16,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=237146.66666666666, ans=0.125 2023-12-21 21:18:53,991 INFO [train.py:886] (3/4) Epoch 8, batch 2250, loss[loss=0.01415, audio_tagging_loss=0.01415, over 25000.00 frames. ], tot_loss[loss=0.01573, audio_tagging_loss=0.01573, over 4951993.93 frames. ], batch size: 100, lr: 1.37e-02, grad_scale: 64.0 2023-12-21 21:18:57,106 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.14 vs. limit=10.0 2023-12-21 21:18:59,106 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 2.643e+01 2.757e+01 2.915e+01 3.398e+01, threshold=5.513e+01, percent-clipped=0.0 2023-12-21 21:19:07,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=237480.0, ans=15.0 2023-12-21 21:19:26,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=237613.33333333334, ans=0.2 2023-12-21 21:19:42,483 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 21:19:44,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=237680.0, ans=0.0 2023-12-21 21:19:45,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.58 vs. limit=22.5 2023-12-21 21:19:46,178 INFO [train.py:886] (3/4) Epoch 8, batch 2300, loss[loss=0.01944, audio_tagging_loss=0.01944, over 22132.00 frames. ], tot_loss[loss=0.01558, audio_tagging_loss=0.01558, over 4948795.39 frames. ], batch size: 107, lr: 1.37e-02, grad_scale: 64.0 2023-12-21 21:19:50,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=237746.66666666666, ans=0.0 2023-12-21 21:20:10,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=237880.0, ans=0.09899494936611666 2023-12-21 21:20:16,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=237946.66666666666, ans=0.0 2023-12-21 21:20:23,699 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.24 vs. limit=10.0 2023-12-21 21:20:38,799 INFO [train.py:886] (3/4) Epoch 8, batch 2350, loss[loss=0.01338, audio_tagging_loss=0.01338, over 25000.00 frames. ], tot_loss[loss=0.01553, audio_tagging_loss=0.01553, over 4947434.70 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:20:40,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=238080.0, ans=0.125 2023-12-21 21:20:42,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.06 vs. 
limit=15.0 2023-12-21 21:20:42,604 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.503e+01 2.677e+01 2.839e+01 3.914e+01, threshold=5.353e+01, percent-clipped=0.0 2023-12-21 21:20:53,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=238146.66666666666, ans=0.125 2023-12-21 21:21:09,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=238280.0, ans=0.125 2023-12-21 21:21:19,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=238346.66666666666, ans=0.2 2023-12-21 21:21:29,900 INFO [train.py:886] (3/4) Epoch 8, batch 2400, loss[loss=0.01702, audio_tagging_loss=0.01702, over 25000.00 frames. ], tot_loss[loss=0.01554, audio_tagging_loss=0.01554, over 4954476.68 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:21:42,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=238480.0, ans=0.125 2023-12-21 21:21:44,967 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=15.0 2023-12-21 21:21:57,071 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=20.50 vs. limit=15.0 2023-12-21 21:22:15,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=238680.0, ans=0.125 2023-12-21 21:22:22,518 INFO [train.py:886] (3/4) Epoch 8, batch 2450, loss[loss=0.01654, audio_tagging_loss=0.01654, over 25000.00 frames. ], tot_loss[loss=0.01558, audio_tagging_loss=0.01558, over 4954093.07 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:22:26,968 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 2.604e+01 2.785e+01 2.949e+01 3.842e+01, threshold=5.569e+01, percent-clipped=0.0 2023-12-21 21:22:36,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=238813.33333333334, ans=0.2 2023-12-21 21:23:14,636 INFO [train.py:886] (3/4) Epoch 8, batch 2500, loss[loss=0.01448, audio_tagging_loss=0.01448, over 24750.00 frames. ], tot_loss[loss=0.01573, audio_tagging_loss=0.01573, over 4953821.52 frames. ], batch size: 99, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:23:30,787 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.46 vs. 
limit=15.0 2023-12-21 21:23:38,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=239213.33333333334, ans=0.09899494936611666 2023-12-21 21:23:39,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=239213.33333333334, ans=0.125 2023-12-21 21:23:45,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=239280.0, ans=0.2 2023-12-21 21:23:50,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=239280.0, ans=0.07 2023-12-21 21:24:06,253 INFO [train.py:886] (3/4) Epoch 8, batch 2550, loss[loss=0.01287, audio_tagging_loss=0.01287, over 24750.00 frames. ], tot_loss[loss=0.01574, audio_tagging_loss=0.01574, over 4944642.93 frames. ], batch size: 99, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:24:07,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=239413.33333333334, ans=15.0 2023-12-21 21:24:09,935 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 2.703e+01 2.852e+01 3.044e+01 3.567e+01, threshold=5.704e+01, percent-clipped=0.0 2023-12-21 21:24:35,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=239546.66666666666, ans=0.0 2023-12-21 21:24:55,162 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2023-12-21 21:24:58,483 INFO [train.py:886] (3/4) Epoch 8, batch 2600, loss[loss=0.0168, audio_tagging_loss=0.0168, over 24750.00 frames. ], tot_loss[loss=0.01568, audio_tagging_loss=0.01568, over 4946634.36 frames. ], batch size: 99, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:25:14,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=239813.33333333334, ans=0.125 2023-12-21 21:25:20,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=239880.0, ans=0.125 2023-12-21 21:25:22,096 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.74 vs. limit=15.0 2023-12-21 21:25:29,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.07 vs. limit=5.0 2023-12-21 21:25:46,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=240013.33333333334, ans=15.0 2023-12-21 21:25:50,961 INFO [train.py:886] (3/4) Epoch 8, batch 2650, loss[loss=0.01443, audio_tagging_loss=0.01443, over 24750.00 frames. ], tot_loss[loss=0.0156, audio_tagging_loss=0.0156, over 4949764.76 frames. ], batch size: 99, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:25:55,393 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+01 2.630e+01 2.789e+01 2.971e+01 3.583e+01, threshold=5.578e+01, percent-clipped=0.0 2023-12-21 21:26:03,341 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. 
limit=15.0 2023-12-21 21:26:24,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=240280.0, ans=0.2 2023-12-21 21:26:35,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=240346.66666666666, ans=0.2 2023-12-21 21:26:39,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=240346.66666666666, ans=0.1 2023-12-21 21:26:42,493 INFO [train.py:886] (3/4) Epoch 8, batch 2700, loss[loss=0.01861, audio_tagging_loss=0.01861, over 21811.00 frames. ], tot_loss[loss=0.01548, audio_tagging_loss=0.01548, over 4953913.80 frames. ], batch size: 107, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:26:44,493 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 21:26:53,327 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.89 vs. limit=15.0 2023-12-21 21:26:56,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=240480.0, ans=0.0 2023-12-21 21:27:09,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=240546.66666666666, ans=0.1 2023-12-21 21:27:25,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=240680.0, ans=0.125 2023-12-21 21:27:33,952 INFO [train.py:886] (3/4) Epoch 8, batch 2750, loss[loss=0.01718, audio_tagging_loss=0.01718, over 25000.00 frames. ], tot_loss[loss=0.0155, audio_tagging_loss=0.0155, over 4956524.02 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:27:37,717 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.574e+01 2.741e+01 2.911e+01 3.788e+01, threshold=5.483e+01, percent-clipped=0.0 2023-12-21 21:27:37,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=240746.66666666666, ans=0.1 2023-12-21 21:27:45,508 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.68 vs. limit=15.0 2023-12-21 21:27:47,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=240813.33333333334, ans=0.0 2023-12-21 21:27:53,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=240880.0, ans=0.07 2023-12-21 21:28:04,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=240946.66666666666, ans=0.125 2023-12-21 21:28:08,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=240946.66666666666, ans=0.0 2023-12-21 21:28:25,032 INFO [train.py:886] (3/4) Epoch 8, batch 2800, loss[loss=0.01491, audio_tagging_loss=0.01491, over 24750.00 frames. ], tot_loss[loss=0.01559, audio_tagging_loss=0.01559, over 4955599.23 frames. 
], batch size: 99, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:28:25,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=241080.0, ans=0.0 2023-12-21 21:28:30,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=241080.0, ans=0.0 2023-12-21 21:28:37,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=241146.66666666666, ans=0.125 2023-12-21 21:28:38,399 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.10 vs. limit=22.5 2023-12-21 21:28:41,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=241146.66666666666, ans=0.125 2023-12-21 21:28:49,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=241213.33333333334, ans=0.125 2023-12-21 21:28:59,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=241280.0, ans=0.0 2023-12-21 21:29:05,763 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.00 vs. limit=15.0 2023-12-21 21:29:17,689 INFO [train.py:886] (3/4) Epoch 8, batch 2850, loss[loss=0.01302, audio_tagging_loss=0.01302, over 24750.00 frames. ], tot_loss[loss=0.01565, audio_tagging_loss=0.01565, over 4954529.45 frames. ], batch size: 99, lr: 1.36e-02, grad_scale: 64.0 2023-12-21 21:29:17,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=241413.33333333334, ans=0.125 2023-12-21 21:29:21,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=241413.33333333334, ans=0.1 2023-12-21 21:29:22,197 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.421e+01 2.608e+01 2.780e+01 2.965e+01 3.474e+01, threshold=5.560e+01, percent-clipped=0.0 2023-12-21 21:29:42,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=241546.66666666666, ans=0.125 2023-12-21 21:29:47,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=241546.66666666666, ans=0.125 2023-12-21 21:29:58,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=241680.0, ans=0.125 2023-12-21 21:30:06,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=241680.0, ans=0.0 2023-12-21 21:30:08,460 INFO [train.py:886] (3/4) Epoch 8, batch 2900, loss[loss=0.01442, audio_tagging_loss=0.01442, over 25000.00 frames. ], tot_loss[loss=0.01564, audio_tagging_loss=0.01564, over 4951946.89 frames. 
], batch size: 100, lr: 1.35e-02, grad_scale: 64.0 2023-12-21 21:30:21,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=241813.33333333334, ans=0.2 2023-12-21 21:30:24,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=241813.33333333334, ans=0.125 2023-12-21 21:30:25,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=241813.33333333334, ans=0.2 2023-12-21 21:30:38,923 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.00 vs. limit=15.0 2023-12-21 21:30:43,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=241946.66666666666, ans=0.2 2023-12-21 21:31:01,401 INFO [train.py:886] (3/4) Epoch 8, batch 2950, loss[loss=0.01727, audio_tagging_loss=0.01727, over 24750.00 frames. ], tot_loss[loss=0.01558, audio_tagging_loss=0.01558, over 4950131.37 frames. ], batch size: 99, lr: 1.35e-02, grad_scale: 64.0 2023-12-21 21:31:02,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=242080.0, ans=0.125 2023-12-21 21:31:05,141 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.582e+01 2.728e+01 2.928e+01 3.566e+01, threshold=5.455e+01, percent-clipped=0.0 2023-12-21 21:31:05,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=242080.0, ans=0.0 2023-12-21 21:31:19,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=242146.66666666666, ans=0.125 2023-12-21 21:31:23,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=242213.33333333334, ans=0.0 2023-12-21 21:31:44,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=242346.66666666666, ans=0.04949747468305833 2023-12-21 21:31:53,110 INFO [train.py:886] (3/4) Epoch 8, batch 3000, loss[loss=0.01687, audio_tagging_loss=0.01687, over 24750.00 frames. ], tot_loss[loss=0.01549, audio_tagging_loss=0.01549, over 4952389.05 frames. ], batch size: 99, lr: 1.35e-02, grad_scale: 64.0 2023-12-21 21:31:53,110 INFO [train.py:909] (3/4) Computing validation loss 2023-12-21 21:32:14,407 INFO [train.py:917] (3/4) Epoch 8, validation: loss=0.03648, audio_tagging_loss=0.03648, over 3737520.00 frames. 2023-12-21 21:32:14,407 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-21 21:32:18,612 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.22 vs. 
limit=15.0 2023-12-21 21:32:22,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=242413.33333333334, ans=15.0 2023-12-21 21:32:48,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=242613.33333333334, ans=0.125 2023-12-21 21:33:00,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=242680.0, ans=0.125 2023-12-21 21:33:06,488 INFO [train.py:886] (3/4) Epoch 8, batch 3050, loss[loss=0.01622, audio_tagging_loss=0.01622, over 25000.00 frames. ], tot_loss[loss=0.01552, audio_tagging_loss=0.01552, over 4953895.09 frames. ], batch size: 100, lr: 1.35e-02, grad_scale: 64.0 2023-12-21 21:33:10,393 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.255e+01 2.603e+01 2.768e+01 2.944e+01 3.581e+01, threshold=5.536e+01, percent-clipped=0.0 2023-12-21 21:33:19,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=242813.33333333334, ans=0.125 2023-12-21 21:33:48,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=243013.33333333334, ans=15.0 2023-12-21 21:33:55,927 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=9.971e-01 2023-12-21 21:33:57,603 INFO [train.py:886] (3/4) Epoch 8, batch 3100, loss[loss=0.01571, audio_tagging_loss=0.01571, over 24750.00 frames. ], tot_loss[loss=0.01554, audio_tagging_loss=0.01554, over 4952765.37 frames. ], batch size: 99, lr: 1.35e-02, grad_scale: 64.0 2023-12-21 21:34:06,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=243080.0, ans=0.0 2023-12-21 21:34:22,220 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.42 vs. limit=15.0 2023-12-21 21:34:27,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=243213.33333333334, ans=0.125 2023-12-21 21:34:34,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=243280.0, ans=0.125 2023-12-21 21:34:49,037 INFO [train.py:886] (3/4) Epoch 8, batch 3150, loss[loss=0.01502, audio_tagging_loss=0.01502, over 24750.00 frames. ], tot_loss[loss=0.01575, audio_tagging_loss=0.01575, over 4947741.32 frames. 
], batch size: 99, lr: 1.35e-02, grad_scale: 64.0 2023-12-21 21:34:52,855 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.379e+01 2.676e+01 2.832e+01 3.035e+01 3.552e+01, threshold=5.664e+01, percent-clipped=0.0 2023-12-21 21:35:00,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=243480.0, ans=0.0 2023-12-21 21:35:05,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=243480.0, ans=0.1 2023-12-21 21:35:12,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=243546.66666666666, ans=0.0 2023-12-21 21:35:12,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=243546.66666666666, ans=0.5 2023-12-21 21:35:39,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.11 vs. limit=15.0 2023-12-21 21:35:40,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=243746.66666666666, ans=10.0 2023-12-21 21:35:42,043 INFO [train.py:886] (3/4) Epoch 8, batch 3200, loss[loss=0.01426, audio_tagging_loss=0.01426, over 25000.00 frames. ], tot_loss[loss=0.01573, audio_tagging_loss=0.01573, over 4943717.11 frames. ], batch size: 100, lr: 1.35e-02, grad_scale: 64.0 2023-12-21 21:35:44,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=243746.66666666666, ans=0.5 2023-12-21 21:36:02,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=243880.0, ans=0.2 2023-12-21 21:36:33,916 INFO [train.py:886] (3/4) Epoch 8, batch 3250, loss[loss=0.01454, audio_tagging_loss=0.01454, over 24750.00 frames. ], tot_loss[loss=0.01562, audio_tagging_loss=0.01562, over 4945247.80 frames. ], batch size: 99, lr: 1.35e-02, grad_scale: 64.0 2023-12-21 21:36:37,713 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.581e+01 2.751e+01 2.965e+01 3.733e+01, threshold=5.502e+01, percent-clipped=0.0 2023-12-21 21:36:50,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=244146.66666666666, ans=0.125 2023-12-21 21:37:13,072 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.51 vs. limit=15.0 2023-12-21 21:37:21,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=244346.66666666666, ans=0.125 2023-12-21 21:37:25,272 INFO [train.py:886] (3/4) Epoch 8, batch 3300, loss[loss=0.01709, audio_tagging_loss=0.01709, over 24750.00 frames. ], tot_loss[loss=0.01558, audio_tagging_loss=0.01558, over 4950919.50 frames. ], batch size: 99, lr: 1.35e-02, grad_scale: 64.0 2023-12-21 21:37:25,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=244413.33333333334, ans=0.2 2023-12-21 21:37:44,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.39 vs. 
limit=15.0
2023-12-21 21:37:52,534 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0
2023-12-21 21:37:59,179 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.03 vs. limit=15.0
2023-12-21 21:38:00,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=244613.33333333334, ans=0.0
2023-12-21 21:38:13,773 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.23 vs. limit=22.5
2023-12-21 21:38:17,046 INFO [train.py:886] (3/4) Epoch 8, batch 3350, loss[loss=0.01606, audio_tagging_loss=0.01606, over 25000.00 frames. ], tot_loss[loss=0.01557, audio_tagging_loss=0.01557, over 4954792.08 frames. ], batch size: 100, lr: 1.35e-02, grad_scale: 64.0
2023-12-21 21:38:18,447 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0
2023-12-21 21:38:21,608 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.276e+01 2.538e+01 2.708e+01 2.913e+01 3.391e+01, threshold=5.415e+01, percent-clipped=0.0
2023-12-21 21:38:23,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=244746.66666666666, ans=0.0
2023-12-21 21:38:24,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=244746.66666666666, ans=0.125
2023-12-21 21:38:25,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=244746.66666666666, ans=0.125
2023-12-21 21:38:26,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=244813.33333333334, ans=0.125
2023-12-21 21:38:32,261 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.18 vs. limit=15.0
2023-12-21 21:38:41,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=244880.0, ans=0.0
2023-12-21 21:38:48,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=244946.66666666666, ans=0.1
2023-12-21 21:38:48,645 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.38 vs. limit=15.0
2023-12-21 21:38:55,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=244946.66666666666, ans=0.125
2023-12-21 21:39:08,642 INFO [train.py:886] (3/4) Epoch 8, batch 3400, loss[loss=0.01453, audio_tagging_loss=0.01453, over 24750.00 frames. ], tot_loss[loss=0.01562, audio_tagging_loss=0.01562, over 4948494.82 frames. ], batch size: 99, lr: 1.35e-02, grad_scale: 64.0
2023-12-21 21:39:09,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=245080.0, ans=0.125
2023-12-21 21:39:11,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=245080.0, ans=0.125
2023-12-21 21:39:17,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.83 vs. limit=15.0
2023-12-21 21:39:19,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=245146.66666666666, ans=0.0
2023-12-21 21:39:21,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=245146.66666666666, ans=0.125
2023-12-21 21:39:23,780 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.85 vs. limit=15.0
2023-12-21 21:39:26,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=245146.66666666666, ans=0.0
2023-12-21 21:39:50,158 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.247e+00
2023-12-21 21:40:01,070 INFO [train.py:886] (3/4) Epoch 8, batch 3450, loss[loss=0.01655, audio_tagging_loss=0.01655, over 24750.00 frames. ], tot_loss[loss=0.01568, audio_tagging_loss=0.01568, over 4945668.73 frames. ], batch size: 99, lr: 1.34e-02, grad_scale: 64.0
2023-12-21 21:40:04,818 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.354e+01 2.638e+01 2.791e+01 3.014e+01 3.775e+01, threshold=5.582e+01, percent-clipped=0.0
2023-12-21 21:40:05,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=245413.33333333334, ans=0.125
2023-12-21 21:40:09,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=245413.33333333334, ans=0.0
2023-12-21 21:40:24,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.51 vs. limit=15.0
2023-12-21 21:40:27,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.78 vs. limit=22.5
2023-12-21 21:40:29,968 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.03 vs. limit=15.0
2023-12-21 21:40:32,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=245613.33333333334, ans=0.2
2023-12-21 21:40:36,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=245613.33333333334, ans=0.0
2023-12-21 21:40:46,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=245680.0, ans=0.2
2023-12-21 21:40:53,255 INFO [train.py:886] (3/4) Epoch 8, batch 3500, loss[loss=0.01535, audio_tagging_loss=0.01535, over 24750.00 frames. ], tot_loss[loss=0.01571, audio_tagging_loss=0.01571, over 4941758.43 frames. ], batch size: 99, lr: 1.34e-02, grad_scale: 64.0
2023-12-21 21:40:54,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=245746.66666666666, ans=0.04949747468305833
2023-12-21 21:40:57,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=245746.66666666666, ans=0.125
2023-12-21 21:40:57,606 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.71 vs. limit=22.5
2023-12-21 21:41:15,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=245880.0, ans=0.125
2023-12-21 21:41:20,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=245880.0, ans=0.125
2023-12-21 21:41:28,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=245946.66666666666, ans=0.125
2023-12-21 21:41:37,816 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.73 vs. limit=15.0
2023-12-21 21:41:44,049 INFO [train.py:886] (3/4) Epoch 8, batch 3550, loss[loss=0.01544, audio_tagging_loss=0.01544, over 24750.00 frames. ], tot_loss[loss=0.01559, audio_tagging_loss=0.01559, over 4945805.44 frames. ], batch size: 99, lr: 1.34e-02, grad_scale: 64.0
2023-12-21 21:41:46,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=246080.0, ans=0.125
2023-12-21 21:41:48,506 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.269e+01 2.549e+01 2.709e+01 2.899e+01 3.607e+01, threshold=5.417e+01, percent-clipped=0.0
2023-12-21 21:41:56,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=246146.66666666666, ans=0.2
2023-12-21 21:42:05,396 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.83 vs. limit=15.0
2023-12-21 21:42:11,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=246213.33333333334, ans=0.125
2023-12-21 21:42:28,351 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.69 vs. limit=22.5
2023-12-21 21:42:35,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=246346.66666666666, ans=0.07
2023-12-21 21:42:37,279 INFO [train.py:886] (3/4) Epoch 8, batch 3600, loss[loss=0.01333, audio_tagging_loss=0.01333, over 25000.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4943422.01 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 64.0
2023-12-21 21:42:57,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=246546.66666666666, ans=0.07
2023-12-21 21:43:02,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=246546.66666666666, ans=0.1
2023-12-21 21:43:11,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=246613.33333333334, ans=0.04949747468305833
2023-12-21 21:43:22,616 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.20 vs. limit=22.5
2023-12-21 21:43:27,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=246680.0, ans=0.125
2023-12-21 21:43:28,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=246746.66666666666, ans=0.025
2023-12-21 21:43:29,309 INFO [train.py:886] (3/4) Epoch 8, batch 3650, loss[loss=0.01482, audio_tagging_loss=0.01482, over 25000.00 frames. ], tot_loss[loss=0.01548, audio_tagging_loss=0.01548, over 4948978.90 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 64.0
2023-12-21 21:43:33,745 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.621e+01 2.795e+01 3.047e+01 3.909e+01, threshold=5.590e+01, percent-clipped=0.0
2023-12-21 21:43:53,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=246880.0, ans=0.2
2023-12-21 21:44:01,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=246946.66666666666, ans=0.125
2023-12-21 21:44:20,649 INFO [train.py:886] (3/4) Epoch 8, batch 3700, loss[loss=0.0164, audio_tagging_loss=0.0164, over 25000.00 frames. ], tot_loss[loss=0.01548, audio_tagging_loss=0.01548, over 4956147.31 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 64.0
2023-12-21 21:44:20,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=247080.0, ans=0.1
2023-12-21 21:44:44,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=247213.33333333334, ans=0.125
2023-12-21 21:44:58,240 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.18 vs. limit=15.0
2023-12-21 21:45:11,563 INFO [train.py:886] (3/4) Epoch 8, batch 3750, loss[loss=0.01471, audio_tagging_loss=0.01471, over 24082.00 frames. ], tot_loss[loss=0.01552, audio_tagging_loss=0.01552, over 4955150.67 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 64.0
2023-12-21 21:45:12,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=247413.33333333334, ans=0.125
2023-12-21 21:45:15,998 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.638e+01 2.813e+01 3.001e+01 3.548e+01, threshold=5.627e+01, percent-clipped=0.0
2023-12-21 21:45:21,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=247480.0, ans=0.125
2023-12-21 21:45:41,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=247546.66666666666, ans=0.125
2023-12-21 21:45:42,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=247613.33333333334, ans=0.125
2023-12-21 21:46:02,565 INFO [train.py:886] (3/4) Epoch 8, batch 3800, loss[loss=0.01549, audio_tagging_loss=0.01549, over 24750.00 frames. ], tot_loss[loss=0.01562, audio_tagging_loss=0.01562, over 4950924.70 frames. ], batch size: 99, lr: 1.34e-02, grad_scale: 64.0
2023-12-21 21:46:15,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.47 vs. limit=15.0
2023-12-21 21:46:31,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=247880.0, ans=0.1
2023-12-21 21:46:32,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=247880.0, ans=0.125
2023-12-21 21:46:45,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=248013.33333333334, ans=0.125
2023-12-21 21:46:45,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=248013.33333333334, ans=0.125
2023-12-21 21:46:52,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=248013.33333333334, ans=0.0
2023-12-21 21:46:55,691 INFO [train.py:886] (3/4) Epoch 8, batch 3850, loss[loss=0.01523, audio_tagging_loss=0.01523, over 25000.00 frames. ], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 4945888.22 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 64.0
2023-12-21 21:46:55,894 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.263e+00
2023-12-21 21:46:59,389 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.593e+01 2.766e+01 2.914e+01 3.476e+01, threshold=5.531e+01, percent-clipped=0.0
2023-12-21 21:47:07,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=248146.66666666666, ans=0.125
2023-12-21 21:47:36,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=248346.66666666666, ans=0.125
2023-12-21 21:47:47,211 INFO [train.py:886] (3/4) Epoch 8, batch 3900, loss[loss=0.0126, audio_tagging_loss=0.0126, over 24750.00 frames. ], tot_loss[loss=0.01553, audio_tagging_loss=0.01553, over 4946543.89 frames. ], batch size: 99, lr: 1.34e-02, grad_scale: 64.0
2023-12-21 21:47:48,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=248413.33333333334, ans=0.1
2023-12-21 21:48:07,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=248546.66666666666, ans=0.125
2023-12-21 21:48:07,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=248546.66666666666, ans=0.1
2023-12-21 21:48:08,698 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.46 vs. limit=15.0
2023-12-21 21:48:15,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=248546.66666666666, ans=0.125
2023-12-21 21:48:24,474 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-21 21:48:32,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=248680.0, ans=0.0
2023-12-21 21:48:38,320 INFO [train.py:886] (3/4) Epoch 8, batch 3950, loss[loss=0.01354, audio_tagging_loss=0.01354, over 25000.00 frames. ], tot_loss[loss=0.01546, audio_tagging_loss=0.01546, over 4952049.43 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 64.0
2023-12-21 21:48:42,026 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+01 2.574e+01 2.671e+01 2.865e+01 3.681e+01, threshold=5.342e+01, percent-clipped=0.0
2023-12-21 21:49:12,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=248946.66666666666, ans=0.1
2023-12-21 21:49:13,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=248946.66666666666, ans=0.125
2023-12-21 21:49:22,887 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.92 vs. limit=15.0
2023-12-21 21:49:27,169 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.79 vs. limit=15.0
2023-12-21 21:49:30,365 INFO [train.py:886] (3/4) Epoch 8, batch 4000, loss[loss=0.01679, audio_tagging_loss=0.01679, over 25000.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4954402.60 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 64.0
2023-12-21 21:49:49,688 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.70 vs. limit=10.0
2023-12-21 21:49:52,187 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.90 vs. limit=10.0
2023-12-21 21:50:09,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=249280.0, ans=0.0
2023-12-21 21:50:15,672 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.83 vs. limit=12.0
2023-12-21 21:50:21,744 INFO [train.py:886] (3/4) Epoch 8, batch 4050, loss[loss=0.01637, audio_tagging_loss=0.01637, over 24750.00 frames. ], tot_loss[loss=0.01561, audio_tagging_loss=0.01561, over 4961178.90 frames. ], batch size: 99, lr: 1.33e-02, grad_scale: 64.0
2023-12-21 21:50:27,054 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.602e+01 2.760e+01 2.948e+01 3.649e+01, threshold=5.519e+01, percent-clipped=0.0
2023-12-21 21:50:36,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=249480.0, ans=0.2
2023-12-21 21:50:37,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=249480.0, ans=0.125
2023-12-21 21:50:43,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=249546.66666666666, ans=0.125
2023-12-21 21:50:47,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=249546.66666666666, ans=0.125
2023-12-21 21:50:57,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=249613.33333333334, ans=0.125
2023-12-21 21:50:59,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=249613.33333333334, ans=0.125
2023-12-21 21:51:03,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=249680.0, ans=0.0
2023-12-21 21:51:05,169 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. limit=15.0
2023-12-21 21:51:13,763 INFO [train.py:886] (3/4) Epoch 8, batch 4100, loss[loss=0.01546, audio_tagging_loss=0.01546, over 24019.00 frames. ], tot_loss[loss=0.0157, audio_tagging_loss=0.0157, over 4953493.78 frames. ], batch size: 100, lr: 1.33e-02, grad_scale: 64.0
2023-12-21 21:51:53,951 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.00 vs. limit=6.0
2023-12-21 21:52:05,005 INFO [train.py:886] (3/4) Epoch 8, batch 4150, loss[loss=0.01397, audio_tagging_loss=0.01397, over 25000.00 frames. ], tot_loss[loss=0.01572, audio_tagging_loss=0.01572, over 4953447.43 frames. ], batch size: 100, lr: 1.33e-02, grad_scale: 64.0
2023-12-21 21:52:10,484 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.358e+01 2.710e+01 2.835e+01 2.954e+01 3.813e+01, threshold=5.670e+01, percent-clipped=0.0
2023-12-21 21:52:10,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=250080.0, ans=0.125
2023-12-21 21:52:21,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=250146.66666666666, ans=0.0
2023-12-21 21:52:36,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=250280.0, ans=0.1
2023-12-21 21:52:56,694 INFO [train.py:886] (3/4) Epoch 8, batch 4200, loss[loss=0.01746, audio_tagging_loss=0.01746, over 25000.00 frames. ], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 4956081.84 frames. ], batch size: 100, lr: 1.33e-02, grad_scale: 64.0
2023-12-21 21:53:08,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=250480.0, ans=0.125
2023-12-21 21:53:34,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=250613.33333333334, ans=0.09899494936611666
2023-12-21 21:53:49,345 INFO [train.py:886] (3/4) Epoch 8, batch 4250, loss[loss=0.01536, audio_tagging_loss=0.01536, over 25000.00 frames. ], tot_loss[loss=0.01552, audio_tagging_loss=0.01552, over 4959786.76 frames. ], batch size: 100, lr: 1.33e-02, grad_scale: 64.0
2023-12-21 21:53:52,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=250746.66666666666, ans=0.125
2023-12-21 21:53:54,769 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.560e+01 2.714e+01 2.924e+01 4.261e+01, threshold=5.428e+01, percent-clipped=0.0
2023-12-21 21:54:20,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=250946.66666666666, ans=0.125
2023-12-21 21:54:41,042 INFO [train.py:886] (3/4) Epoch 8, batch 4300, loss[loss=0.01452, audio_tagging_loss=0.01452, over 25000.00 frames. ], tot_loss[loss=0.01552, audio_tagging_loss=0.01552, over 4953487.69 frames. ], batch size: 100, lr: 1.33e-02, grad_scale: 64.0
2023-12-21 21:54:41,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=251080.0, ans=0.0
2023-12-21 21:54:44,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=251080.0, ans=0.125
2023-12-21 21:55:33,249 INFO [train.py:886] (3/4) Epoch 8, batch 4350, loss[loss=0.01752, audio_tagging_loss=0.01752, over 24750.00 frames. ], tot_loss[loss=0.01553, audio_tagging_loss=0.01553, over 4954911.92 frames. ], batch size: 99, lr: 1.33e-02, grad_scale: 64.0
2023-12-21 21:55:33,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=251413.33333333334, ans=0.0
2023-12-21 21:55:34,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=251413.33333333334, ans=0.0
2023-12-21 21:55:37,907 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+01 2.680e+01 2.895e+01 3.105e+01 4.359e+01, threshold=5.791e+01, percent-clipped=0.0
2023-12-21 21:55:40,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=251413.33333333334, ans=0.0
2023-12-21 21:56:17,194 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.28 vs. limit=15.0
2023-12-21 21:56:23,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=251680.0, ans=0.1
2023-12-21 21:56:23,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=251680.0, ans=0.0
2023-12-21 21:56:24,821 INFO [train.py:886] (3/4) Epoch 8, batch 4400, loss[loss=0.01631, audio_tagging_loss=0.01631, over 22153.00 frames. ], tot_loss[loss=0.01567, audio_tagging_loss=0.01567, over 4948923.60 frames. ], batch size: 107, lr: 1.33e-02, grad_scale: 64.0
2023-12-21 21:56:27,214 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.15 vs. limit=6.0
2023-12-21 21:56:33,345 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=15.0
2023-12-21 21:56:54,478 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.78 vs. limit=6.0
2023-12-21 21:56:56,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=251946.66666666666, ans=0.0
2023-12-21 21:56:58,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=251946.66666666666, ans=0.1
2023-12-21 21:57:11,316 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.54 vs. limit=15.0
2023-12-21 21:57:16,303 INFO [train.py:886] (3/4) Epoch 8, batch 4450, loss[loss=0.01542, audio_tagging_loss=0.01542, over 22478.00 frames. ], tot_loss[loss=0.01568, audio_tagging_loss=0.01568, over 4942450.90 frames. ], batch size: 107, lr: 1.33e-02, grad_scale: 64.0
2023-12-21 21:57:21,713 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+01 2.660e+01 2.808e+01 3.032e+01 4.377e+01, threshold=5.615e+01, percent-clipped=0.0
2023-12-21 21:57:32,449 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.63 vs. limit=15.0
2023-12-21 21:57:46,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.66 vs. limit=22.5
2023-12-21 21:57:56,739 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.19 vs. limit=15.0
2023-12-21 21:57:58,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=252346.66666666666, ans=0.1
2023-12-21 21:58:08,663 INFO [train.py:886] (3/4) Epoch 8, batch 4500, loss[loss=0.01373, audio_tagging_loss=0.01373, over 25000.00 frames. ], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 4947943.60 frames. ], batch size: 100, lr: 1.33e-02, grad_scale: 64.0
2023-12-21 21:58:18,460 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.54 vs. limit=22.5
2023-12-21 21:58:28,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=252546.66666666666, ans=0.125
2023-12-21 21:58:30,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=252546.66666666666, ans=0.125
2023-12-21 21:58:57,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=252680.0, ans=0.025
2023-12-21 21:59:00,799 INFO [train.py:886] (3/4) Epoch 8, batch 4550, loss[loss=0.01811, audio_tagging_loss=0.01811, over 25000.00 frames. ], tot_loss[loss=0.01557, audio_tagging_loss=0.01557, over 4948706.87 frames. ], batch size: 100, lr: 1.33e-02, grad_scale: 64.0
2023-12-21 21:59:06,127 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 2.575e+01 2.765e+01 2.949e+01 3.693e+01, threshold=5.529e+01, percent-clipped=0.0
2023-12-21 21:59:06,607 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.05 vs. limit=22.5
2023-12-21 21:59:24,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=252880.0, ans=0.125
2023-12-21 21:59:35,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=252946.66666666666, ans=0.02
2023-12-21 21:59:36,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=252946.66666666666, ans=0.125
2023-12-21 21:59:47,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=253013.33333333334, ans=0.125
2023-12-21 21:59:52,613 INFO [train.py:886] (3/4) Epoch 8, batch 4600, loss[loss=0.01313, audio_tagging_loss=0.01313, over 25000.00 frames. ], tot_loss[loss=0.01552, audio_tagging_loss=0.01552, over 4948816.64 frames. ], batch size: 100, lr: 1.32e-02, grad_scale: 64.0
2023-12-21 21:59:55,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=253080.0, ans=0.125
2023-12-21 21:59:57,530 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.48 vs. limit=6.0
2023-12-21 22:00:01,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=253080.0, ans=0.0
2023-12-21 22:00:20,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=253213.33333333334, ans=0.2
2023-12-21 22:00:21,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.38 vs. limit=12.0
2023-12-21 22:00:24,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=253280.0, ans=0.125
2023-12-21 22:00:28,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=18.47 vs. limit=15.0
2023-12-21 22:00:34,800 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.99 vs. limit=15.0
2023-12-21 22:00:45,206 INFO [train.py:886] (3/4) Epoch 8, batch 4650, loss[loss=0.01892, audio_tagging_loss=0.01892, over 25000.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4952767.52 frames. ], batch size: 100, lr: 1.32e-02, grad_scale: 64.0
2023-12-21 22:00:45,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=253413.33333333334, ans=0.125
2023-12-21 22:00:47,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=253413.33333333334, ans=0.125
2023-12-21 22:00:48,533 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.98 vs. limit=22.5
2023-12-21 22:00:50,676 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.620e+01 2.747e+01 2.927e+01 3.887e+01, threshold=5.494e+01, percent-clipped=0.0
2023-12-21 22:01:10,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=253546.66666666666, ans=0.125
2023-12-21 22:01:35,718 INFO [train.py:886] (3/4) Epoch 8, batch 4700, loss[loss=0.01752, audio_tagging_loss=0.01752, over 24750.00 frames. ], tot_loss[loss=0.01567, audio_tagging_loss=0.01567, over 4950658.59 frames. ], batch size: 99, lr: 1.32e-02, grad_scale: 64.0
2023-12-21 22:01:52,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=253813.33333333334, ans=0.0
2023-12-21 22:02:05,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=253946.66666666666, ans=0.0
2023-12-21 22:02:13,460 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. limit=6.0
2023-12-21 22:02:15,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=254013.33333333334, ans=0.07
2023-12-21 22:02:17,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=254013.33333333334, ans=0.125
2023-12-21 22:02:23,288 INFO [train.py:886] (3/4) Epoch 8, batch 4750, loss[loss=0.01648, audio_tagging_loss=0.01648, over 24750.00 frames. ], tot_loss[loss=0.01579, audio_tagging_loss=0.01579, over 4950570.56 frames. ], batch size: 99, lr: 1.32e-02, grad_scale: 64.0
2023-12-21 22:02:23,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=254080.0, ans=0.125
2023-12-21 22:02:27,780 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.333e+01 2.666e+01 2.796e+01 2.993e+01 3.759e+01, threshold=5.593e+01, percent-clipped=0.0
2023-12-21 22:02:29,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=254080.0, ans=0.125
2023-12-21 22:02:32,576 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.05 vs. limit=10.0
2023-12-21 22:03:00,066 INFO [train.py:886] (3/4) Epoch 9, batch 0, loss[loss=0.03897, audio_tagging_loss=0.03897, over 21318.00 frames. ], tot_loss[loss=0.03897, audio_tagging_loss=0.03897, over 21318.00 frames. ], batch size: 107, lr: 1.25e-02, grad_scale: 64.0
2023-12-21 22:03:00,066 INFO [train.py:909] (3/4) Computing validation loss
2023-12-21 22:03:21,364 INFO [train.py:917] (3/4) Epoch 9, validation: loss=0.03498, audio_tagging_loss=0.03498, over 3737520.00 frames.
2023-12-21 22:03:21,365 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-21 22:03:49,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=254320.0, ans=0.0
2023-12-21 22:04:01,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=254453.33333333334, ans=0.125
2023-12-21 22:04:12,821 INFO [train.py:886] (3/4) Epoch 9, batch 50, loss[loss=0.02334, audio_tagging_loss=0.02334, over 25000.00 frames. ], tot_loss[loss=0.02455, audio_tagging_loss=0.02455, over 1119430.45 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 32.0
2023-12-21 22:04:23,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=254586.66666666666, ans=0.2
2023-12-21 22:04:26,263 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.44 vs. limit=15.0
2023-12-21 22:04:32,625 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.76 vs. limit=15.0
2023-12-21 22:04:45,251 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-21 22:04:47,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=254720.0, ans=0.125
2023-12-21 22:04:54,278 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.647e+01 3.011e+01 3.284e+01 3.905e+01 1.113e+02, threshold=6.568e+01, percent-clipped=8.0
2023-12-21 22:04:56,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=254786.66666666666, ans=0.0
2023-12-21 22:05:04,575 INFO [train.py:886] (3/4) Epoch 9, batch 100, loss[loss=0.01763, audio_tagging_loss=0.01763, over 25000.00 frames. ], tot_loss[loss=0.02129, audio_tagging_loss=0.02129, over 1967855.12 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 32.0
2023-12-21 22:05:13,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.96 vs. limit=15.0
2023-12-21 22:05:24,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=254986.66666666666, ans=0.1
2023-12-21 22:05:24,640 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=15.0
2023-12-21 22:05:32,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=254986.66666666666, ans=0.0
2023-12-21 22:05:35,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=255053.33333333334, ans=0.125
2023-12-21 22:05:36,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=255053.33333333334, ans=0.125
2023-12-21 22:05:39,297 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.79 vs. limit=22.5
2023-12-21 22:05:42,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=255053.33333333334, ans=0.125
2023-12-21 22:05:55,438 INFO [train.py:886] (3/4) Epoch 9, batch 150, loss[loss=0.01696, audio_tagging_loss=0.01696, over 22097.00 frames. ], tot_loss[loss=0.01931, audio_tagging_loss=0.01931, over 2632517.95 frames. ], batch size: 107, lr: 1.25e-02, grad_scale: 32.0
2023-12-21 22:06:13,351 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.67 vs. limit=12.0
2023-12-21 22:06:20,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=255320.0, ans=0.95
2023-12-21 22:06:37,349 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.420e+01 2.677e+01 2.811e+01 2.976e+01 3.484e+01, threshold=5.622e+01, percent-clipped=0.0
2023-12-21 22:06:47,059 INFO [train.py:886] (3/4) Epoch 9, batch 200, loss[loss=0.01734, audio_tagging_loss=0.01734, over 25000.00 frames. ], tot_loss[loss=0.01814, audio_tagging_loss=0.01814, over 3147913.87 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 32.0
2023-12-21 22:06:54,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=255520.0, ans=0.0
2023-12-21 22:06:59,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=255586.66666666666, ans=0.2
2023-12-21 22:06:59,334 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.86 vs. limit=15.0
2023-12-21 22:07:04,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=255586.66666666666, ans=0.125
2023-12-21 22:07:11,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=255653.33333333334, ans=0.125
2023-12-21 22:07:13,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=255653.33333333334, ans=0.04949747468305833
2023-12-21 22:07:16,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=255653.33333333334, ans=0.125
2023-12-21 22:07:22,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=255720.0, ans=0.2
2023-12-21 22:07:26,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=255720.0, ans=0.125
2023-12-21 22:07:34,265 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.69 vs. limit=22.5
2023-12-21 22:07:39,171 INFO [train.py:886] (3/4) Epoch 9, batch 250, loss[loss=0.01381, audio_tagging_loss=0.01381, over 25000.00 frames. ], tot_loss[loss=0.0174, audio_tagging_loss=0.0174, over 3549380.12 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 32.0
2023-12-21 22:07:39,668 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.97 vs. limit=15.0
2023-12-21 22:07:49,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=255920.0, ans=0.125
2023-12-21 22:07:53,541 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.10 vs. limit=10.0
2023-12-21 22:08:06,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=255986.66666666666, ans=0.0
2023-12-21 22:08:20,987 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.218e+01 2.572e+01 2.690e+01 2.867e+01 4.305e+01, threshold=5.380e+01, percent-clipped=0.0
2023-12-21 22:08:21,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=256120.0, ans=0.125
2023-12-21 22:08:24,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=256120.0, ans=0.125
2023-12-21 22:08:30,633 INFO [train.py:886] (3/4) Epoch 9, batch 300, loss[loss=0.0155, audio_tagging_loss=0.0155, over 24750.00 frames. ], tot_loss[loss=0.01696, audio_tagging_loss=0.01696, over 3858009.40 frames. ], batch size: 99, lr: 1.25e-02, grad_scale: 32.0
2023-12-21 22:08:33,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=256186.66666666666, ans=0.0
2023-12-21 22:08:34,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=256186.66666666666, ans=0.07
2023-12-21 22:08:43,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=256253.33333333334, ans=0.1
2023-12-21 22:08:56,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=256320.0, ans=0.125
2023-12-21 22:09:01,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=256386.66666666666, ans=0.125
2023-12-21 22:09:10,744 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0
2023-12-21 22:09:11,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=256386.66666666666, ans=0.025
2023-12-21 22:09:18,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=256453.33333333334, ans=0.2
2023-12-21 22:09:20,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=256453.33333333334, ans=0.0
2023-12-21 22:09:20,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=256453.33333333334, ans=0.125
2023-12-21 22:09:23,826 INFO [train.py:886] (3/4) Epoch 9, batch 350, loss[loss=0.01603, audio_tagging_loss=0.01603, over 24750.00 frames. ], tot_loss[loss=0.01683, audio_tagging_loss=0.01683, over 4091252.21 frames. ], batch size: 99, lr: 1.25e-02, grad_scale: 32.0
2023-12-21 22:09:39,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=256586.66666666666, ans=0.125
2023-12-21 22:09:50,874 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0
2023-12-21 22:09:53,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=256653.33333333334, ans=0.125
2023-12-21 22:10:04,511 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+01 2.569e+01 2.774e+01 2.957e+01 3.605e+01, threshold=5.548e+01, percent-clipped=0.0
2023-12-21 22:10:15,404 INFO [train.py:886] (3/4) Epoch 9, batch 400, loss[loss=0.01842, audio_tagging_loss=0.01842, over 24923.00 frames. ], tot_loss[loss=0.01644, audio_tagging_loss=0.01644, over 4279156.55 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 32.0
2023-12-21 22:10:31,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=256920.0, ans=0.125
2023-12-21 22:10:33,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=256920.0, ans=0.1
2023-12-21 22:10:43,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=256986.66666666666, ans=0.125
2023-12-21 22:11:01,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=257120.0, ans=0.125
2023-12-21 22:11:07,314 INFO [train.py:886] (3/4) Epoch 9, batch 450, loss[loss=0.0138, audio_tagging_loss=0.0138, over 24750.00 frames. ], tot_loss[loss=0.01613, audio_tagging_loss=0.01613, over 4431806.24 frames. ], batch size: 99, lr: 1.24e-02, grad_scale: 32.0
2023-12-21 22:11:18,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.55 vs. limit=15.0
2023-12-21 22:11:26,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=257253.33333333334, ans=0.125
2023-12-21 22:11:27,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=257253.33333333334, ans=0.125
2023-12-21 22:11:31,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=257320.0, ans=0.2
2023-12-21 22:11:46,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=257386.66666666666, ans=0.125
2023-12-21 22:11:48,943 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.177e+01 2.608e+01 2.793e+01 2.951e+01 3.727e+01, threshold=5.585e+01, percent-clipped=0.0
2023-12-21 22:11:59,283 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.23 vs. limit=10.0
2023-12-21 22:12:00,516 INFO [train.py:886] (3/4) Epoch 9, batch 500, loss[loss=0.01802, audio_tagging_loss=0.01802, over 25000.00 frames. ], tot_loss[loss=0.01592, audio_tagging_loss=0.01592, over 4548630.63 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0
2023-12-21 22:12:10,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=257586.66666666666, ans=0.125
2023-12-21 22:12:11,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=257586.66666666666, ans=0.125
2023-12-21 22:12:12,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=257586.66666666666, ans=0.125
2023-12-21 22:12:18,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=257586.66666666666, ans=0.0
2023-12-21 22:12:25,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=257653.33333333334, ans=0.0
2023-12-21 22:12:31,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=257720.0, ans=0.0
2023-12-21 22:12:43,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=257786.66666666666, ans=0.125
2023-12-21 22:12:51,068 INFO [train.py:886] (3/4) Epoch 9, batch 550, loss[loss=0.01649, audio_tagging_loss=0.01649, over 25000.00 frames. ], tot_loss[loss=0.01575, audio_tagging_loss=0.01575, over 4640737.49 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0
2023-12-21 22:12:57,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=257853.33333333334, ans=0.2
2023-12-21 22:13:07,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=257920.0, ans=0.125
2023-12-21 22:13:09,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=257920.0, ans=0.0
2023-12-21 22:13:32,270 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 2.530e+01 2.671e+01 2.882e+01 3.618e+01, threshold=5.343e+01, percent-clipped=0.0
2023-12-21 22:13:40,741 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.31 vs. limit=22.5
2023-12-21 22:13:43,175 INFO [train.py:886] (3/4) Epoch 9, batch 600, loss[loss=0.01587, audio_tagging_loss=0.01587, over 25000.00 frames. ], tot_loss[loss=0.0159, audio_tagging_loss=0.0159, over 4702069.97 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0
2023-12-21 22:13:44,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=258186.66666666666, ans=0.2
2023-12-21 22:13:45,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=258186.66666666666, ans=0.125
2023-12-21 22:13:53,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=258253.33333333334, ans=0.0
2023-12-21 22:14:16,114 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.97 vs. limit=22.5
2023-12-21 22:14:33,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=258453.33333333334, ans=0.125
2023-12-21 22:14:34,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=258520.0, ans=0.0
2023-12-21 22:14:34,961 INFO [train.py:886] (3/4) Epoch 9, batch 650, loss[loss=0.01241, audio_tagging_loss=0.01241, over 25000.00 frames. ], tot_loss[loss=0.01594, audio_tagging_loss=0.01594, over 4754955.88 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0
2023-12-21 22:14:35,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=258520.0, ans=0.125
2023-12-21 22:14:44,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=258520.0, ans=0.125
2023-12-21 22:14:53,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=258586.66666666666, ans=0.125
2023-12-21 22:14:56,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=258653.33333333334, ans=0.025
2023-12-21 22:15:17,032 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.299e+01 2.595e+01 2.763e+01 2.915e+01 3.436e+01, threshold=5.525e+01, percent-clipped=0.0
2023-12-21 22:15:24,379 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.74 vs. limit=6.0
2023-12-21 22:15:26,592 INFO [train.py:886] (3/4) Epoch 9, batch 700, loss[loss=0.01307, audio_tagging_loss=0.01307, over 24750.00 frames. ], tot_loss[loss=0.01572, audio_tagging_loss=0.01572, over 4799470.51 frames. ], batch size: 99, lr: 1.24e-02, grad_scale: 32.0
2023-12-21 22:15:33,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=258853.33333333334, ans=0.2
2023-12-21 22:15:46,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=258920.0, ans=0.125
2023-12-21 22:16:10,344 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.026e-02
2023-12-21 22:16:19,300 INFO [train.py:886] (3/4) Epoch 9, batch 750, loss[loss=0.01359, audio_tagging_loss=0.01359, over 24750.00 frames. ], tot_loss[loss=0.01558, audio_tagging_loss=0.01558, over 4831190.07 frames. ], batch size: 99, lr: 1.24e-02, grad_scale: 32.0
2023-12-21 22:16:31,597 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.502e-03
2023-12-21 22:16:37,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=259253.33333333334, ans=0.125
2023-12-21 22:16:38,351 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.87 vs. limit=15.0
2023-12-21 22:16:51,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=259386.66666666666, ans=0.1
2023-12-21 22:16:55,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.45 vs. limit=22.5
2023-12-21 22:17:01,050 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.547e+01 2.724e+01 2.898e+01 3.402e+01, threshold=5.448e+01, percent-clipped=0.0
2023-12-21 22:17:04,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=259453.33333333334, ans=0.125
2023-12-21 22:17:05,588 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.51 vs. limit=15.0
2023-12-21 22:17:11,290 INFO [train.py:886] (3/4) Epoch 9, batch 800, loss[loss=0.01338, audio_tagging_loss=0.01338, over 22249.00 frames. ], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 4856751.23 frames. ], batch size: 107, lr: 1.24e-02, grad_scale: 32.0
2023-12-21 22:17:30,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=259586.66666666666, ans=0.125
2023-12-21 22:17:37,751 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.94 vs. limit=22.5
2023-12-21 22:17:48,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=259720.0, ans=0.125
2023-12-21 22:17:48,784 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.09 vs. limit=15.0
2023-12-21 22:17:59,845 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.29 vs. limit=15.0
2023-12-21 22:18:03,813 INFO [train.py:886] (3/4) Epoch 9, batch 850, loss[loss=0.01599, audio_tagging_loss=0.01599, over 25000.00 frames. ], tot_loss[loss=0.01549, audio_tagging_loss=0.01549, over 4875336.45 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0
2023-12-21 22:18:24,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=259986.66666666666, ans=0.1
2023-12-21 22:18:32,628 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.99 vs. limit=15.0
2023-12-21 22:18:45,094 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.246e+01 2.660e+01 2.834e+01 3.014e+01 3.541e+01, threshold=5.667e+01, percent-clipped=0.0
2023-12-21 22:18:56,045 INFO [train.py:886] (3/4) Epoch 9, batch 900, loss[loss=0.01862, audio_tagging_loss=0.01862, over 24750.00 frames. ], tot_loss[loss=0.0156, audio_tagging_loss=0.0156, over 4888571.73 frames. ], batch size: 99, lr: 1.24e-02, grad_scale: 32.0
2023-12-21 22:19:02,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=260186.66666666666, ans=0.0
2023-12-21 22:19:31,359 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0
2023-12-21 22:19:42,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=260453.33333333334, ans=0.0
2023-12-21 22:19:42,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=260453.33333333334, ans=0.0
2023-12-21 22:19:45,276 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.53 vs. limit=10.0
2023-12-21 22:19:48,433 INFO [train.py:886] (3/4) Epoch 9, batch 950, loss[loss=0.01495, audio_tagging_loss=0.01495, over 24750.00 frames. ], tot_loss[loss=0.01563, audio_tagging_loss=0.01563, over 4902429.53 frames. ], batch size: 99, lr: 1.24e-02, grad_scale: 32.0
2023-12-21 22:20:06,518 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.76 vs. limit=15.0
2023-12-21 22:20:23,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=260720.0, ans=0.125
2023-12-21 22:20:25,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=260720.0, ans=0.125
2023-12-21 22:20:29,897 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.257e+01 2.673e+01 2.825e+01 3.002e+01 3.457e+01, threshold=5.650e+01, percent-clipped=0.0
2023-12-21 22:20:35,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.75 vs. limit=15.0
2023-12-21 22:20:39,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=260853.33333333334, ans=0.125
2023-12-21 22:20:39,989 INFO [train.py:886] (3/4) Epoch 9, batch 1000, loss[loss=0.01868, audio_tagging_loss=0.01868, over 25000.00 frames. ], tot_loss[loss=0.01559, audio_tagging_loss=0.01559, over 4907529.67 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0
2023-12-21 22:20:40,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=260853.33333333334, ans=0.05
2023-12-21 22:20:51,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.63 vs. limit=15.0
2023-12-21 22:20:53,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=260920.0, ans=0.2
2023-12-21 22:21:14,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=261053.33333333334, ans=0.125
2023-12-21 22:21:32,219 INFO [train.py:886] (3/4) Epoch 9, batch 1050, loss[loss=0.01688, audio_tagging_loss=0.01688, over 25000.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4914279.50 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0
2023-12-21 22:22:13,860 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.578e+01 2.720e+01 2.898e+01 3.660e+01, threshold=5.440e+01, percent-clipped=0.0
2023-12-21 22:22:15,474 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=15.0
2023-12-21 22:22:23,388 INFO [train.py:886] (3/4) Epoch 9, batch 1100, loss[loss=0.01606, audio_tagging_loss=0.01606, over 24750.00 frames. ], tot_loss[loss=0.01538, audio_tagging_loss=0.01538, over 4918348.64 frames. ], batch size: 99, lr: 1.23e-02, grad_scale: 32.0
2023-12-21 22:22:39,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=261586.66666666666, ans=0.2
2023-12-21 22:22:44,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=261653.33333333334, ans=0.1
2023-12-21 22:22:46,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=261653.33333333334, ans=0.125
2023-12-21 22:22:48,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=261653.33333333334, ans=0.125
2023-12-21 22:22:56,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=261720.0, ans=10.0
2023-12-21 22:23:16,838 INFO [train.py:886] (3/4) Epoch 9, batch 1150, loss[loss=0.0158, audio_tagging_loss=0.0158, over 24750.00 frames. ], tot_loss[loss=0.0153, audio_tagging_loss=0.0153, over 4931450.47 frames. ], batch size: 99, lr: 1.23e-02, grad_scale: 32.0
2023-12-21 22:23:19,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=261853.33333333334, ans=0.125
2023-12-21 22:23:26,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.54 vs. limit=10.0
2023-12-21 22:23:28,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=261920.0, ans=0.1
2023-12-21 22:23:28,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.65 vs. limit=15.0
2023-12-21 22:23:41,197 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.89 vs. limit=15.0
2023-12-21 22:23:49,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=262053.33333333334, ans=0.125
2023-12-21 22:23:57,573 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.233e+01 2.612e+01 2.792e+01 2.985e+01 3.661e+01, threshold=5.583e+01, percent-clipped=0.0
2023-12-21 22:23:59,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=262120.0, ans=0.125
2023-12-21 22:24:07,923 INFO [train.py:886] (3/4) Epoch 9, batch 1200, loss[loss=0.01166, audio_tagging_loss=0.01166, over 25000.00 frames. ], tot_loss[loss=0.01539, audio_tagging_loss=0.01539, over 4941081.76 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0
2023-12-21 22:24:08,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=262186.6666666667, ans=0.0
2023-12-21 22:24:14,015 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.24 vs. limit=22.5
2023-12-21 22:24:15,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=262186.6666666667, ans=0.1
2023-12-21 22:24:16,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=262186.6666666667, ans=0.0
2023-12-21 22:24:26,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=262253.3333333333, ans=0.125
2023-12-21 22:24:27,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=262320.0, ans=0.125
2023-12-21 22:24:40,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=262386.6666666667, ans=0.125
2023-12-21 22:24:47,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=262386.6666666667, ans=0.125
2023-12-21 22:24:58,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=262520.0, ans=0.125
2023-12-21 22:24:59,129 INFO [train.py:886] (3/4) Epoch 9, batch 1250, loss[loss=0.01528, audio_tagging_loss=0.01528, over 24750.00 frames. ], tot_loss[loss=0.01545, audio_tagging_loss=0.01545, over 4938559.86 frames. ], batch size: 99, lr: 1.23e-02, grad_scale: 32.0
2023-12-21 22:25:03,413 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.23 vs. limit=15.0
2023-12-21 22:25:12,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=262586.6666666667, ans=0.1
2023-12-21 22:25:14,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=262586.6666666667, ans=0.125
2023-12-21 22:25:28,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=262653.3333333333, ans=0.125
2023-12-21 22:25:32,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=262720.0, ans=0.0
2023-12-21 22:25:33,795 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.60 vs. limit=10.0
2023-12-21 22:25:39,893 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.421e+01 2.627e+01 2.793e+01 3.069e+01 3.716e+01, threshold=5.586e+01, percent-clipped=0.0
2023-12-21 22:25:40,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=262786.6666666667, ans=0.0
2023-12-21 22:25:52,095 INFO [train.py:886] (3/4) Epoch 9, batch 1300, loss[loss=0.0164, audio_tagging_loss=0.0164, over 21710.00 frames. ], tot_loss[loss=0.01557, audio_tagging_loss=0.01557, over 4932708.65 frames. ], batch size: 107, lr: 1.23e-02, grad_scale: 32.0
2023-12-21 22:25:56,524 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.92 vs.
limit=15.0 2023-12-21 22:26:07,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=262920.0, ans=0.1 2023-12-21 22:26:08,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=262920.0, ans=0.2 2023-12-21 22:26:16,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=262986.6666666667, ans=0.125 2023-12-21 22:26:24,882 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2023-12-21 22:26:35,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=263120.0, ans=0.0 2023-12-21 22:26:42,324 INFO [train.py:886] (3/4) Epoch 9, batch 1350, loss[loss=0.01608, audio_tagging_loss=0.01608, over 25000.00 frames. ], tot_loss[loss=0.01555, audio_tagging_loss=0.01555, over 4931165.98 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:26:42,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=263186.6666666667, ans=0.125 2023-12-21 22:26:57,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=263253.3333333333, ans=0.125 2023-12-21 22:27:07,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=263320.0, ans=0.07 2023-12-21 22:27:18,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=263386.6666666667, ans=0.125 2023-12-21 22:27:25,418 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.110e+01 2.597e+01 2.714e+01 2.927e+01 3.624e+01, threshold=5.427e+01, percent-clipped=0.0 2023-12-21 22:27:32,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=263453.3333333333, ans=0.0 2023-12-21 22:27:32,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=263453.3333333333, ans=0.1 2023-12-21 22:27:35,557 INFO [train.py:886] (3/4) Epoch 9, batch 1400, loss[loss=0.01563, audio_tagging_loss=0.01563, over 24750.00 frames. ], tot_loss[loss=0.01546, audio_tagging_loss=0.01546, over 4936173.78 frames. 
], batch size: 99, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:27:36,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=263520.0, ans=0.0 2023-12-21 22:27:43,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=263520.0, ans=0.07 2023-12-21 22:27:53,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=263586.6666666667, ans=0.125 2023-12-21 22:27:53,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=263586.6666666667, ans=0.09899494936611666 2023-12-21 22:27:56,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=263653.3333333333, ans=0.0 2023-12-21 22:28:00,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=263653.3333333333, ans=22.5 2023-12-21 22:28:11,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=263720.0, ans=0.0 2023-12-21 22:28:13,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=263720.0, ans=0.1 2023-12-21 22:28:15,861 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.08 vs. limit=6.0 2023-12-21 22:28:16,689 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.10 vs. limit=15.0 2023-12-21 22:28:26,530 INFO [train.py:886] (3/4) Epoch 9, batch 1450, loss[loss=0.01707, audio_tagging_loss=0.01707, over 21179.00 frames. ], tot_loss[loss=0.01541, audio_tagging_loss=0.01541, over 4934687.05 frames. ], batch size: 107, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:28:29,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=263853.3333333333, ans=0.05 2023-12-21 22:29:06,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=264053.3333333333, ans=0.125 2023-12-21 22:29:08,304 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.293e+01 2.534e+01 2.742e+01 2.913e+01 3.478e+01, threshold=5.484e+01, percent-clipped=0.0 2023-12-21 22:29:10,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=264120.0, ans=0.0 2023-12-21 22:29:13,257 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.141e+00 2023-12-21 22:29:14,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=264120.0, ans=0.95 2023-12-21 22:29:17,811 INFO [train.py:886] (3/4) Epoch 9, batch 1500, loss[loss=0.01811, audio_tagging_loss=0.01811, over 25000.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4944756.70 frames. 
], batch size: 100, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:29:17,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=264186.6666666667, ans=0.125 2023-12-21 22:29:38,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.98 vs. limit=15.0 2023-12-21 22:29:41,641 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=15.0 2023-12-21 22:29:46,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=264320.0, ans=0.0 2023-12-21 22:29:59,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=264453.3333333333, ans=0.0 2023-12-21 22:30:01,270 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.76 vs. limit=12.0 2023-12-21 22:30:02,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=264453.3333333333, ans=10.0 2023-12-21 22:30:10,241 INFO [train.py:886] (3/4) Epoch 9, batch 1550, loss[loss=0.01679, audio_tagging_loss=0.01679, over 25000.00 frames. ], tot_loss[loss=0.01563, audio_tagging_loss=0.01563, over 4946392.38 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:30:16,513 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.82 vs. limit=6.0 2023-12-21 22:30:25,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=264586.6666666667, ans=0.125 2023-12-21 22:30:43,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=264720.0, ans=0.2 2023-12-21 22:30:46,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=264720.0, ans=0.125 2023-12-21 22:30:47,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=264720.0, ans=0.125 2023-12-21 22:30:51,442 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.326e+01 2.715e+01 2.913e+01 3.058e+01 3.707e+01, threshold=5.826e+01, percent-clipped=0.0 2023-12-21 22:31:00,774 INFO [train.py:886] (3/4) Epoch 9, batch 1600, loss[loss=0.01397, audio_tagging_loss=0.01397, over 24750.00 frames. ], tot_loss[loss=0.01562, audio_tagging_loss=0.01562, over 4944502.03 frames. ], batch size: 99, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:31:12,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=264920.0, ans=0.0 2023-12-21 22:31:13,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=264920.0, ans=0.125 2023-12-21 22:31:13,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=264920.0, ans=0.125 2023-12-21 22:31:14,138 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.72 vs. 
limit=22.5 2023-12-21 22:31:20,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=264920.0, ans=0.125 2023-12-21 22:31:48,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=265120.0, ans=0.125 2023-12-21 22:31:51,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.20 vs. limit=22.5 2023-12-21 22:31:54,073 INFO [train.py:886] (3/4) Epoch 9, batch 1650, loss[loss=0.01499, audio_tagging_loss=0.01499, over 24750.00 frames. ], tot_loss[loss=0.0155, audio_tagging_loss=0.0155, over 4942960.74 frames. ], batch size: 99, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:32:08,784 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.19 vs. limit=22.5 2023-12-21 22:32:18,048 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.61 vs. limit=15.0 2023-12-21 22:32:35,013 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+01 2.651e+01 2.793e+01 3.063e+01 3.586e+01, threshold=5.586e+01, percent-clipped=0.0 2023-12-21 22:32:35,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=265453.3333333333, ans=0.125 2023-12-21 22:32:39,913 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.95 vs. limit=15.0 2023-12-21 22:32:46,500 INFO [train.py:886] (3/4) Epoch 9, batch 1700, loss[loss=0.01426, audio_tagging_loss=0.01426, over 22546.00 frames. ], tot_loss[loss=0.01543, audio_tagging_loss=0.01543, over 4941914.14 frames. ], batch size: 107, lr: 1.23e-02, grad_scale: 32.0 2023-12-21 22:32:58,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.14 vs. limit=15.0 2023-12-21 22:33:03,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=265586.6666666667, ans=0.125 2023-12-21 22:33:14,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=265653.3333333333, ans=0.125 2023-12-21 22:33:20,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=265720.0, ans=0.0 2023-12-21 22:33:31,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=265786.6666666667, ans=0.125 2023-12-21 22:33:36,379 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.31 vs. limit=15.0 2023-12-21 22:33:37,681 INFO [train.py:886] (3/4) Epoch 9, batch 1750, loss[loss=0.0164, audio_tagging_loss=0.0164, over 25000.00 frames. ], tot_loss[loss=0.01537, audio_tagging_loss=0.01537, over 4945149.70 frames. 
], batch size: 100, lr: 1.22e-02, grad_scale: 32.0 2023-12-21 22:33:46,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=265853.3333333333, ans=0.125 2023-12-21 22:33:48,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=265920.0, ans=0.0 2023-12-21 22:33:49,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=265920.0, ans=0.0 2023-12-21 22:33:50,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=265920.0, ans=0.5 2023-12-21 22:33:53,712 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.55 vs. limit=22.5 2023-12-21 22:33:56,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=265920.0, ans=0.2 2023-12-21 22:34:02,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=265986.6666666667, ans=0.125 2023-12-21 22:34:19,242 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+01 2.611e+01 2.763e+01 2.973e+01 3.566e+01, threshold=5.527e+01, percent-clipped=0.0 2023-12-21 22:34:30,204 INFO [train.py:886] (3/4) Epoch 9, batch 1800, loss[loss=0.01567, audio_tagging_loss=0.01567, over 25000.00 frames. ], tot_loss[loss=0.01538, audio_tagging_loss=0.01538, over 4956663.44 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 32.0 2023-12-21 22:34:49,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=266320.0, ans=0.125 2023-12-21 22:34:52,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=266320.0, ans=0.0 2023-12-21 22:34:52,693 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 22:34:57,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=266320.0, ans=0.0 2023-12-21 22:35:00,049 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.14 vs. limit=15.0 2023-12-21 22:35:03,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=266386.6666666667, ans=0.0 2023-12-21 22:35:11,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=266453.3333333333, ans=0.125 2023-12-21 22:35:21,925 INFO [train.py:886] (3/4) Epoch 9, batch 1850, loss[loss=0.01412, audio_tagging_loss=0.01412, over 24750.00 frames. ], tot_loss[loss=0.01537, audio_tagging_loss=0.01537, over 4952354.14 frames. 
], batch size: 99, lr: 1.22e-02, grad_scale: 32.0 2023-12-21 22:35:39,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=266586.6666666667, ans=0.125 2023-12-21 22:35:48,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=266653.3333333333, ans=0.0 2023-12-21 22:35:51,925 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.87 vs. limit=15.0 2023-12-21 22:35:52,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=266653.3333333333, ans=0.0 2023-12-21 22:36:05,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=266786.6666666667, ans=0.125 2023-12-21 22:36:05,785 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.271e+01 2.665e+01 2.807e+01 3.010e+01 3.699e+01, threshold=5.613e+01, percent-clipped=0.0 2023-12-21 22:36:16,020 INFO [train.py:886] (3/4) Epoch 9, batch 1900, loss[loss=0.0155, audio_tagging_loss=0.0155, over 24750.00 frames. ], tot_loss[loss=0.01536, audio_tagging_loss=0.01536, over 4949128.18 frames. ], batch size: 99, lr: 1.22e-02, grad_scale: 32.0 2023-12-21 22:36:28,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=266920.0, ans=0.5 2023-12-21 22:36:28,527 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.53 vs. limit=22.5 2023-12-21 22:36:42,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=266986.6666666667, ans=0.125 2023-12-21 22:36:49,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=267053.3333333333, ans=0.0 2023-12-21 22:36:51,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=267053.3333333333, ans=0.125 2023-12-21 22:37:07,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=267186.6666666667, ans=0.1 2023-12-21 22:37:08,003 INFO [train.py:886] (3/4) Epoch 9, batch 1950, loss[loss=0.01553, audio_tagging_loss=0.01553, over 25000.00 frames. ], tot_loss[loss=0.01542, audio_tagging_loss=0.01542, over 4951426.30 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 32.0 2023-12-21 22:37:15,347 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.48 vs. limit=22.5 2023-12-21 22:37:23,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=267253.3333333333, ans=0.2 2023-12-21 22:37:31,728 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.87 vs. 
limit=10.0 2023-12-21 22:37:47,984 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 2.613e+01 2.746e+01 2.938e+01 3.371e+01, threshold=5.492e+01, percent-clipped=0.0 2023-12-21 22:37:49,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=267453.3333333333, ans=0.125 2023-12-21 22:37:51,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=267453.3333333333, ans=0.2 2023-12-21 22:37:58,869 INFO [train.py:886] (3/4) Epoch 9, batch 2000, loss[loss=0.01588, audio_tagging_loss=0.01588, over 25000.00 frames. ], tot_loss[loss=0.0154, audio_tagging_loss=0.0154, over 4948282.05 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 32.0 2023-12-21 22:38:04,149 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.89 vs. limit=22.5 2023-12-21 22:38:13,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=267586.6666666667, ans=0.0 2023-12-21 22:38:20,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=267653.3333333333, ans=0.125 2023-12-21 22:38:22,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0 2023-12-21 22:38:27,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=267653.3333333333, ans=0.125 2023-12-21 22:38:31,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=267720.0, ans=0.125 2023-12-21 22:38:42,836 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.05 vs. limit=10.0 2023-12-21 22:38:46,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=267786.6666666667, ans=0.125 2023-12-21 22:38:50,007 INFO [train.py:886] (3/4) Epoch 9, batch 2050, loss[loss=0.01536, audio_tagging_loss=0.01536, over 25000.00 frames. ], tot_loss[loss=0.01534, audio_tagging_loss=0.01534, over 4951701.91 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0 2023-12-21 22:38:56,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=267853.3333333333, ans=0.2 2023-12-21 22:39:01,699 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=19.09 vs. limit=15.0 2023-12-21 22:39:03,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=267920.0, ans=0.1 2023-12-21 22:39:07,348 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.51 vs. 
limit=6.0 2023-12-21 22:39:11,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=267986.6666666667, ans=0.125 2023-12-21 22:39:13,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=267986.6666666667, ans=0.125 2023-12-21 22:39:15,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=267986.6666666667, ans=0.1 2023-12-21 22:39:21,098 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.43 vs. limit=6.0 2023-12-21 22:39:23,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=268053.3333333333, ans=0.0 2023-12-21 22:39:30,916 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.279e+01 2.585e+01 2.736e+01 2.898e+01 3.379e+01, threshold=5.472e+01, percent-clipped=0.0 2023-12-21 22:39:37,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=268120.0, ans=0.125 2023-12-21 22:39:41,178 INFO [train.py:886] (3/4) Epoch 9, batch 2100, loss[loss=0.01463, audio_tagging_loss=0.01463, over 24750.00 frames. ], tot_loss[loss=0.01538, audio_tagging_loss=0.01538, over 4955817.57 frames. ], batch size: 99, lr: 1.22e-02, grad_scale: 64.0 2023-12-21 22:39:47,120 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.38 vs. limit=22.5 2023-12-21 22:39:59,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=268253.3333333333, ans=0.1 2023-12-21 22:40:08,805 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.25 vs. limit=22.5 2023-12-21 22:40:32,299 INFO [train.py:886] (3/4) Epoch 9, batch 2150, loss[loss=0.01499, audio_tagging_loss=0.01499, over 25000.00 frames. ], tot_loss[loss=0.01541, audio_tagging_loss=0.01541, over 4955819.23 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0 2023-12-21 22:41:01,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=268653.3333333333, ans=0.125 2023-12-21 22:41:14,767 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.269e+01 2.638e+01 2.787e+01 2.989e+01 3.388e+01, threshold=5.573e+01, percent-clipped=0.0 2023-12-21 22:41:18,046 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.70 vs. limit=6.0 2023-12-21 22:41:25,790 INFO [train.py:886] (3/4) Epoch 9, batch 2200, loss[loss=0.0181, audio_tagging_loss=0.0181, over 24750.00 frames. ], tot_loss[loss=0.01547, audio_tagging_loss=0.01547, over 4948240.02 frames. ], batch size: 99, lr: 1.22e-02, grad_scale: 64.0 2023-12-21 22:41:30,078 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.19 vs. 
limit=22.5 2023-12-21 22:41:35,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=268920.0, ans=0.0 2023-12-21 22:41:43,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=268920.0, ans=0.0 2023-12-21 22:41:45,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=268986.6666666667, ans=0.125 2023-12-21 22:41:45,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=268986.6666666667, ans=0.02 2023-12-21 22:41:47,643 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0 2023-12-21 22:41:56,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=269053.3333333333, ans=0.5 2023-12-21 22:41:57,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=269053.3333333333, ans=0.0 2023-12-21 22:42:11,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=269120.0, ans=0.0 2023-12-21 22:42:12,490 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.93 vs. limit=22.5 2023-12-21 22:42:14,900 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.75 vs. limit=15.0 2023-12-21 22:42:16,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=269186.6666666667, ans=0.0 2023-12-21 22:42:17,188 INFO [train.py:886] (3/4) Epoch 9, batch 2250, loss[loss=0.01588, audio_tagging_loss=0.01588, over 24750.00 frames. ], tot_loss[loss=0.01547, audio_tagging_loss=0.01547, over 4944688.87 frames. ], batch size: 99, lr: 1.22e-02, grad_scale: 64.0 2023-12-21 22:42:18,563 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.33 vs. limit=12.0 2023-12-21 22:42:20,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=269186.6666666667, ans=0.125 2023-12-21 22:42:24,072 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.13 vs. limit=10.0 2023-12-21 22:42:45,250 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.63 vs. limit=10.0 2023-12-21 22:42:58,350 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.331e+01 2.625e+01 2.827e+01 2.998e+01 3.600e+01, threshold=5.653e+01, percent-clipped=0.0 2023-12-21 22:43:03,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=269453.3333333333, ans=0.125 2023-12-21 22:43:07,845 INFO [train.py:886] (3/4) Epoch 9, batch 2300, loss[loss=0.01872, audio_tagging_loss=0.01872, over 25000.00 frames. ], tot_loss[loss=0.0154, audio_tagging_loss=0.0154, over 4943135.55 frames. 
], batch size: 100, lr: 1.22e-02, grad_scale: 64.0 2023-12-21 22:43:40,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=269720.0, ans=0.125 2023-12-21 22:44:01,066 INFO [train.py:886] (3/4) Epoch 9, batch 2350, loss[loss=0.01455, audio_tagging_loss=0.01455, over 25000.00 frames. ], tot_loss[loss=0.0153, audio_tagging_loss=0.0153, over 4947225.19 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0 2023-12-21 22:44:02,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=269853.3333333333, ans=0.125 2023-12-21 22:44:14,111 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.28 vs. limit=15.0 2023-12-21 22:44:19,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=269986.6666666667, ans=10.0 2023-12-21 22:44:29,606 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.70 vs. limit=15.0 2023-12-21 22:44:41,220 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.286e+01 2.563e+01 2.767e+01 2.942e+01 3.408e+01, threshold=5.535e+01, percent-clipped=0.0 2023-12-21 22:44:50,826 INFO [train.py:886] (3/4) Epoch 9, batch 2400, loss[loss=0.01373, audio_tagging_loss=0.01373, over 25000.00 frames. ], tot_loss[loss=0.01534, audio_tagging_loss=0.01534, over 4949637.21 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0 2023-12-21 22:45:00,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=270253.3333333333, ans=0.0 2023-12-21 22:45:04,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=270253.3333333333, ans=0.125 2023-12-21 22:45:06,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=270253.3333333333, ans=0.125 2023-12-21 22:45:15,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=270320.0, ans=0.1 2023-12-21 22:45:21,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=270386.6666666667, ans=0.0 2023-12-21 22:45:34,685 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.06 vs. limit=10.0 2023-12-21 22:45:36,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=270453.3333333333, ans=0.1 2023-12-21 22:45:42,469 INFO [train.py:886] (3/4) Epoch 9, batch 2450, loss[loss=0.01609, audio_tagging_loss=0.01609, over 25000.00 frames. ], tot_loss[loss=0.0154, audio_tagging_loss=0.0154, over 4948470.34 frames. 
], batch size: 100, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:45:52,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=270586.6666666667, ans=0.125 2023-12-21 22:46:01,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=270653.3333333333, ans=0.125 2023-12-21 22:46:01,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.22 vs. limit=10.0 2023-12-21 22:46:06,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=270653.3333333333, ans=0.125 2023-12-21 22:46:06,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=270653.3333333333, ans=0.125 2023-12-21 22:46:15,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=270720.0, ans=0.125 2023-12-21 22:46:16,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=270720.0, ans=15.0 2023-12-21 22:46:22,743 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.269e+01 2.728e+01 2.875e+01 2.985e+01 3.809e+01, threshold=5.751e+01, percent-clipped=0.0 2023-12-21 22:46:33,013 INFO [train.py:886] (3/4) Epoch 9, batch 2500, loss[loss=0.01498, audio_tagging_loss=0.01498, over 24750.00 frames. ], tot_loss[loss=0.01545, audio_tagging_loss=0.01545, over 4946186.18 frames. ], batch size: 99, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:46:50,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=270920.0, ans=0.125 2023-12-21 22:46:53,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=270986.6666666667, ans=0.125 2023-12-21 22:47:00,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=270986.6666666667, ans=0.2 2023-12-21 22:47:16,385 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.36 vs. limit=15.0 2023-12-21 22:47:17,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=271120.0, ans=0.1 2023-12-21 22:47:24,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=271186.6666666667, ans=0.0 2023-12-21 22:47:25,473 INFO [train.py:886] (3/4) Epoch 9, batch 2550, loss[loss=0.01717, audio_tagging_loss=0.01717, over 25000.00 frames. ], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 4944041.88 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:48:07,011 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+01 2.663e+01 2.770e+01 2.998e+01 3.753e+01, threshold=5.540e+01, percent-clipped=0.0 2023-12-21 22:48:17,922 INFO [train.py:886] (3/4) Epoch 9, batch 2600, loss[loss=0.01427, audio_tagging_loss=0.01427, over 25000.00 frames. ], tot_loss[loss=0.01549, audio_tagging_loss=0.01549, over 4943388.43 frames. 
], batch size: 100, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:48:20,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=271520.0, ans=0.125 2023-12-21 22:48:22,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=271520.0, ans=0.0 2023-12-21 22:48:36,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=271653.3333333333, ans=0.125 2023-12-21 22:48:54,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=271720.0, ans=0.125 2023-12-21 22:49:01,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=271786.6666666667, ans=0.125 2023-12-21 22:49:06,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=271786.6666666667, ans=15.0 2023-12-21 22:49:08,999 INFO [train.py:886] (3/4) Epoch 9, batch 2650, loss[loss=0.01695, audio_tagging_loss=0.01695, over 25000.00 frames. ], tot_loss[loss=0.0154, audio_tagging_loss=0.0154, over 4946003.81 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:49:23,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=271920.0, ans=0.125 2023-12-21 22:49:25,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=271920.0, ans=0.0 2023-12-21 22:49:35,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=271986.6666666667, ans=0.125 2023-12-21 22:49:39,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=272053.3333333333, ans=0.2 2023-12-21 22:49:51,070 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+01 2.584e+01 2.691e+01 2.849e+01 3.428e+01, threshold=5.381e+01, percent-clipped=0.0 2023-12-21 22:49:52,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=272120.0, ans=0.125 2023-12-21 22:50:00,590 INFO [train.py:886] (3/4) Epoch 9, batch 2700, loss[loss=0.0147, audio_tagging_loss=0.0147, over 25000.00 frames. ], tot_loss[loss=0.01531, audio_tagging_loss=0.01531, over 4952625.02 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:50:10,310 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.08 vs. limit=12.0 2023-12-21 22:50:14,111 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.43 vs. limit=22.5 2023-12-21 22:50:15,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=272253.3333333333, ans=0.125 2023-12-21 22:50:32,478 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.06 vs. limit=22.5 2023-12-21 22:50:34,226 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.25 vs. 
limit=12.0 2023-12-21 22:50:41,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=272453.3333333333, ans=0.0 2023-12-21 22:50:47,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=272453.3333333333, ans=0.0 2023-12-21 22:50:50,703 INFO [train.py:886] (3/4) Epoch 9, batch 2750, loss[loss=0.0134, audio_tagging_loss=0.0134, over 24750.00 frames. ], tot_loss[loss=0.01527, audio_tagging_loss=0.01527, over 4949905.25 frames. ], batch size: 99, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:50:50,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=272520.0, ans=0.125 2023-12-21 22:51:06,698 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.64 vs. limit=22.5 2023-12-21 22:51:14,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=272653.3333333333, ans=0.0 2023-12-21 22:51:18,755 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.69 vs. limit=10.0 2023-12-21 22:51:27,548 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.38 vs. limit=10.0 2023-12-21 22:51:29,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=272720.0, ans=0.125 2023-12-21 22:51:33,396 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.337e+01 2.590e+01 2.730e+01 2.864e+01 3.266e+01, threshold=5.459e+01, percent-clipped=0.0 2023-12-21 22:51:35,728 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.89 vs. limit=10.0 2023-12-21 22:51:43,034 INFO [train.py:886] (3/4) Epoch 9, batch 2800, loss[loss=0.01306, audio_tagging_loss=0.01306, over 22153.00 frames. ], tot_loss[loss=0.01537, audio_tagging_loss=0.01537, over 4944766.71 frames. ], batch size: 107, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:51:44,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=272853.3333333333, ans=0.2 2023-12-21 22:51:50,450 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 22:51:59,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=272920.0, ans=0.0 2023-12-21 22:52:06,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=272986.6666666667, ans=0.0 2023-12-21 22:52:10,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.10 vs. 
limit=15.0 2023-12-21 22:52:24,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=273120.0, ans=0.125 2023-12-21 22:52:26,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=273120.0, ans=0.125 2023-12-21 22:52:36,149 INFO [train.py:886] (3/4) Epoch 9, batch 2850, loss[loss=0.01382, audio_tagging_loss=0.01382, over 24209.00 frames. ], tot_loss[loss=0.01546, audio_tagging_loss=0.01546, over 4938429.69 frames. ], batch size: 101, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:52:51,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=273253.3333333333, ans=0.125 2023-12-21 22:52:56,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=273320.0, ans=0.125 2023-12-21 22:53:05,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=273320.0, ans=0.125 2023-12-21 22:53:17,484 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.396e+01 2.634e+01 2.788e+01 2.936e+01 3.853e+01, threshold=5.577e+01, percent-clipped=0.0 2023-12-21 22:53:21,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.78 vs. limit=6.0 2023-12-21 22:53:23,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=273453.3333333333, ans=0.0 2023-12-21 22:53:25,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=273453.3333333333, ans=0.125 2023-12-21 22:53:27,608 INFO [train.py:886] (3/4) Epoch 9, batch 2900, loss[loss=0.0163, audio_tagging_loss=0.0163, over 24750.00 frames. ], tot_loss[loss=0.01538, audio_tagging_loss=0.01538, over 4940832.24 frames. ], batch size: 99, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:53:39,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=273586.6666666667, ans=0.125 2023-12-21 22:53:59,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=273720.0, ans=0.2 2023-12-21 22:54:16,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=273786.6666666667, ans=0.125 2023-12-21 22:54:20,014 INFO [train.py:886] (3/4) Epoch 9, batch 2950, loss[loss=0.01628, audio_tagging_loss=0.01628, over 25000.00 frames. ], tot_loss[loss=0.01531, audio_tagging_loss=0.01531, over 4938345.08 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:54:24,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=273853.3333333333, ans=0.0 2023-12-21 22:54:47,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=273986.6666666667, ans=0.125 2023-12-21 22:54:47,894 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.41 vs. 
limit=15.0 2023-12-21 22:54:52,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=274053.3333333333, ans=0.125 2023-12-21 22:55:00,637 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.328e+01 2.696e+01 2.843e+01 2.959e+01 3.331e+01, threshold=5.685e+01, percent-clipped=0.0 2023-12-21 22:55:12,161 INFO [train.py:886] (3/4) Epoch 9, batch 3000, loss[loss=0.01547, audio_tagging_loss=0.01547, over 25000.00 frames. ], tot_loss[loss=0.01527, audio_tagging_loss=0.01527, over 4937914.19 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:55:12,162 INFO [train.py:909] (3/4) Computing validation loss 2023-12-21 22:55:21,738 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.9917, 2.5861, 2.7938, 2.0691, 2.1765, 2.5774, 2.5123, 2.4131], device='cuda:3') 2023-12-21 22:55:32,579 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.6014, 5.6186, 5.1702, 5.4628], device='cuda:3') 2023-12-21 22:55:33,472 INFO [train.py:917] (3/4) Epoch 9, validation: loss=0.03523, audio_tagging_loss=0.03523, over 3737520.00 frames. 2023-12-21 22:55:33,472 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-21 22:55:40,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=274186.6666666667, ans=0.125 2023-12-21 22:55:54,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=274320.0, ans=0.2 2023-12-21 22:56:11,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=274386.6666666667, ans=0.125 2023-12-21 22:56:12,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=274386.6666666667, ans=0.2 2023-12-21 22:56:15,587 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2023-12-21 22:56:25,457 INFO [train.py:886] (3/4) Epoch 9, batch 3050, loss[loss=0.01449, audio_tagging_loss=0.01449, over 25000.00 frames. ], tot_loss[loss=0.01532, audio_tagging_loss=0.01532, over 4941868.52 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:56:27,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=274520.0, ans=0.125 2023-12-21 22:56:42,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=274586.6666666667, ans=0.2 2023-12-21 22:56:46,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=274653.3333333333, ans=0.2 2023-12-21 22:56:46,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=274653.3333333333, ans=0.04949747468305833 2023-12-21 22:56:55,619 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.67 vs. 
limit=15.0 2023-12-21 22:57:01,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=274720.0, ans=0.125 2023-12-21 22:57:06,216 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+01 2.681e+01 2.830e+01 3.002e+01 4.084e+01, threshold=5.659e+01, percent-clipped=0.0 2023-12-21 22:57:09,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=274786.6666666667, ans=0.125 2023-12-21 22:57:15,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=274786.6666666667, ans=0.0 2023-12-21 22:57:17,858 INFO [train.py:886] (3/4) Epoch 9, batch 3100, loss[loss=0.01822, audio_tagging_loss=0.01822, over 24750.00 frames. ], tot_loss[loss=0.01532, audio_tagging_loss=0.01532, over 4945149.52 frames. ], batch size: 99, lr: 1.21e-02, grad_scale: 64.0 2023-12-21 22:57:19,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=274853.3333333333, ans=0.95 2023-12-21 22:57:19,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=274853.3333333333, ans=0.125 2023-12-21 22:57:19,354 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.02 vs. limit=15.0 2023-12-21 22:57:20,051 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.557e-01 2023-12-21 22:57:29,769 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.44 vs. limit=22.5 2023-12-21 22:57:51,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=275053.3333333333, ans=0.125 2023-12-21 22:57:51,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.59 vs. limit=15.0 2023-12-21 22:57:52,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=275053.3333333333, ans=0.125 2023-12-21 22:58:04,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=275120.0, ans=0.125 2023-12-21 22:58:06,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=275120.0, ans=0.0 2023-12-21 22:58:08,333 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.12 vs. limit=15.0 2023-12-21 22:58:08,821 INFO [train.py:886] (3/4) Epoch 9, batch 3150, loss[loss=0.01494, audio_tagging_loss=0.01494, over 24750.00 frames. ], tot_loss[loss=0.0154, audio_tagging_loss=0.0154, over 4944232.47 frames. ], batch size: 99, lr: 1.20e-02, grad_scale: 64.0 2023-12-21 22:58:31,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=275320.0, ans=0.125 2023-12-21 22:58:41,927 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.36 vs. 
2023-12-21 22:58:42,096 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.98 vs. limit=15.0
2023-12-21 22:58:47,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=275386.6666666667, ans=0.125
2023-12-21 22:58:50,920 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.280e+01 2.623e+01 2.777e+01 2.975e+01 3.499e+01, threshold=5.555e+01, percent-clipped=0.0
2023-12-21 22:59:00,380 INFO [train.py:886] (3/4) Epoch 9, batch 3200, loss[loss=0.01768, audio_tagging_loss=0.01768, over 24750.00 frames. ], tot_loss[loss=0.01545, audio_tagging_loss=0.01545, over 4940156.03 frames. ], batch size: 99, lr: 1.20e-02, grad_scale: 64.0
2023-12-21 22:59:00,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=275520.0, ans=0.125
2023-12-21 22:59:02,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=275520.0, ans=0.2
2023-12-21 22:59:18,943 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.286e-02
2023-12-21 22:59:36,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=275720.0, ans=0.125
2023-12-21 22:59:44,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=275786.6666666667, ans=0.0
2023-12-21 22:59:53,490 INFO [train.py:886] (3/4) Epoch 9, batch 3250, loss[loss=0.01767, audio_tagging_loss=0.01767, over 25000.00 frames. ], tot_loss[loss=0.01543, audio_tagging_loss=0.01543, over 4947436.90 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0
2023-12-21 23:00:05,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=275920.0, ans=0.2
2023-12-21 23:00:06,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=275920.0, ans=0.0
2023-12-21 23:00:07,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=275920.0, ans=0.0
2023-12-21 23:00:22,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=275986.6666666667, ans=0.0
2023-12-21 23:00:22,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=275986.6666666667, ans=0.0
2023-12-21 23:00:27,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=276053.3333333333, ans=0.125
2023-12-21 23:00:34,343 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+01 2.607e+01 2.752e+01 2.914e+01 3.525e+01, threshold=5.504e+01, percent-clipped=0.0
2023-12-21 23:00:43,807 INFO [train.py:886] (3/4) Epoch 9, batch 3300, loss[loss=0.01515, audio_tagging_loss=0.01515, over 25000.00 frames. ], tot_loss[loss=0.01531, audio_tagging_loss=0.01531, over 4952666.12 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0
2023-12-21 23:00:46,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=276186.6666666667, ans=0.125
2023-12-21 23:01:36,160 INFO [train.py:886] (3/4) Epoch 9, batch 3350, loss[loss=0.01198, audio_tagging_loss=0.01198, over 25000.00 frames. ], tot_loss[loss=0.01527, audio_tagging_loss=0.01527, over 4950832.55 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0
2023-12-21 23:01:54,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=276586.6666666667, ans=0.95
2023-12-21 23:02:00,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.70 vs. limit=15.0
2023-12-21 23:02:04,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=276653.3333333333, ans=0.1
2023-12-21 23:02:07,973 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.49 vs. limit=15.0
2023-12-21 23:02:13,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=276720.0, ans=0.0
2023-12-21 23:02:15,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=276720.0, ans=0.1
2023-12-21 23:02:16,737 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+01 2.658e+01 2.793e+01 2.957e+01 4.433e+01, threshold=5.585e+01, percent-clipped=0.0
2023-12-21 23:02:23,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=276786.6666666667, ans=0.125
2023-12-21 23:02:26,909 INFO [train.py:886] (3/4) Epoch 9, batch 3400, loss[loss=0.0146, audio_tagging_loss=0.0146, over 25000.00 frames. ], tot_loss[loss=0.01534, audio_tagging_loss=0.01534, over 4955464.88 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0
2023-12-21 23:02:35,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=276853.3333333333, ans=0.125
2023-12-21 23:02:36,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=276853.3333333333, ans=0.2
2023-12-21 23:02:39,036 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.66 vs. limit=10.0
2023-12-21 23:02:40,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=276920.0, ans=0.2
2023-12-21 23:02:47,619 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.53 vs. limit=15.0
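The ScheduledFloat entries logged by scaling.py:213 pair a parameter name with the current batch_count and its current value (ans=...). They behave like hyperparameters that follow a piecewise-linear schedule over the number of batches processed: for example, a balancer prob that starts at 0.5 at batch_count=0.0 early in the log has settled at 0.125 here. A minimal sketch of such a schedule is below; the class shape and breakpoints are illustrative assumptions, not the real scaling.py class.

```python
class ScheduledFloatSketch:
    """Piecewise-linear schedule over batch_count (illustrative sketch only,
    not icefall's ScheduledFloat). `schedule` is a list of
    (batch_count, value) breakpoints; between breakpoints the value is
    linearly interpolated, outside them it is clamped."""

    def __init__(self, *schedule: tuple) -> None:
        self.schedule = sorted(schedule)
        self.batch_count = 0.0

    def __float__(self) -> float:
        points = self.schedule
        if self.batch_count <= points[0][0]:
            return points[0][1]
        if self.batch_count >= points[-1][0]:
            return points[-1][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= self.batch_count <= x1:
                t = (self.batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        raise RuntimeError("unreachable")

# e.g. a dropout-like prob decaying from 0.5 to 0.125 over 4000 batches
# (breakpoints assumed for illustration):
prob = ScheduledFloatSketch((0.0, 0.5), (4000.0, 0.125))
prob.batch_count = 274053.33   # far past the last breakpoint -> ans=0.125
print(float(prob))
```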
2023-12-21 23:02:55,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=276986.6666666667, ans=0.125
2023-12-21 23:02:55,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=276986.6666666667, ans=0.1
2023-12-21 23:02:59,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0
2023-12-21 23:03:10,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=277120.0, ans=0.0
2023-12-21 23:03:17,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=277120.0, ans=0.0
2023-12-21 23:03:18,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=277186.6666666667, ans=0.0
2023-12-21 23:03:19,106 INFO [train.py:886] (3/4) Epoch 9, batch 3450, loss[loss=0.0145, audio_tagging_loss=0.0145, over 21666.00 frames. ], tot_loss[loss=0.01539, audio_tagging_loss=0.01539, over 4949863.60 frames. ], batch size: 107, lr: 1.20e-02, grad_scale: 64.0
2023-12-21 23:03:19,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.93 vs. limit=22.5
2023-12-21 23:03:27,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=277253.3333333333, ans=0.125
2023-12-21 23:03:28,172 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.92 vs. limit=22.5
2023-12-21 23:03:42,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=277320.0, ans=0.125
2023-12-21 23:03:42,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=277320.0, ans=0.2
2023-12-21 23:03:59,344 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.257e+01 2.682e+01 2.834e+01 2.983e+01 3.671e+01, threshold=5.669e+01, percent-clipped=0.0
2023-12-21 23:04:07,649 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.68 vs. limit=15.0
2023-12-21 23:04:10,943 INFO [train.py:886] (3/4) Epoch 9, batch 3500, loss[loss=0.01457, audio_tagging_loss=0.01457, over 24750.00 frames. ], tot_loss[loss=0.01541, audio_tagging_loss=0.01541, over 4949507.11 frames. ], batch size: 99, lr: 1.20e-02, grad_scale: 64.0
2023-12-21 23:04:24,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=277586.6666666667, ans=0.125
2023-12-21 23:04:24,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=277586.6666666667, ans=0.0
2023-12-21 23:04:36,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=15.0
2023-12-21 23:04:41,308 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.53 vs. limit=15.0
2023-12-21 23:04:55,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=277786.6666666667, ans=0.125
2023-12-21 23:04:59,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=277786.6666666667, ans=0.1
2023-12-21 23:05:00,636 INFO [train.py:886] (3/4) Epoch 9, batch 3550, loss[loss=0.0148, audio_tagging_loss=0.0148, over 25000.00 frames. ], tot_loss[loss=0.01531, audio_tagging_loss=0.01531, over 4949229.82 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0
2023-12-21 23:05:24,851 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.89 vs. limit=10.0
2023-12-21 23:05:42,779 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.633e+01 2.767e+01 2.971e+01 3.918e+01, threshold=5.534e+01, percent-clipped=0.0
2023-12-21 23:05:52,247 INFO [train.py:886] (3/4) Epoch 9, batch 3600, loss[loss=0.01455, audio_tagging_loss=0.01455, over 24750.00 frames. ], tot_loss[loss=0.0153, audio_tagging_loss=0.0153, over 4952561.75 frames. ], batch size: 99, lr: 1.20e-02, grad_scale: 64.0
2023-12-21 23:05:54,774 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0
2023-12-21 23:05:55,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=278186.6666666667, ans=0.125
2023-12-21 23:05:56,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=278186.6666666667, ans=0.125
2023-12-21 23:05:58,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=278186.6666666667, ans=0.1
2023-12-21 23:06:16,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=278320.0, ans=0.1
2023-12-21 23:06:19,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=278320.0, ans=0.2
2023-12-21 23:06:24,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=278386.6666666667, ans=0.125
2023-12-21 23:06:25,937 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.08 vs. limit=22.5
2023-12-21 23:06:33,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=278453.3333333333, ans=0.0
2023-12-21 23:06:42,380 INFO [train.py:886] (3/4) Epoch 9, batch 3650, loss[loss=0.01252, audio_tagging_loss=0.01252, over 25000.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 4956348.86 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0
2023-12-21 23:06:52,566 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.28 vs. limit=15.0
2023-12-21 23:07:06,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=278653.3333333333, ans=0.125
2023-12-21 23:07:09,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=278653.3333333333, ans=0.125
2023-12-21 23:07:25,672 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 2.660e+01 2.835e+01 3.011e+01 3.673e+01, threshold=5.670e+01, percent-clipped=0.0
2023-12-21 23:07:35,193 INFO [train.py:886] (3/4) Epoch 9, batch 3700, loss[loss=0.01461, audio_tagging_loss=0.01461, over 25000.00 frames. ], tot_loss[loss=0.01528, audio_tagging_loss=0.01528, over 4958047.16 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0
2023-12-21 23:07:35,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=278853.3333333333, ans=0.125
2023-12-21 23:07:37,723 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.08 vs. limit=15.0
2023-12-21 23:07:41,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=278853.3333333333, ans=0.125
2023-12-21 23:07:41,260 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.90 vs. limit=15.0
2023-12-21 23:08:13,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=279053.3333333333, ans=0.0
2023-12-21 23:08:28,109 INFO [train.py:886] (3/4) Epoch 9, batch 3750, loss[loss=0.02081, audio_tagging_loss=0.02081, over 24955.00 frames. ], tot_loss[loss=0.0154, audio_tagging_loss=0.0154, over 4955966.27 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 64.0
2023-12-21 23:08:32,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=279186.6666666667, ans=0.0
2023-12-21 23:08:32,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=279186.6666666667, ans=0.125
2023-12-21 23:08:36,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=279186.6666666667, ans=0.0
2023-12-21 23:08:51,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=279320.0, ans=0.125
2023-12-21 23:08:55,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=279320.0, ans=0.2
2023-12-21 23:09:09,038 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.434e+01 2.692e+01 2.838e+01 2.996e+01 3.691e+01, threshold=5.675e+01, percent-clipped=0.0
2023-12-21 23:09:12,377 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.19 vs. limit=12.0
2023-12-21 23:09:13,513 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.96 vs. limit=15.0
2023-12-21 23:09:18,864 INFO [train.py:886] (3/4) Epoch 9, batch 3800, loss[loss=0.01682, audio_tagging_loss=0.01682, over 24750.00 frames. ], tot_loss[loss=0.0155, audio_tagging_loss=0.0155, over 4950665.03 frames. ], batch size: 99, lr: 1.20e-02, grad_scale: 64.0
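In the train.py:886 lines, loss[...] is the loss on the current batch (over roughly 25,000 frames) while tot_loss[...] is a smoothed aggregate reported over about 5M frames. One way to produce numbers of that shape is a frame-weighted running average with exponential decay, sketched below; the decay constant and the exact aggregation rule used by train.py are assumptions made here for illustration.

```python
class RunningLoss:
    """Frame-weighted running average of per-batch losses (illustrative
    sketch, not train.py's actual bookkeeping). `decay` keeps the effective
    window finite so tot_loss tracks recent batches; `frames` counts how
    much data the average currently covers."""

    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss()
tracker.update(0.01547, 25000.0)   # per-batch values as in the lines above
print(f"tot_loss={tracker.tot_loss:.5f} over {tracker.frames:.2f} frames")
```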
2023-12-21 23:09:23,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.45 vs. limit=22.5
2023-12-21 23:09:35,635 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.50 vs. limit=22.5
2023-12-21 23:09:44,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=279653.3333333333, ans=0.0
2023-12-21 23:09:48,211 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.55 vs. limit=6.0
2023-12-21 23:09:58,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=279720.0, ans=0.0
2023-12-21 23:10:08,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=279786.6666666667, ans=0.0
2023-12-21 23:10:08,689 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.28 vs. limit=22.5
2023-12-21 23:10:11,835 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.53 vs. limit=15.0
2023-12-21 23:10:12,087 INFO [train.py:886] (3/4) Epoch 9, batch 3850, loss[loss=0.013, audio_tagging_loss=0.013, over 25000.00 frames. ], tot_loss[loss=0.01539, audio_tagging_loss=0.01539, over 4951190.34 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0
2023-12-21 23:10:37,934 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.29 vs. limit=6.0
2023-12-21 23:10:43,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=280053.3333333333, ans=0.0
2023-12-21 23:10:48,125 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.84 vs. limit=22.5
2023-12-21 23:10:52,251 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.224e+01 2.650e+01 2.787e+01 2.948e+01 3.554e+01, threshold=5.574e+01, percent-clipped=0.0
2023-12-21 23:11:03,177 INFO [train.py:886] (3/4) Epoch 9, batch 3900, loss[loss=0.01548, audio_tagging_loss=0.01548, over 25000.00 frames. ], tot_loss[loss=0.01525, audio_tagging_loss=0.01525, over 4952329.03 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0
2023-12-21 23:11:05,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=280186.6666666667, ans=0.125
2023-12-21 23:11:21,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=280253.3333333333, ans=0.125
2023-12-21 23:11:48,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=280453.3333333333, ans=0.0
2023-12-21 23:11:53,503 INFO [train.py:886] (3/4) Epoch 9, batch 3950, loss[loss=0.01505, audio_tagging_loss=0.01505, over 24750.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 4954170.97 frames. ], batch size: 99, lr: 1.19e-02, grad_scale: 64.0
2023-12-21 23:11:59,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=280520.0, ans=0.125
2023-12-21 23:12:03,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=280586.6666666667, ans=0.0
2023-12-21 23:12:07,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=280586.6666666667, ans=0.0
2023-12-21 23:12:10,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=280586.6666666667, ans=12.0
2023-12-21 23:12:14,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=280653.3333333333, ans=0.125
2023-12-21 23:12:25,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=280720.0, ans=0.1
2023-12-21 23:12:34,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=280786.6666666667, ans=0.125
2023-12-21 23:12:35,027 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.244e+01 2.621e+01 2.776e+01 2.902e+01 4.085e+01, threshold=5.551e+01, percent-clipped=0.0
2023-12-21 23:12:43,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=280786.6666666667, ans=0.04949747468305833
2023-12-21 23:12:45,135 INFO [train.py:886] (3/4) Epoch 9, batch 4000, loss[loss=0.01562, audio_tagging_loss=0.01562, over 25000.00 frames. ], tot_loss[loss=0.01521, audio_tagging_loss=0.01521, over 4960318.94 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0
2023-12-21 23:12:50,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=280853.3333333333, ans=0.125
2023-12-21 23:12:55,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=280920.0, ans=0.125
2023-12-21 23:12:59,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=280920.0, ans=0.125
2023-12-21 23:13:14,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=281053.3333333333, ans=0.2
2023-12-21 23:13:28,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=281120.0, ans=0.125
2023-12-21 23:13:31,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=281120.0, ans=0.0
2023-12-21 23:13:35,310 INFO [train.py:886] (3/4) Epoch 9, batch 4050, loss[loss=0.01699, audio_tagging_loss=0.01699, over 24750.00 frames. ], tot_loss[loss=0.0153, audio_tagging_loss=0.0153, over 4955402.99 frames. ], batch size: 99, lr: 1.19e-02, grad_scale: 128.0
2023-12-21 23:13:50,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=281253.3333333333, ans=0.125
2023-12-21 23:13:54,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=281320.0, ans=0.125
2023-12-21 23:13:59,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=281320.0, ans=0.1
2023-12-21 23:14:08,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=281386.6666666667, ans=0.09899494936611666
2023-12-21 23:14:16,847 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.243e+01 2.645e+01 2.783e+01 2.979e+01 3.494e+01, threshold=5.566e+01, percent-clipped=0.0
2023-12-21 23:14:20,094 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.58 vs. limit=22.5
2023-12-21 23:14:26,220 INFO [train.py:886] (3/4) Epoch 9, batch 4100, loss[loss=0.01354, audio_tagging_loss=0.01354, over 23997.00 frames. ], tot_loss[loss=0.01536, audio_tagging_loss=0.01536, over 4952215.73 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 128.0
2023-12-21 23:14:37,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=281586.6666666667, ans=0.125
2023-12-21 23:14:39,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=281586.6666666667, ans=0.0
2023-12-21 23:14:47,604 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.80 vs. limit=15.0
2023-12-21 23:14:49,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=281653.3333333333, ans=0.1
2023-12-21 23:14:58,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=281720.0, ans=0.0
2023-12-21 23:15:19,257 INFO [train.py:886] (3/4) Epoch 9, batch 4150, loss[loss=0.01494, audio_tagging_loss=0.01494, over 24750.00 frames. ], tot_loss[loss=0.01542, audio_tagging_loss=0.01542, over 4954761.90 frames. ], batch size: 99, lr: 1.19e-02, grad_scale: 64.0
2023-12-21 23:15:33,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=281920.0, ans=0.125
2023-12-21 23:15:50,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=282053.3333333333, ans=0.1
2023-12-21 23:15:53,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=282053.3333333333, ans=0.0
2023-12-21 23:15:59,223 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0
2023-12-21 23:16:00,696 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+01 2.634e+01 2.756e+01 2.939e+01 3.432e+01, threshold=5.512e+01, percent-clipped=0.0
2023-12-21 23:16:09,854 INFO [train.py:886] (3/4) Epoch 9, batch 4200, loss[loss=0.01145, audio_tagging_loss=0.01145, over 24750.00 frames. ], tot_loss[loss=0.01527, audio_tagging_loss=0.01527, over 4950507.49 frames. ], batch size: 99, lr: 1.19e-02, grad_scale: 64.0
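grad_scale in these lines is the dynamic loss scale of fp16 training (use_fp16: True in the configuration): it doubled from 64.0 to 128.0 at batch 4050 above, and drops back whenever a step overflows. The standard PyTorch AMP pattern that produces this behaviour is sketched below; the model, optimizer, criterion and the growth_interval value are placeholders and assumptions, not values taken from this run.

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=2.0,        # matches the small scale at the start of training
    growth_factor=2.0,     # the scale doubles, e.g. 64.0 -> 128.0
    backoff_factor=0.5,    # halved when a step overflows
    growth_interval=2000,  # assumed; doubles after this many clean steps
)

def train_step(model, optimizer, batch, criterion):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():        # fp16 forward pass
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()          # backward on the scaled loss
    scaler.step(optimizer)                 # skips the step on inf/nan grads
    scaler.update()                        # grows or backs off grad_scale
    return loss.detach(), scaler.get_scale()
```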
2023-12-21 23:16:14,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=282186.6666666667, ans=0.0
2023-12-21 23:16:43,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=282386.6666666667, ans=0.0
2023-12-21 23:17:02,310 INFO [train.py:886] (3/4) Epoch 9, batch 4250, loss[loss=0.01581, audio_tagging_loss=0.01581, over 25000.00 frames. ], tot_loss[loss=0.01517, audio_tagging_loss=0.01517, over 4948022.68 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0
2023-12-21 23:17:06,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=282520.0, ans=0.2
2023-12-21 23:17:10,420 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.99 vs. limit=15.0
2023-12-21 23:17:16,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=282586.6666666667, ans=0.0
2023-12-21 23:17:28,969 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.04 vs. limit=15.0
2023-12-21 23:17:31,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=282653.3333333333, ans=0.1
2023-12-21 23:17:43,724 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.286e+01 2.596e+01 2.839e+01 2.954e+01 3.918e+01, threshold=5.679e+01, percent-clipped=0.0
2023-12-21 23:17:54,383 INFO [train.py:886] (3/4) Epoch 9, batch 4300, loss[loss=0.01412, audio_tagging_loss=0.01412, over 25000.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 4946515.53 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0
2023-12-21 23:18:01,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=282853.3333333333, ans=0.2
2023-12-21 23:18:13,141 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.44 vs. limit=15.0
2023-12-21 23:18:45,887 INFO [train.py:886] (3/4) Epoch 9, batch 4350, loss[loss=0.01612, audio_tagging_loss=0.01612, over 24750.00 frames. ], tot_loss[loss=0.01538, audio_tagging_loss=0.01538, over 4950395.91 frames. ], batch size: 99, lr: 1.19e-02, grad_scale: 64.0
2023-12-21 23:18:54,866 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.43 vs. limit=12.0
2023-12-21 23:18:59,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=283253.3333333333, ans=0.2
2023-12-21 23:18:59,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=283253.3333333333, ans=0.2
2023-12-21 23:19:02,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=283253.3333333333, ans=0.125
2023-12-21 23:19:07,392 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5
2023-12-21 23:19:15,177 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.75 vs. limit=15.0
2023-12-21 23:19:26,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=283453.3333333333, ans=0.035
2023-12-21 23:19:29,155 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.360e+01 2.656e+01 2.784e+01 2.913e+01 3.411e+01, threshold=5.569e+01, percent-clipped=0.0
2023-12-21 23:19:29,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=283453.3333333333, ans=0.0
2023-12-21 23:19:32,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=283453.3333333333, ans=0.0
2023-12-21 23:19:34,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.37 vs. limit=15.0
2023-12-21 23:19:39,200 INFO [train.py:886] (3/4) Epoch 9, batch 4400, loss[loss=0.01533, audio_tagging_loss=0.01533, over 24750.00 frames. ], tot_loss[loss=0.01547, audio_tagging_loss=0.01547, over 4941739.85 frames. ], batch size: 99, lr: 1.19e-02, grad_scale: 64.0
2023-12-21 23:19:53,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=283586.6666666667, ans=0.2
2023-12-21 23:20:05,952 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.72 vs. limit=22.5
2023-12-21 23:20:29,343 INFO [train.py:886] (3/4) Epoch 9, batch 4450, loss[loss=0.01453, audio_tagging_loss=0.01453, over 25000.00 frames. ], tot_loss[loss=0.01553, audio_tagging_loss=0.01553, over 4941468.93 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0
2023-12-21 23:20:45,555 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.77 vs. limit=22.5
2023-12-21 23:20:55,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=283986.6666666667, ans=0.0
2023-12-21 23:20:57,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=283986.6666666667, ans=0.125
2023-12-21 23:20:57,859 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-21 23:21:03,733 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.97 vs. limit=15.0
2023-12-21 23:21:04,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=284053.3333333333, ans=0.125
2023-12-21 23:21:05,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=284053.3333333333, ans=0.125
2023-12-21 23:21:08,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=284053.3333333333, ans=0.125
2023-12-21 23:21:12,820 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.308e+01 2.640e+01 2.772e+01 2.975e+01 3.500e+01, threshold=5.544e+01, percent-clipped=0.0
2023-12-21 23:21:17,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.88 vs. limit=10.0
2023-12-21 23:21:21,362 INFO [train.py:886] (3/4) Epoch 9, batch 4500, loss[loss=0.01569, audio_tagging_loss=0.01569, over 25000.00 frames. ], tot_loss[loss=0.01537, audio_tagging_loss=0.01537, over 4946545.61 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0
2023-12-21 23:21:21,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=284186.6666666667, ans=0.0
2023-12-21 23:21:22,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=284186.6666666667, ans=0.125
2023-12-21 23:21:25,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=284186.6666666667, ans=0.125
2023-12-21 23:21:29,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=284186.6666666667, ans=0.125
2023-12-21 23:21:32,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.06 vs. limit=22.5
2023-12-21 23:22:12,488 INFO [train.py:886] (3/4) Epoch 9, batch 4550, loss[loss=0.01364, audio_tagging_loss=0.01364, over 25000.00 frames. ], tot_loss[loss=0.01531, audio_tagging_loss=0.01531, over 4950412.24 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0
2023-12-21 23:22:27,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.67 vs. limit=15.0
2023-12-21 23:22:52,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=284786.6666666667, ans=0.0
2023-12-21 23:22:53,256 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.304e+01 2.599e+01 2.790e+01 2.989e+01 3.611e+01, threshold=5.581e+01, percent-clipped=0.0
2023-12-21 23:23:01,875 INFO [train.py:886] (3/4) Epoch 9, batch 4600, loss[loss=0.01708, audio_tagging_loss=0.01708, over 25000.00 frames. ], tot_loss[loss=0.01531, audio_tagging_loss=0.01531, over 4951347.48 frames. ], batch size: 100, lr: 1.18e-02, grad_scale: 64.0
2023-12-21 23:23:11,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=284920.0, ans=0.0
2023-12-21 23:23:14,802 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=15.0
2023-12-21 23:23:33,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=285053.3333333333, ans=0.125
2023-12-21 23:23:38,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=285053.3333333333, ans=0.125
2023-12-21 23:23:41,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=285053.3333333333, ans=0.1
2023-12-21 23:23:54,797 INFO [train.py:886] (3/4) Epoch 9, batch 4650, loss[loss=0.01173, audio_tagging_loss=0.01173, over 25000.00 frames. ], tot_loss[loss=0.01527, audio_tagging_loss=0.01527, over 4952598.30 frames. ], batch size: 100, lr: 1.18e-02, grad_scale: 64.0
2023-12-21 23:24:19,465 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.83 vs. limit=12.0
2023-12-21 23:24:23,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=285320.0, ans=0.07
2023-12-21 23:24:32,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=285386.6666666667, ans=0.125
2023-12-21 23:24:33,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=285453.3333333333, ans=0.0
2023-12-21 23:24:34,264 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.53 vs. limit=12.0
2023-12-21 23:24:35,609 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 2.629e+01 2.821e+01 3.013e+01 3.684e+01, threshold=5.641e+01, percent-clipped=0.0
2023-12-21 23:24:36,159 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.24 vs. limit=10.0
2023-12-21 23:24:36,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=285453.3333333333, ans=0.2
2023-12-21 23:24:43,871 INFO [train.py:886] (3/4) Epoch 9, batch 4700, loss[loss=0.01671, audio_tagging_loss=0.01671, over 24750.00 frames. ], tot_loss[loss=0.01535, audio_tagging_loss=0.01535, over 4955435.62 frames. ], batch size: 99, lr: 1.18e-02, grad_scale: 64.0
2023-12-21 23:24:54,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=285586.6666666667, ans=0.0
2023-12-21 23:24:57,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=285586.6666666667, ans=0.125
2023-12-21 23:24:59,143 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.15 vs. limit=15.0
2023-12-21 23:25:08,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=285653.3333333333, ans=0.0
2023-12-21 23:25:11,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=285720.0, ans=0.0
2023-12-21 23:25:15,719 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.35 vs. limit=15.0
2023-12-21 23:25:18,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=285720.0, ans=0.125
2023-12-21 23:25:27,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=285786.6666666667, ans=0.125
2023-12-21 23:25:27,278 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.33 vs. limit=15.0
2023-12-21 23:25:27,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=285786.6666666667, ans=0.125
2023-12-21 23:25:31,406 INFO [train.py:886] (3/4) Epoch 9, batch 4750, loss[loss=0.01465, audio_tagging_loss=0.01465, over 24750.00 frames. ], tot_loss[loss=0.01542, audio_tagging_loss=0.01542, over 4949862.61 frames. ], batch size: 99, lr: 1.18e-02, grad_scale: 64.0
2023-12-21 23:25:31,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=285853.3333333333, ans=0.125
2023-12-21 23:25:32,904 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.42 vs. limit=6.0
2023-12-21 23:26:08,797 INFO [train.py:886] (3/4) Epoch 10, batch 0, loss[loss=0.03133, audio_tagging_loss=0.03133, over 24035.00 frames. ], tot_loss[loss=0.03133, audio_tagging_loss=0.03133, over 24035.00 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0
2023-12-21 23:26:08,797 INFO [train.py:909] (3/4) Computing validation loss
2023-12-21 23:26:30,369 INFO [train.py:917] (3/4) Epoch 10, validation: loss=0.03426, audio_tagging_loss=0.03426, over 3737520.00 frames.
2023-12-21 23:26:30,370 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-21 23:26:40,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=286026.6666666667, ans=0.1
2023-12-21 23:26:55,363 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.295e+01 2.685e+01 2.858e+01 3.839e+01 9.905e+01, threshold=5.715e+01, percent-clipped=6.0
2023-12-21 23:26:55,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=286093.3333333333, ans=0.125
2023-12-21 23:27:12,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=286226.6666666667, ans=0.2
2023-12-21 23:27:21,832 INFO [train.py:886] (3/4) Epoch 10, batch 50, loss[loss=0.01958, audio_tagging_loss=0.01958, over 25000.00 frames. ], tot_loss[loss=0.02424, audio_tagging_loss=0.02424, over 1121438.21 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0
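At the first batch of each epoch (as above, where epoch 10 starts at batch 0), training pauses and computes a validation loss over the same fixed 3,737,520 frames every time, so the numbers are directly comparable across epochs (loss=0.03523 at epoch 9 vs. loss=0.03426 at epoch 10). A minimal version of such a validation pass might look like the sketch below; the model, dataloader and criterion names are placeholders, not the actual train.py code.

```python
import torch

@torch.no_grad()
def compute_validation_loss(model, valid_loader, criterion, device):
    """Average the audio-tagging loss over a fixed validation set (sketch)."""
    model.eval()
    total_loss, total_frames = 0.0, 0.0
    for batch in valid_loader:
        feats = batch["inputs"].to(device)       # (N, T, 80) fbank features
        targets = batch["targets"].to(device)    # multi-hot event labels
        frames = float(feats.shape[0] * feats.shape[1])
        loss = criterion(model(feats), targets)  # e.g. BCE over 527 events
        total_loss += loss.item() * frames
        total_frames += frames
    model.train()
    return total_loss / total_frames             # frame-weighted average
```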
2023-12-21 23:27:38,758 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.788e-01
2023-12-21 23:27:38,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=286360.0, ans=0.125
2023-12-21 23:27:38,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=286360.0, ans=0.125
2023-12-21 23:27:42,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=286426.6666666667, ans=0.125
2023-12-21 23:28:12,477 INFO [train.py:886] (3/4) Epoch 10, batch 100, loss[loss=0.02151, audio_tagging_loss=0.02151, over 25000.00 frames. ], tot_loss[loss=0.02072, audio_tagging_loss=0.02072, over 1973851.09 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0
2023-12-21 23:28:25,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=286693.3333333333, ans=0.125
2023-12-21 23:28:36,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=286760.0, ans=0.125
2023-12-21 23:28:38,471 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 2.924e+01 3.109e+01 3.428e+01 4.349e+01, threshold=6.218e+01, percent-clipped=0.0
2023-12-21 23:28:47,314 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.79 vs. limit=15.0
2023-12-21 23:28:53,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=286893.3333333333, ans=0.5
2023-12-21 23:28:56,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=286893.3333333333, ans=0.0
2023-12-21 23:29:04,974 INFO [train.py:886] (3/4) Epoch 10, batch 150, loss[loss=0.01599, audio_tagging_loss=0.01599, over 24911.00 frames. ], tot_loss[loss=0.01899, audio_tagging_loss=0.01899, over 2631983.34 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0
2023-12-21 23:29:07,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.61 vs. limit=10.0
2023-12-21 23:29:09,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=286960.0, ans=0.125
2023-12-21 23:29:15,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=287026.6666666667, ans=0.0
2023-12-21 23:29:19,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=287026.6666666667, ans=0.0
2023-12-21 23:29:34,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=287093.3333333333, ans=0.125
2023-12-21 23:29:55,924 INFO [train.py:886] (3/4) Epoch 10, batch 200, loss[loss=0.01472, audio_tagging_loss=0.01472, over 25000.00 frames. ], tot_loss[loss=0.01772, audio_tagging_loss=0.01772, over 3147160.53 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0
2023-12-21 23:30:10,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=287360.0, ans=0.125
2023-12-21 23:30:12,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=287360.0, ans=0.0
2023-12-21 23:30:14,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=287360.0, ans=0.0
2023-12-21 23:30:16,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=287426.6666666667, ans=0.125
2023-12-21 23:30:20,758 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.252e+01 2.628e+01 2.764e+01 2.956e+01 4.225e+01, threshold=5.527e+01, percent-clipped=0.0
2023-12-21 23:30:26,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=287493.3333333333, ans=0.125
2023-12-21 23:30:28,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=287493.3333333333, ans=0.125
2023-12-21 23:30:41,116 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.43 vs. limit=22.5
2023-12-21 23:30:45,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=287560.0, ans=0.125
2023-12-21 23:30:47,389 INFO [train.py:886] (3/4) Epoch 10, batch 250, loss[loss=0.01659, audio_tagging_loss=0.01659, over 25000.00 frames. ], tot_loss[loss=0.01717, audio_tagging_loss=0.01717, over 3549579.15 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0
2023-12-21 23:30:48,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=287626.6666666667, ans=0.125
2023-12-21 23:31:01,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=287693.3333333333, ans=0.0
2023-12-21 23:31:04,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=287693.3333333333, ans=0.1
2023-12-21 23:31:12,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=287760.0, ans=0.125
2023-12-21 23:31:15,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=287760.0, ans=0.09899494936611666
2023-12-21 23:31:38,983 INFO [train.py:886] (3/4) Epoch 10, batch 300, loss[loss=0.01565, audio_tagging_loss=0.01565, over 24750.00 frames. ], tot_loss[loss=0.01679, audio_tagging_loss=0.01679, over 3855636.75 frames. ], batch size: 99, lr: 1.12e-02, grad_scale: 64.0
2023-12-21 23:31:39,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=287960.0, ans=0.125
2023-12-21 23:31:39,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=287960.0, ans=0.125
2023-12-21 23:31:46,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=287960.0, ans=0.95
2023-12-21 23:32:03,471 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.409e+01 2.631e+01 2.795e+01 2.956e+01 3.518e+01, threshold=5.589e+01, percent-clipped=0.0
2023-12-21 23:32:20,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=288226.6666666667, ans=0.125
2023-12-21 23:32:28,497 INFO [train.py:886] (3/4) Epoch 10, batch 350, loss[loss=0.01472, audio_tagging_loss=0.01472, over 25000.00 frames. ], tot_loss[loss=0.0164, audio_tagging_loss=0.0164, over 4094706.40 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0
2023-12-21 23:32:33,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=288293.3333333333, ans=0.125
2023-12-21 23:32:41,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=288360.0, ans=0.125
2023-12-21 23:32:51,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=288426.6666666667, ans=0.125
2023-12-21 23:32:54,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=288426.6666666667, ans=0.0
2023-12-21 23:33:01,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=288493.3333333333, ans=0.1
2023-12-21 23:33:08,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=288560.0, ans=0.1
2023-12-21 23:33:20,887 INFO [train.py:886] (3/4) Epoch 10, batch 400, loss[loss=0.01555, audio_tagging_loss=0.01555, over 25000.00 frames. ], tot_loss[loss=0.01608, audio_tagging_loss=0.01608, over 4287650.00 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0
2023-12-21 23:33:24,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=288626.6666666667, ans=0.125
2023-12-21 23:33:37,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=288693.3333333333, ans=0.0
2023-12-21 23:33:45,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=288760.0, ans=0.0
2023-12-21 23:33:47,213 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+01 2.616e+01 2.753e+01 2.907e+01 3.389e+01, threshold=5.507e+01, percent-clipped=0.0
2023-12-21 23:33:50,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=288760.0, ans=0.0
2023-12-21 23:33:50,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=288760.0, ans=0.0
2023-12-21 23:33:51,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=288826.6666666667, ans=0.125
2023-12-21 23:33:55,986 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-21 23:34:01,791 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.28 vs. limit=15.0
2023-12-21 23:34:11,442 INFO [train.py:886] (3/4) Epoch 10, batch 450, loss[loss=0.01326, audio_tagging_loss=0.01326, over 24750.00 frames. ], tot_loss[loss=0.01582, audio_tagging_loss=0.01582, over 4434906.97 frames. ], batch size: 99, lr: 1.12e-02, grad_scale: 64.0
2023-12-21 23:34:17,085 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.91 vs. limit=15.0
2023-12-21 23:34:19,448 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.723e-02
2023-12-21 23:34:26,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=289026.6666666667, ans=0.2
2023-12-21 23:34:29,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=289026.6666666667, ans=0.125
2023-12-21 23:34:58,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=289226.6666666667, ans=0.125
2023-12-21 23:34:59,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=289226.6666666667, ans=0.1
2023-12-21 23:35:03,895 INFO [train.py:886] (3/4) Epoch 10, batch 500, loss[loss=0.01685, audio_tagging_loss=0.01685, over 24750.00 frames. ], tot_loss[loss=0.01558, audio_tagging_loss=0.01558, over 4550262.36 frames. ], batch size: 99, lr: 1.12e-02, grad_scale: 64.0
2023-12-21 23:35:18,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=289360.0, ans=0.125
2023-12-21 23:35:30,831 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.565e+01 2.709e+01 2.854e+01 3.600e+01, threshold=5.419e+01, percent-clipped=0.0
2023-12-21 23:35:32,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=289426.6666666667, ans=0.2
2023-12-21 23:35:33,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=289426.6666666667, ans=0.125
2023-12-21 23:35:48,290 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.85 vs. limit=22.5
2023-12-21 23:35:56,481 INFO [train.py:886] (3/4) Epoch 10, batch 550, loss[loss=0.01212, audio_tagging_loss=0.01212, over 23938.00 frames. ], tot_loss[loss=0.01559, audio_tagging_loss=0.01559, over 4641373.90 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0
2023-12-21 23:35:57,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=289626.6666666667, ans=0.1
2023-12-21 23:36:03,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=289626.6666666667, ans=0.125
2023-12-21 23:36:10,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=289693.3333333333, ans=0.125
2023-12-21 23:36:17,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=289760.0, ans=0.0
2023-12-21 23:36:20,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=289760.0, ans=0.125
2023-12-21 23:36:30,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=289826.6666666667, ans=0.125
2023-12-21 23:36:35,328 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.08 vs. limit=15.0
2023-12-21 23:36:40,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=289893.3333333333, ans=0.1
2023-12-21 23:36:45,283 INFO [train.py:886] (3/4) Epoch 10, batch 600, loss[loss=0.01428, audio_tagging_loss=0.01428, over 24750.00 frames. ], tot_loss[loss=0.01559, audio_tagging_loss=0.01559, over 4701442.03 frames. ], batch size: 99, lr: 1.12e-02, grad_scale: 64.0
2023-12-21 23:36:54,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=290026.6666666667, ans=0.125
2023-12-21 23:37:04,310 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.24 vs. limit=6.0
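The Whitening entries from scaling.py:1022 compare a per-module statistic against a limit (metric=4.24 vs. limit=6.0 just above); values persistently over the limit indicate feature covariances that are far from "white". One plausible formulation of such a metric, assumed here for illustration rather than taken from scaling.py, is the normalized second moment of the covariance spectrum: it equals 1.0 for perfectly white features and grows as energy concentrates in a few directions.

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """How far the channel covariance is from a scaled identity
    (assumed formula, for illustration only). x: (num_frames, num_channels).
    Returns ~1.0 when the covariance spectrum is flat ("white") and grows
    as a few directions dominate."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]        # (C, C) channel covariance
    num_channels = cov.shape[0]
    trace = torch.diagonal(cov).sum()   # sum of eigenvalues
    trace_sq = (cov * cov).sum()        # sum of squared eigenvalues
    return (num_channels * trace_sq / trace**2).item()

x = torch.randn(1000, 256)              # nearly white features
print(whitening_metric(x))              # ~1, comfortably under a limit=15.0
x[:, 0] *= 30.0                         # one dominant direction
print(whitening_metric(x))              # large; this would trip the limit
```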
2023-12-21 23:37:09,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=290093.3333333333, ans=0.125
2023-12-21 23:37:10,374 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.394e+01 2.605e+01 2.757e+01 2.995e+01 3.479e+01, threshold=5.514e+01, percent-clipped=0.0
2023-12-21 23:37:36,071 INFO [train.py:886] (3/4) Epoch 10, batch 650, loss[loss=0.01632, audio_tagging_loss=0.01632, over 24750.00 frames. ], tot_loss[loss=0.01551, audio_tagging_loss=0.01551, over 4751354.09 frames. ], batch size: 99, lr: 1.12e-02, grad_scale: 64.0
2023-12-21 23:37:36,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=290293.3333333333, ans=0.2
2023-12-21 23:37:50,002 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.66 vs. limit=22.5
2023-12-21 23:37:50,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=290360.0, ans=0.125
2023-12-21 23:38:05,876 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.09 vs. limit=22.5
2023-12-21 23:38:07,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.80 vs. limit=22.5
2023-12-21 23:38:12,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=290493.3333333333, ans=0.0
2023-12-21 23:38:12,470 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0
2023-12-21 23:38:19,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=290560.0, ans=0.125
2023-12-21 23:38:27,729 INFO [train.py:886] (3/4) Epoch 10, batch 700, loss[loss=0.01647, audio_tagging_loss=0.01647, over 25000.00 frames. ], tot_loss[loss=0.01536, audio_tagging_loss=0.01536, over 4794100.67 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 64.0
2023-12-21 23:38:52,705 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.403e+01 2.671e+01 2.861e+01 3.072e+01 3.885e+01, threshold=5.722e+01, percent-clipped=0.0
2023-12-21 23:39:17,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=290893.3333333333, ans=0.0
2023-12-21 23:39:18,731 INFO [train.py:886] (3/4) Epoch 10, batch 750, loss[loss=0.01244, audio_tagging_loss=0.01244, over 24750.00 frames. ], tot_loss[loss=0.01533, audio_tagging_loss=0.01533, over 4830546.09 frames. ], batch size: 99, lr: 1.11e-02, grad_scale: 64.0
2023-12-21 23:39:22,041 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.64 vs. limit=10.0
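The Whitening records compare a per-module statistic against a fixed limit (metric=23.09 vs. limit=22.5 above briefly exceeds it). The metric is large when a module's output covariance is dominated by a few directions, i.e. far from white. One plausible scale-invariant score with that behaviour is the ratio mean(eig^2) / mean(eig)^2 over the covariance eigenvalues, computed per channel group; the sketch below illustrates the idea and is not copied from scaling.py.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Sketch of a 'non-whiteness' score: 1.0 for perfectly white features,
    growing as variance concentrates in fewer directions. Assumed to mirror
    what the Whitening log lines report, not the exact scaling.py code."""
    x = x.reshape(-1, x.shape[-1])               # (num_frames, num_channels)
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)          # center each channel
    scores = []
    for g in range(num_groups):
        cov = x[:, g, :].t() @ x[:, g, :] / num_frames
        eigs = torch.linalg.eigvalsh(cov)        # real, since cov is symmetric
        scores.append((eigs ** 2).mean() / eigs.mean() ** 2)
    return torch.stack(scores).mean().item()
```

When the statistic stays under the limit (metric=4.24 vs. limit=6.0 for the 4-group whiten_keys above) nothing needs to change; presumably a corrective gradient penalty is applied only while the limit is exceeded.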
2023-12-21 23:39:34,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=291026.6666666667, ans=0.125
2023-12-21 23:39:35,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=291026.6666666667, ans=0.0
2023-12-21 23:39:38,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=291026.6666666667, ans=0.1
2023-12-21 23:39:39,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=291093.3333333333, ans=0.125
2023-12-21 23:39:51,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=291160.0, ans=0.125
2023-12-21 23:40:00,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=291226.6666666667, ans=0.125
2023-12-21 23:40:08,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=291293.3333333333, ans=0.1
2023-12-21 23:40:10,523 INFO [train.py:886] (3/4) Epoch 10, batch 800, loss[loss=0.01521, audio_tagging_loss=0.01521, over 25000.00 frames. ], tot_loss[loss=0.01526, audio_tagging_loss=0.01526, over 4861684.84 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 64.0
2023-12-21 23:40:20,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=291360.0, ans=0.125
2023-12-21 23:40:34,600 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+01 2.609e+01 2.792e+01 2.929e+01 3.584e+01, threshold=5.584e+01, percent-clipped=0.0
2023-12-21 23:40:59,711 INFO [train.py:886] (3/4) Epoch 10, batch 850, loss[loss=0.01416, audio_tagging_loss=0.01416, over 25000.00 frames. ], tot_loss[loss=0.01517, audio_tagging_loss=0.01517, over 4887199.32 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 64.0
2023-12-21 23:41:12,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=291693.3333333333, ans=0.0
2023-12-21 23:41:18,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=291693.3333333333, ans=0.0
2023-12-21 23:41:19,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=291693.3333333333, ans=0.0
2023-12-21 23:41:22,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=291760.0, ans=0.0
2023-12-21 23:41:22,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=291760.0, ans=0.125
2023-12-21 23:41:42,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=291893.3333333333, ans=0.04949747468305833
2023-12-21 23:41:51,248 INFO [train.py:886] (3/4) Epoch 10, batch 900, loss[loss=0.012, audio_tagging_loss=0.012, over 24750.00 frames. ], tot_loss[loss=0.01516, audio_tagging_loss=0.01516, over 4904963.95 frames. ], batch size: 99, lr: 1.11e-02, grad_scale: 64.0
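Every ScheduledFloat record pairs a hyperparameter (dropout_p, skip rates, balancer probs, scale_min, ...) with the current batch_count and its scheduled value ans. The logged behaviour is reproducible with a piecewise-linear schedule over batch_count, sketched below; this omits whatever extra machinery the real scaling.py class carries and is only meant to explain the records.

```python
class ScheduledFloat:
    """Sketch: a float hyperparameter interpolated linearly between
    (batch_count, value) breakpoints, clamped at the ends."""

    def __init__(self, *points: tuple):
        self.points = sorted(points)   # e.g. (0.0, 0.3), (20000.0, 0.1)
        self.batch_count = 0.0         # advanced by the training loop

    def __float__(self) -> float:
        p, x = self.points, self.batch_count
        if x <= p[0][0]:
            return float(p[0][1])
        for (x0, y0), (x1, y1) in zip(p[:-1], p[1:]):
            if x <= x1:
                return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
        return float(p[-1][1])

# e.g. a dropout annealed from 0.3 to 0.1 over the first 20k batches:
# dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
# at batch_count=291293.3 it would long since sit at its final value 0.1,
# consistent with the ans=0.1 dropout_p lines above.
```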
2023-12-21 23:41:57,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=291960.0, ans=0.125
2023-12-21 23:41:59,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=291960.0, ans=0.0
2023-12-21 23:42:01,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=292026.6666666667, ans=0.2
2023-12-21 23:42:01,401 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.09 vs. limit=22.5
2023-12-21 23:42:02,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.25 vs. limit=6.0
2023-12-21 23:42:05,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=292026.6666666667, ans=0.1
2023-12-21 23:42:13,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=292093.3333333333, ans=0.0
2023-12-21 23:42:16,469 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 2.658e+01 2.802e+01 2.964e+01 3.575e+01, threshold=5.603e+01, percent-clipped=0.0
2023-12-21 23:42:42,909 INFO [train.py:886] (3/4) Epoch 10, batch 950, loss[loss=0.01382, audio_tagging_loss=0.01382, over 24750.00 frames. ], tot_loss[loss=0.01531, audio_tagging_loss=0.01531, over 4903588.90 frames. ], batch size: 99, lr: 1.11e-02, grad_scale: 64.0
2023-12-21 23:42:46,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=292293.3333333333, ans=0.1
2023-12-21 23:43:03,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=292426.6666666667, ans=0.0
2023-12-21 23:43:10,310 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.30 vs. limit=10.0
2023-12-21 23:43:28,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=292560.0, ans=12.0
2023-12-21 23:43:32,950 INFO [train.py:886] (3/4) Epoch 10, batch 1000, loss[loss=0.01532, audio_tagging_loss=0.01532, over 24750.00 frames. ], tot_loss[loss=0.0154, audio_tagging_loss=0.0154, over 4899168.48 frames. ], batch size: 99, lr: 1.11e-02, grad_scale: 64.0
2023-12-21 23:43:59,053 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 2.595e+01 2.757e+01 2.971e+01 3.553e+01, threshold=5.515e+01, percent-clipped=0.0
2023-12-21 23:44:22,716 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.18 vs. limit=12.0
2023-12-21 23:44:23,269 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.362e-03
2023-12-21 23:44:24,845 INFO [train.py:886] (3/4) Epoch 10, batch 1050, loss[loss=0.01599, audio_tagging_loss=0.01599, over 24750.00 frames. ], tot_loss[loss=0.01534, audio_tagging_loss=0.01534, over 4910050.29 frames. ], batch size: 99, lr: 1.11e-02, grad_scale: 64.0
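In each train.py:886 record, loss[...] is the current batch and tot_loss[...] is a running summary weighted by the frame counts in "over N frames". The fractional, plateauing frame totals (around 4.9 million, with batches of roughly 25000 frames) are consistent with an exponentially decayed frame-weighted average: with a decay of 1 - 1/200 per batch (matching the configured reset_interval of 200), the steady-state total is 25000 * 200 = 5.0e+06 frames. A sketch under that assumption; the real icefall tracker differs in detail:

```python
class RunningLoss:
    """Sketch of the tot_loss bookkeeping: exponentially decayed sums of
    loss * frames and of frames, reported as their ratio."""

    def __init__(self, decay: float = 1.0 - 1.0 / 200):
        self.decay = decay
        self.loss_frames = 0.0   # decayed sum of batch_loss * batch_frames
        self.frames = 0.0        # decayed sum of batch_frames

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.loss_frames = self.decay * self.loss_frames + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.loss_frames / self.frames

# This is why tot_loss moves smoothly (0.01534 -> 0.01535 -> ...) while the
# per-batch loss[...] values jump around.
```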
2023-12-21 23:44:40,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=293026.6666666667, ans=0.2
2023-12-21 23:44:42,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=293026.6666666667, ans=0.025
2023-12-21 23:44:58,735 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.29 vs. limit=15.0
2023-12-21 23:45:02,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=293160.0, ans=0.125
2023-12-21 23:45:06,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=293226.6666666667, ans=0.0
2023-12-21 23:45:16,150 INFO [train.py:886] (3/4) Epoch 10, batch 1100, loss[loss=0.01698, audio_tagging_loss=0.01698, over 22059.00 frames. ], tot_loss[loss=0.01532, audio_tagging_loss=0.01532, over 4914704.80 frames. ], batch size: 107, lr: 1.11e-02, grad_scale: 64.0
2023-12-21 23:45:23,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=293293.3333333333, ans=0.025
2023-12-21 23:45:44,966 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.223e+01 2.604e+01 2.781e+01 2.959e+01 3.663e+01, threshold=5.562e+01, percent-clipped=0.0
2023-12-21 23:45:59,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=293493.3333333333, ans=0.2
2023-12-21 23:46:12,052 INFO [train.py:886] (3/4) Epoch 10, batch 1150, loss[loss=0.01643, audio_tagging_loss=0.01643, over 25000.00 frames. ], tot_loss[loss=0.01523, audio_tagging_loss=0.01523, over 4926147.66 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 64.0
2023-12-21 23:46:16,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=293626.6666666667, ans=0.2
2023-12-21 23:46:24,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=293693.3333333333, ans=0.125
2023-12-21 23:47:03,890 INFO [train.py:886] (3/4) Epoch 10, batch 1200, loss[loss=0.01451, audio_tagging_loss=0.01451, over 24750.00 frames. ], tot_loss[loss=0.01526, audio_tagging_loss=0.01526, over 4937665.00 frames. ], batch size: 99, lr: 1.11e-02, grad_scale: 64.0
2023-12-21 23:47:06,446 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.08 vs. limit=6.0
2023-12-21 23:47:16,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=294026.6666666667, ans=0.2
2023-12-21 23:47:26,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=294093.3333333333, ans=0.0
2023-12-21 23:47:28,070 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 2.576e+01 2.725e+01 2.862e+01 3.373e+01, threshold=5.450e+01, percent-clipped=0.0
2023-12-21 23:47:54,682 INFO [train.py:886] (3/4) Epoch 10, batch 1250, loss[loss=0.01603, audio_tagging_loss=0.01603, over 24750.00 frames.
], tot_loss[loss=0.01535, audio_tagging_loss=0.01535, over 4936332.83 frames. ], batch size: 99, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:47:56,915 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=15.0 2023-12-21 23:47:57,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=294293.3333333333, ans=0.2 2023-12-21 23:48:20,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=294426.6666666667, ans=0.0 2023-12-21 23:48:27,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.96 vs. limit=15.0 2023-12-21 23:48:35,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=294493.3333333333, ans=0.1 2023-12-21 23:48:37,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=294560.0, ans=0.0 2023-12-21 23:48:40,033 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.53 vs. limit=15.0 2023-12-21 23:48:40,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=294560.0, ans=0.1 2023-12-21 23:48:46,965 INFO [train.py:886] (3/4) Epoch 10, batch 1300, loss[loss=0.01857, audio_tagging_loss=0.01857, over 24750.00 frames. ], tot_loss[loss=0.01544, audio_tagging_loss=0.01544, over 4940217.43 frames. ], batch size: 99, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:48:52,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=294626.6666666667, ans=0.2 2023-12-21 23:49:13,634 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.113e+01 2.699e+01 2.817e+01 2.948e+01 3.406e+01, threshold=5.634e+01, percent-clipped=0.0 2023-12-21 23:49:14,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=294760.0, ans=0.0 2023-12-21 23:49:16,994 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0 2023-12-21 23:49:33,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=294893.3333333333, ans=0.0 2023-12-21 23:49:39,366 INFO [train.py:886] (3/4) Epoch 10, batch 1350, loss[loss=0.0166, audio_tagging_loss=0.0166, over 24750.00 frames. ], tot_loss[loss=0.01535, audio_tagging_loss=0.01535, over 4943752.38 frames. ], batch size: 99, lr: 1.11e-02, grad_scale: 128.0 2023-12-21 23:50:10,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=295160.0, ans=0.125 2023-12-21 23:50:13,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=295160.0, ans=0.1 2023-12-21 23:50:30,335 INFO [train.py:886] (3/4) Epoch 10, batch 1400, loss[loss=0.01614, audio_tagging_loss=0.01614, over 25000.00 frames. ], tot_loss[loss=0.01522, audio_tagging_loss=0.01522, over 4940235.82 frames. 
], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:50:32,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=295293.3333333333, ans=0.1 2023-12-21 23:50:52,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=295426.6666666667, ans=0.0 2023-12-21 23:50:57,859 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 2.578e+01 2.757e+01 2.921e+01 3.435e+01, threshold=5.514e+01, percent-clipped=0.0 2023-12-21 23:51:00,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=295426.6666666667, ans=0.1 2023-12-21 23:51:02,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=295493.3333333333, ans=0.09899494936611666 2023-12-21 23:51:07,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=295493.3333333333, ans=0.125 2023-12-21 23:51:08,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.62 vs. limit=22.5 2023-12-21 23:51:23,342 INFO [train.py:886] (3/4) Epoch 10, batch 1450, loss[loss=0.01206, audio_tagging_loss=0.01206, over 25000.00 frames. ], tot_loss[loss=0.01515, audio_tagging_loss=0.01515, over 4947484.18 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:51:24,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=295626.6666666667, ans=0.125 2023-12-21 23:51:30,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=295626.6666666667, ans=0.0 2023-12-21 23:51:34,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=295693.3333333333, ans=0.125 2023-12-21 23:51:35,291 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=15.0 2023-12-21 23:51:44,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=295760.0, ans=0.125 2023-12-21 23:51:53,562 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.85 vs. limit=15.0 2023-12-21 23:51:57,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=295826.6666666667, ans=10.0 2023-12-21 23:51:59,178 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.57 vs. limit=22.5 2023-12-21 23:52:08,579 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.88 vs. limit=10.0 2023-12-21 23:52:14,520 INFO [train.py:886] (3/4) Epoch 10, batch 1500, loss[loss=0.01387, audio_tagging_loss=0.01387, over 24750.00 frames. ], tot_loss[loss=0.01522, audio_tagging_loss=0.01522, over 4953351.07 frames. 
], batch size: 99, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:52:17,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=295960.0, ans=0.0 2023-12-21 23:52:21,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=295960.0, ans=0.05 2023-12-21 23:52:28,596 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.99 vs. limit=15.0 2023-12-21 23:52:31,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=296026.6666666667, ans=0.125 2023-12-21 23:52:37,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=296093.3333333333, ans=0.2 2023-12-21 23:52:41,517 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.355e+01 2.575e+01 2.789e+01 2.982e+01 3.364e+01, threshold=5.578e+01, percent-clipped=0.0 2023-12-21 23:53:06,793 INFO [train.py:886] (3/4) Epoch 10, batch 1550, loss[loss=0.01393, audio_tagging_loss=0.01393, over 24750.00 frames. ], tot_loss[loss=0.01533, audio_tagging_loss=0.01533, over 4945717.38 frames. ], batch size: 99, lr: 1.11e-02, grad_scale: 64.0 2023-12-21 23:53:06,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=296293.3333333333, ans=0.1 2023-12-21 23:53:18,273 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.54 vs. limit=10.0 2023-12-21 23:53:24,237 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.756e+00 2023-12-21 23:53:29,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296426.6666666667, ans=0.1 2023-12-21 23:53:33,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=296426.6666666667, ans=0.1 2023-12-21 23:53:41,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=296493.3333333333, ans=0.0 2023-12-21 23:53:42,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=296493.3333333333, ans=0.0 2023-12-21 23:53:52,237 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.01 vs. limit=12.0 2023-12-21 23:53:59,449 INFO [train.py:886] (3/4) Epoch 10, batch 1600, loss[loss=0.01659, audio_tagging_loss=0.01659, over 24750.00 frames. ], tot_loss[loss=0.01544, audio_tagging_loss=0.01544, over 4937327.86 frames. ], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:54:25,591 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 2.673e+01 2.852e+01 3.027e+01 3.338e+01, threshold=5.705e+01, percent-clipped=0.0 2023-12-21 23:54:29,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=296826.6666666667, ans=0.125 2023-12-21 23:54:32,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.43 vs. 
limit=15.0 2023-12-21 23:54:34,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=296826.6666666667, ans=0.2 2023-12-21 23:54:40,783 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.79 vs. limit=15.0 2023-12-21 23:54:49,779 INFO [train.py:886] (3/4) Epoch 10, batch 1650, loss[loss=0.01353, audio_tagging_loss=0.01353, over 25000.00 frames. ], tot_loss[loss=0.01535, audio_tagging_loss=0.01535, over 4936926.92 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:54:50,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=296960.0, ans=0.2 2023-12-21 23:54:51,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=296960.0, ans=0.0 2023-12-21 23:55:09,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=297026.6666666667, ans=0.125 2023-12-21 23:55:10,972 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.62 vs. limit=22.5 2023-12-21 23:55:22,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=297160.0, ans=0.125 2023-12-21 23:55:30,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=297160.0, ans=0.125 2023-12-21 23:55:30,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=297160.0, ans=0.125 2023-12-21 23:55:32,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=297226.6666666667, ans=0.125 2023-12-21 23:55:33,101 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.16 vs. limit=22.5 2023-12-21 23:55:42,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=297293.3333333333, ans=0.07 2023-12-21 23:55:42,885 INFO [train.py:886] (3/4) Epoch 10, batch 1700, loss[loss=0.0166, audio_tagging_loss=0.0166, over 25000.00 frames. ], tot_loss[loss=0.01524, audio_tagging_loss=0.01524, over 4938747.09 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:55:46,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=297293.3333333333, ans=0.125 2023-12-21 23:55:50,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=297293.3333333333, ans=0.5 2023-12-21 23:56:10,397 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.376e+01 2.587e+01 2.708e+01 2.875e+01 3.627e+01, threshold=5.416e+01, percent-clipped=0.0 2023-12-21 23:56:15,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=297493.3333333333, ans=0.125 2023-12-21 23:56:17,763 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.08 vs. 
limit=22.5 2023-12-21 23:56:18,757 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.57 vs. limit=22.5 2023-12-21 23:56:31,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=297560.0, ans=0.125 2023-12-21 23:56:33,969 INFO [train.py:886] (3/4) Epoch 10, batch 1750, loss[loss=0.01405, audio_tagging_loss=0.01405, over 25000.00 frames. ], tot_loss[loss=0.01522, audio_tagging_loss=0.01522, over 4942849.49 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:56:41,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=297626.6666666667, ans=0.07 2023-12-21 23:56:48,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=297693.3333333333, ans=0.125 2023-12-21 23:57:10,381 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2023-12-21 23:57:11,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=297826.6666666667, ans=0.125 2023-12-21 23:57:12,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=297826.6666666667, ans=0.125 2023-12-21 23:57:24,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=297960.0, ans=0.125 2023-12-21 23:57:24,855 INFO [train.py:886] (3/4) Epoch 10, batch 1800, loss[loss=0.01385, audio_tagging_loss=0.01385, over 25000.00 frames. ], tot_loss[loss=0.01513, audio_tagging_loss=0.01513, over 4953884.77 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:57:25,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=297960.0, ans=0.125 2023-12-21 23:57:26,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=297960.0, ans=0.125 2023-12-21 23:57:33,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=298026.6666666667, ans=0.0 2023-12-21 23:57:47,109 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.38 vs. limit=12.0 2023-12-21 23:57:52,159 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.629e+01 2.799e+01 2.963e+01 3.676e+01, threshold=5.597e+01, percent-clipped=0.0 2023-12-21 23:57:57,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.19 vs. limit=15.0 2023-12-21 23:58:17,337 INFO [train.py:886] (3/4) Epoch 10, batch 1850, loss[loss=0.01478, audio_tagging_loss=0.01478, over 24750.00 frames. ], tot_loss[loss=0.01524, audio_tagging_loss=0.01524, over 4947411.09 frames. 
], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:58:38,282 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-21 23:58:41,958 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.439e+00 2023-12-21 23:58:44,592 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.138e-03 2023-12-21 23:59:02,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=298560.0, ans=0.125 2023-12-21 23:59:07,463 INFO [train.py:886] (3/4) Epoch 10, batch 1900, loss[loss=0.01355, audio_tagging_loss=0.01355, over 24750.00 frames. ], tot_loss[loss=0.0154, audio_tagging_loss=0.0154, over 4943275.07 frames. ], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-21 23:59:14,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=298626.6666666667, ans=0.1 2023-12-21 23:59:30,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=298760.0, ans=0.125 2023-12-21 23:59:34,344 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.713e+01 2.863e+01 3.091e+01 4.533e+01, threshold=5.726e+01, percent-clipped=0.0 2023-12-21 23:59:43,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=12.0 2023-12-21 23:59:50,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=298893.3333333333, ans=0.125 2023-12-21 23:59:59,043 INFO [train.py:886] (3/4) Epoch 10, batch 1950, loss[loss=0.01378, audio_tagging_loss=0.01378, over 24750.00 frames. ], tot_loss[loss=0.01538, audio_tagging_loss=0.01538, over 4939862.59 frames. ], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:00:33,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=299160.0, ans=0.1 2023-12-22 00:00:34,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=299160.0, ans=0.07 2023-12-22 00:00:40,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=299226.6666666667, ans=0.125 2023-12-22 00:00:43,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=299226.6666666667, ans=0.1 2023-12-22 00:00:48,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=299226.6666666667, ans=0.2 2023-12-22 00:00:50,509 INFO [train.py:886] (3/4) Epoch 10, batch 2000, loss[loss=0.01469, audio_tagging_loss=0.01469, over 25000.00 frames. ], tot_loss[loss=0.01527, audio_tagging_loss=0.01527, over 4943773.39 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:00:58,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.33 vs. 
limit=22.5 2023-12-22 00:01:16,429 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.618e+01 2.723e+01 2.914e+01 3.556e+01, threshold=5.446e+01, percent-clipped=0.0 2023-12-22 00:01:20,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=299493.3333333333, ans=0.035 2023-12-22 00:01:42,013 INFO [train.py:886] (3/4) Epoch 10, batch 2050, loss[loss=0.01665, audio_tagging_loss=0.01665, over 25000.00 frames. ], tot_loss[loss=0.01517, audio_tagging_loss=0.01517, over 4940127.79 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:01:44,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=299626.6666666667, ans=0.2 2023-12-22 00:01:46,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=299626.6666666667, ans=15.0 2023-12-22 00:01:54,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=299693.3333333333, ans=0.125 2023-12-22 00:01:58,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.33 vs. limit=12.0 2023-12-22 00:01:58,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=299693.3333333333, ans=0.0 2023-12-22 00:02:04,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=299760.0, ans=22.5 2023-12-22 00:02:06,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=299760.0, ans=0.0 2023-12-22 00:02:18,868 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.06 vs. limit=15.0 2023-12-22 00:02:33,764 INFO [train.py:886] (3/4) Epoch 10, batch 2100, loss[loss=0.01444, audio_tagging_loss=0.01444, over 25000.00 frames. ], tot_loss[loss=0.01517, audio_tagging_loss=0.01517, over 4946418.52 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:02:35,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=299960.0, ans=0.1 2023-12-22 00:02:43,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=300026.6666666667, ans=0.1 2023-12-22 00:02:50,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=300026.6666666667, ans=0.0 2023-12-22 00:03:00,778 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.290e+01 2.587e+01 2.718e+01 2.864e+01 3.394e+01, threshold=5.437e+01, percent-clipped=0.0 2023-12-22 00:03:11,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=300160.0, ans=0.125 2023-12-22 00:03:12,220 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.71 vs. limit=15.0 2023-12-22 00:03:24,898 INFO [train.py:886] (3/4) Epoch 10, batch 2150, loss[loss=0.01259, audio_tagging_loss=0.01259, over 25000.00 frames. 
], tot_loss[loss=0.01515, audio_tagging_loss=0.01515, over 4947324.99 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:03:26,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=300293.3333333333, ans=0.2 2023-12-22 00:03:47,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=300426.6666666667, ans=0.1 2023-12-22 00:04:14,807 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.25 vs. limit=10.0 2023-12-22 00:04:16,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=300560.0, ans=0.1 2023-12-22 00:04:17,939 INFO [train.py:886] (3/4) Epoch 10, batch 2200, loss[loss=0.01717, audio_tagging_loss=0.01717, over 24750.00 frames. ], tot_loss[loss=0.01523, audio_tagging_loss=0.01523, over 4948065.46 frames. ], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:04:27,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.90 vs. limit=22.5 2023-12-22 00:04:36,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=300693.3333333333, ans=0.0 2023-12-22 00:04:37,770 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.76 vs. limit=15.0 2023-12-22 00:04:41,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.01 vs. limit=15.0 2023-12-22 00:04:43,832 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.287e+01 2.615e+01 2.771e+01 2.942e+01 3.456e+01, threshold=5.541e+01, percent-clipped=0.0 2023-12-22 00:04:50,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=300826.6666666667, ans=0.125 2023-12-22 00:05:09,186 INFO [train.py:886] (3/4) Epoch 10, batch 2250, loss[loss=0.01252, audio_tagging_loss=0.01252, over 24750.00 frames. ], tot_loss[loss=0.01525, audio_tagging_loss=0.01525, over 4942031.88 frames. ], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:05:17,166 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.94 vs. limit=15.0 2023-12-22 00:05:23,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=301026.6666666667, ans=0.0 2023-12-22 00:05:23,535 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 00:05:37,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=301093.3333333333, ans=0.125 2023-12-22 00:05:58,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=301226.6666666667, ans=0.125 2023-12-22 00:05:58,312 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.06 vs. 
limit=10.0 2023-12-22 00:05:59,828 INFO [train.py:886] (3/4) Epoch 10, batch 2300, loss[loss=0.01345, audio_tagging_loss=0.01345, over 24750.00 frames. ], tot_loss[loss=0.01519, audio_tagging_loss=0.01519, over 4939737.86 frames. ], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:06:12,420 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=15.0 2023-12-22 00:06:26,821 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.310e+01 2.627e+01 2.751e+01 2.919e+01 3.578e+01, threshold=5.503e+01, percent-clipped=0.0 2023-12-22 00:06:44,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=301560.0, ans=15.0 2023-12-22 00:06:48,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=15.0 2023-12-22 00:06:50,942 INFO [train.py:886] (3/4) Epoch 10, batch 2350, loss[loss=0.01589, audio_tagging_loss=0.01589, over 24750.00 frames. ], tot_loss[loss=0.01508, audio_tagging_loss=0.01508, over 4944730.08 frames. ], batch size: 99, lr: 1.10e-02, grad_scale: 64.0 2023-12-22 00:06:53,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=301626.6666666667, ans=0.125 2023-12-22 00:06:54,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=301626.6666666667, ans=0.125 2023-12-22 00:06:54,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=301626.6666666667, ans=0.0 2023-12-22 00:07:03,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=301693.3333333333, ans=0.0 2023-12-22 00:07:03,827 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.02 vs. limit=15.0 2023-12-22 00:07:08,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=301693.3333333333, ans=0.125 2023-12-22 00:07:24,931 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.64 vs. limit=12.0 2023-12-22 00:07:42,757 INFO [train.py:886] (3/4) Epoch 10, batch 2400, loss[loss=0.01199, audio_tagging_loss=0.01199, over 25000.00 frames. ], tot_loss[loss=0.01497, audio_tagging_loss=0.01497, over 4950003.68 frames. ], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:08:04,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=302093.3333333333, ans=0.125 2023-12-22 00:08:08,767 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.344e+01 2.611e+01 2.780e+01 2.950e+01 3.631e+01, threshold=5.560e+01, percent-clipped=0.0 2023-12-22 00:08:11,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.57 vs. 
limit=15.0 2023-12-22 00:08:21,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=302160.0, ans=0.0 2023-12-22 00:08:33,177 INFO [train.py:886] (3/4) Epoch 10, batch 2450, loss[loss=0.01549, audio_tagging_loss=0.01549, over 24750.00 frames. ], tot_loss[loss=0.01511, audio_tagging_loss=0.01511, over 4956899.75 frames. ], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:08:33,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=302293.3333333333, ans=0.125 2023-12-22 00:08:48,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=302360.0, ans=0.125 2023-12-22 00:08:48,456 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.46 vs. limit=22.5 2023-12-22 00:08:50,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=302360.0, ans=0.125 2023-12-22 00:08:53,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=302426.6666666667, ans=0.0 2023-12-22 00:09:05,921 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.025e-01 2023-12-22 00:09:07,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=302493.3333333333, ans=0.125 2023-12-22 00:09:10,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=302493.3333333333, ans=0.125 2023-12-22 00:09:21,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=302560.0, ans=0.125 2023-12-22 00:09:25,544 INFO [train.py:886] (3/4) Epoch 10, batch 2500, loss[loss=0.01875, audio_tagging_loss=0.01875, over 24750.00 frames. ], tot_loss[loss=0.01531, audio_tagging_loss=0.01531, over 4949514.12 frames. ], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:09:31,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=302626.6666666667, ans=0.125 2023-12-22 00:09:42,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=302693.3333333333, ans=0.2 2023-12-22 00:09:43,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=302693.3333333333, ans=0.125 2023-12-22 00:09:47,832 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.07 vs. limit=22.5 2023-12-22 00:09:52,768 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.429e+01 2.737e+01 2.858e+01 3.085e+01 3.601e+01, threshold=5.717e+01, percent-clipped=0.0 2023-12-22 00:09:54,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=302760.0, ans=0.1 2023-12-22 00:10:16,832 INFO [train.py:886] (3/4) Epoch 10, batch 2550, loss[loss=0.01305, audio_tagging_loss=0.01305, over 24750.00 frames. ], tot_loss[loss=0.01535, audio_tagging_loss=0.01535, over 4941892.65 frames. 
], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:10:29,827 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2023-12-22 00:10:54,175 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.242e-01 2023-12-22 00:10:59,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=303226.6666666667, ans=0.125 2023-12-22 00:11:08,868 INFO [train.py:886] (3/4) Epoch 10, batch 2600, loss[loss=0.01526, audio_tagging_loss=0.01526, over 24750.00 frames. ], tot_loss[loss=0.01521, audio_tagging_loss=0.01521, over 4943057.83 frames. ], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:11:21,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=303360.0, ans=0.0 2023-12-22 00:11:33,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=303426.6666666667, ans=0.1 2023-12-22 00:11:35,999 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.384e+01 2.658e+01 2.842e+01 3.002e+01 3.670e+01, threshold=5.683e+01, percent-clipped=0.0 2023-12-22 00:12:00,774 INFO [train.py:886] (3/4) Epoch 10, batch 2650, loss[loss=0.01432, audio_tagging_loss=0.01432, over 24750.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 4943049.11 frames. ], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:12:03,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=303626.6666666667, ans=0.125 2023-12-22 00:12:03,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=303626.6666666667, ans=0.125 2023-12-22 00:12:26,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=303760.0, ans=0.1 2023-12-22 00:12:27,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=303760.0, ans=0.0 2023-12-22 00:12:33,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=303826.6666666667, ans=0.2 2023-12-22 00:12:40,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=303826.6666666667, ans=0.05 2023-12-22 00:12:45,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=303893.3333333333, ans=0.1 2023-12-22 00:12:51,224 INFO [train.py:886] (3/4) Epoch 10, batch 2700, loss[loss=0.01315, audio_tagging_loss=0.01315, over 25000.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 4944977.42 frames. 
], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:13:12,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=304093.3333333333, ans=0.125 2023-12-22 00:13:18,950 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 2.625e+01 2.781e+01 2.947e+01 3.660e+01, threshold=5.563e+01, percent-clipped=0.0 2023-12-22 00:13:20,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=304093.3333333333, ans=0.0 2023-12-22 00:13:26,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=304160.0, ans=0.5 2023-12-22 00:13:41,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.32 vs. limit=10.0 2023-12-22 00:13:42,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=304226.6666666667, ans=0.1 2023-12-22 00:13:44,515 INFO [train.py:886] (3/4) Epoch 10, batch 2750, loss[loss=0.01307, audio_tagging_loss=0.01307, over 25000.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 4954006.12 frames. ], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:13:47,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.84 vs. limit=22.5 2023-12-22 00:13:57,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=304360.0, ans=0.125 2023-12-22 00:13:57,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=304360.0, ans=0.125 2023-12-22 00:13:59,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.86 vs. limit=15.0 2023-12-22 00:14:05,272 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.06 vs. limit=22.5 2023-12-22 00:14:15,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=304493.3333333333, ans=0.125 2023-12-22 00:14:16,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0 2023-12-22 00:14:20,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=304493.3333333333, ans=0.125 2023-12-22 00:14:31,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=304560.0, ans=0.125 2023-12-22 00:14:33,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=304626.6666666667, ans=0.1 2023-12-22 00:14:35,051 INFO [train.py:886] (3/4) Epoch 10, batch 2800, loss[loss=0.01415, audio_tagging_loss=0.01415, over 22740.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 4952123.75 frames. 
], batch size: 107, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:14:36,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=304626.6666666667, ans=0.0 2023-12-22 00:14:41,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=304626.6666666667, ans=0.1 2023-12-22 00:14:43,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=304626.6666666667, ans=0.125 2023-12-22 00:14:49,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=304693.3333333333, ans=0.0 2023-12-22 00:14:54,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=304693.3333333333, ans=0.1 2023-12-22 00:15:01,759 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.388e+01 2.658e+01 2.805e+01 2.937e+01 3.602e+01, threshold=5.609e+01, percent-clipped=0.0 2023-12-22 00:15:03,215 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.27 vs. limit=15.0 2023-12-22 00:15:24,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=304893.3333333333, ans=0.0 2023-12-22 00:15:27,326 INFO [train.py:886] (3/4) Epoch 10, batch 2850, loss[loss=0.01449, audio_tagging_loss=0.01449, over 24750.00 frames. ], tot_loss[loss=0.01525, audio_tagging_loss=0.01525, over 4948152.53 frames. ], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:15:34,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=304960.0, ans=0.125 2023-12-22 00:16:03,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=305160.0, ans=0.05 2023-12-22 00:16:06,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=305160.0, ans=0.0 2023-12-22 00:16:19,436 INFO [train.py:886] (3/4) Epoch 10, batch 2900, loss[loss=0.01795, audio_tagging_loss=0.01795, over 25000.00 frames. ], tot_loss[loss=0.01523, audio_tagging_loss=0.01523, over 4945242.57 frames. ], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:16:45,546 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.261e+01 2.645e+01 2.819e+01 2.957e+01 3.644e+01, threshold=5.638e+01, percent-clipped=0.0 2023-12-22 00:16:48,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=305426.6666666667, ans=0.0 2023-12-22 00:17:03,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=305560.0, ans=0.0 2023-12-22 00:17:10,127 INFO [train.py:886] (3/4) Epoch 10, batch 2950, loss[loss=0.01203, audio_tagging_loss=0.01203, over 24750.00 frames. ], tot_loss[loss=0.01512, audio_tagging_loss=0.01512, over 4950387.68 frames. 
], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:17:10,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=305626.6666666667, ans=22.5 2023-12-22 00:17:16,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=305626.6666666667, ans=0.0 2023-12-22 00:17:19,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=305626.6666666667, ans=0.125 2023-12-22 00:17:25,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=305693.3333333333, ans=0.125 2023-12-22 00:17:40,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=305826.6666666667, ans=0.0 2023-12-22 00:17:40,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=305826.6666666667, ans=0.0 2023-12-22 00:17:42,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=305826.6666666667, ans=0.125 2023-12-22 00:17:44,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=305826.6666666667, ans=0.2 2023-12-22 00:18:03,085 INFO [train.py:886] (3/4) Epoch 10, batch 3000, loss[loss=0.0132, audio_tagging_loss=0.0132, over 25000.00 frames. ], tot_loss[loss=0.01503, audio_tagging_loss=0.01503, over 4957370.26 frames. ], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:18:03,086 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 00:18:10,885 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7786, 5.8379, 5.3229, 5.6537], device='cuda:3') 2023-12-22 00:18:18,070 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.4091, 2.3799, 2.8327, 2.1916, 2.4607, 2.5549, 1.4625, 1.8865], device='cuda:3') 2023-12-22 00:18:23,152 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.9701, 2.7965, 3.7273, 3.8061], device='cuda:3') 2023-12-22 00:18:24,623 INFO [train.py:917] (3/4) Epoch 10, validation: loss=0.03417, audio_tagging_loss=0.03417, over 3737520.00 frames. 2023-12-22 00:18:24,624 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 00:18:34,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=306026.6666666667, ans=0.125 2023-12-22 00:18:46,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=306093.3333333333, ans=0.0 2023-12-22 00:18:49,711 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.328e+01 2.598e+01 2.703e+01 2.864e+01 3.269e+01, threshold=5.407e+01, percent-clipped=0.0 2023-12-22 00:19:08,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=306226.6666666667, ans=0.09899494936611666 2023-12-22 00:19:14,560 INFO [train.py:886] (3/4) Epoch 10, batch 3050, loss[loss=0.01644, audio_tagging_loss=0.01644, over 25000.00 frames. ], tot_loss[loss=0.01495, audio_tagging_loss=0.01495, over 4954413.91 frames. 
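During the validation pass above, zipformer.py:1858 dumps an attn_weights_entropy tensor for selected self-attention modules, one value per head. Below is a sketch of that diagnostic under the assumption that it is the mean entropy, in nats, of each head's attention distribution; a head whose entropy collapses toward zero is attending to a single frame.

```python
import torch

def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    """attn_weights: (num_heads, tgt_len, src_len); each row is a softmax output."""
    p = attn_weights.clamp(min=1e-20)
    entropy = -(p * p.log()).sum(dim=-1)  # (num_heads, tgt_len)
    return entropy.mean(dim=-1)           # mean entropy per head

# uniform attention over 100 frames gives log(100) ~ 4.6 nats per head:
attn = torch.softmax(torch.zeros(4, 50, 100), dim=-1)
print(attn_weights_entropy(attn))  # tensor([4.6052, 4.6052, 4.6052, 4.6052])
```

Read this way, the ~5.3-5.8 values logged for encoders.0 suggest very broad attention, while values near 1.4 in encoders.3 indicate much peakier heads.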
], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:19:22,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=306293.3333333333, ans=0.0 2023-12-22 00:19:25,787 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.07 vs. limit=15.0 2023-12-22 00:19:39,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=306426.6666666667, ans=0.1 2023-12-22 00:20:07,342 INFO [train.py:886] (3/4) Epoch 10, batch 3100, loss[loss=0.01387, audio_tagging_loss=0.01387, over 25000.00 frames. ], tot_loss[loss=0.01506, audio_tagging_loss=0.01506, over 4948266.83 frames. ], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:20:19,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=306693.3333333333, ans=0.125 2023-12-22 00:20:21,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=306693.3333333333, ans=0.125 2023-12-22 00:20:21,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=306693.3333333333, ans=0.0 2023-12-22 00:20:34,051 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.376e+01 2.718e+01 2.843e+01 3.043e+01 4.179e+01, threshold=5.686e+01, percent-clipped=0.0 2023-12-22 00:20:58,236 INFO [train.py:886] (3/4) Epoch 10, batch 3150, loss[loss=0.01754, audio_tagging_loss=0.01754, over 24750.00 frames. ], tot_loss[loss=0.01519, audio_tagging_loss=0.01519, over 4946471.85 frames. ], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:21:06,623 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.73 vs. limit=6.0 2023-12-22 00:21:17,831 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. limit=15.0 2023-12-22 00:21:21,864 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.82 vs. limit=22.5 2023-12-22 00:21:26,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=307093.3333333333, ans=0.125 2023-12-22 00:21:31,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=307160.0, ans=0.05 2023-12-22 00:21:39,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=307226.6666666667, ans=0.125 2023-12-22 00:21:50,526 INFO [train.py:886] (3/4) Epoch 10, batch 3200, loss[loss=0.01127, audio_tagging_loss=0.01127, over 24750.00 frames. ], tot_loss[loss=0.01507, audio_tagging_loss=0.01507, over 4945437.43 frames. 
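The Whitening entries (scaling.py:1022) compare a per-module statistic against a limit (e.g. metric=3.07 vs. limit=15.0 above); when the metric exceeds its limit the module presumably pushes activations back toward a white, isotropic covariance. One plausible form of such a metric, sketched here, is the eigenvalue ratio E[lambda^2] / E[lambda]^2 of the feature covariance, which equals 1 for perfectly white features and grows as variance concentrates in a few directions; the exact statistic in scaling.py is an assumption.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (num_frames, num_channels); channels are split into num_groups groups."""
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)  # (g, n, c/g)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n                  # per-group covariance
    eigs = torch.linalg.eigvalsh(cov)                # (g, c/g)
    metric = (eigs ** 2).mean(dim=1) / eigs.mean(dim=1).clamp(min=1e-20) ** 2
    return metric.mean().item()

white = torch.randn(10000, 384)
print(whitening_metric(white))                                   # close to 1
print(whitening_metric(white * torch.linspace(0.1, 3.0, 384)))   # well above 1
```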
], batch size: 99, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:21:51,829 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.544e-03 2023-12-22 00:21:58,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=307293.3333333333, ans=0.125 2023-12-22 00:22:12,058 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=15.0 2023-12-22 00:22:14,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.31 vs. limit=6.0 2023-12-22 00:22:18,054 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 2.605e+01 2.808e+01 3.019e+01 3.529e+01, threshold=5.615e+01, percent-clipped=0.0 2023-12-22 00:22:21,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=307493.3333333333, ans=0.125 2023-12-22 00:22:35,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=307560.0, ans=0.2 2023-12-22 00:22:42,804 INFO [train.py:886] (3/4) Epoch 10, batch 3250, loss[loss=0.0133, audio_tagging_loss=0.0133, over 25000.00 frames. ], tot_loss[loss=0.01496, audio_tagging_loss=0.01496, over 4951681.52 frames. ], batch size: 100, lr: 1.09e-02, grad_scale: 64.0 2023-12-22 00:22:56,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=307693.3333333333, ans=0.125 2023-12-22 00:23:03,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=307760.0, ans=0.2 2023-12-22 00:23:04,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=307760.0, ans=0.125 2023-12-22 00:23:07,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=307760.0, ans=0.125 2023-12-22 00:23:15,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=307826.6666666667, ans=0.1 2023-12-22 00:23:19,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=307826.6666666667, ans=0.2 2023-12-22 00:23:24,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=307893.3333333333, ans=0.0 2023-12-22 00:23:34,262 INFO [train.py:886] (3/4) Epoch 10, batch 3300, loss[loss=0.01669, audio_tagging_loss=0.01669, over 25000.00 frames. ], tot_loss[loss=0.01492, audio_tagging_loss=0.01492, over 4955475.02 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:23:36,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=307960.0, ans=0.0 2023-12-22 00:23:47,941 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.70 vs. 
limit=22.5 2023-12-22 00:23:49,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=308026.6666666667, ans=0.2 2023-12-22 00:24:01,367 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+01 2.643e+01 2.728e+01 2.904e+01 3.479e+01, threshold=5.455e+01, percent-clipped=0.0 2023-12-22 00:24:02,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=308093.3333333333, ans=0.125 2023-12-22 00:24:02,803 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=15.0 2023-12-22 00:24:03,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=308093.3333333333, ans=0.0 2023-12-22 00:24:26,091 INFO [train.py:886] (3/4) Epoch 10, batch 3350, loss[loss=0.01704, audio_tagging_loss=0.01704, over 25000.00 frames. ], tot_loss[loss=0.01498, audio_tagging_loss=0.01498, over 4961753.32 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:24:51,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=308426.6666666667, ans=0.125 2023-12-22 00:24:53,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=308426.6666666667, ans=0.125 2023-12-22 00:25:09,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.40 vs. limit=15.0 2023-12-22 00:25:15,388 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.01 vs. limit=15.0 2023-12-22 00:25:17,547 INFO [train.py:886] (3/4) Epoch 10, batch 3400, loss[loss=0.01519, audio_tagging_loss=0.01519, over 25000.00 frames. ], tot_loss[loss=0.01498, audio_tagging_loss=0.01498, over 4960615.16 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 128.0 2023-12-22 00:25:29,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=308693.3333333333, ans=0.0 2023-12-22 00:25:32,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=308693.3333333333, ans=0.125 2023-12-22 00:25:41,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=308760.0, ans=12.0 2023-12-22 00:25:44,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.82 vs. 
limit=15.0 2023-12-22 00:25:44,950 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.624e+01 2.786e+01 2.955e+01 3.614e+01, threshold=5.572e+01, percent-clipped=0.0 2023-12-22 00:25:55,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=308826.6666666667, ans=0.1 2023-12-22 00:25:57,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=308826.6666666667, ans=0.2 2023-12-22 00:26:02,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=308893.3333333333, ans=0.125 2023-12-22 00:26:09,915 INFO [train.py:886] (3/4) Epoch 10, batch 3450, loss[loss=0.01231, audio_tagging_loss=0.01231, over 24049.00 frames. ], tot_loss[loss=0.01516, audio_tagging_loss=0.01516, over 4952769.08 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:26:14,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=308960.0, ans=0.2 2023-12-22 00:26:16,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=308960.0, ans=0.0 2023-12-22 00:26:25,008 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.09 vs. limit=10.0 2023-12-22 00:26:46,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=309160.0, ans=0.125 2023-12-22 00:26:50,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=309226.6666666667, ans=0.0 2023-12-22 00:26:58,169 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.58 vs. limit=15.0 2023-12-22 00:26:59,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.62 vs. limit=22.5 2023-12-22 00:27:02,319 INFO [train.py:886] (3/4) Epoch 10, batch 3500, loss[loss=0.01701, audio_tagging_loss=0.01701, over 25000.00 frames. ], tot_loss[loss=0.01522, audio_tagging_loss=0.01522, over 4949957.61 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:27:04,737 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.86 vs. limit=22.5 2023-12-22 00:27:07,609 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.86 vs. 
limit=22.5 2023-12-22 00:27:20,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309360.0, ans=0.1 2023-12-22 00:27:23,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=309426.6666666667, ans=0.0 2023-12-22 00:27:26,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=309426.6666666667, ans=0.0 2023-12-22 00:27:30,743 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.274e+01 2.637e+01 2.779e+01 3.001e+01 4.121e+01, threshold=5.558e+01, percent-clipped=0.0 2023-12-22 00:27:48,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=309560.0, ans=0.125 2023-12-22 00:27:54,235 INFO [train.py:886] (3/4) Epoch 10, batch 3550, loss[loss=0.01468, audio_tagging_loss=0.01468, over 25000.00 frames. ], tot_loss[loss=0.01507, audio_tagging_loss=0.01507, over 4946010.49 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:27:54,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.39 vs. limit=22.5 2023-12-22 00:27:56,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309626.6666666667, ans=0.1 2023-12-22 00:28:11,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=309693.3333333333, ans=0.2 2023-12-22 00:28:22,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=309760.0, ans=0.125 2023-12-22 00:28:38,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=309893.3333333333, ans=0.125 2023-12-22 00:28:45,835 INFO [train.py:886] (3/4) Epoch 10, batch 3600, loss[loss=0.01657, audio_tagging_loss=0.01657, over 25000.00 frames. ], tot_loss[loss=0.01504, audio_tagging_loss=0.01504, over 4951915.82 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:28:55,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=310026.6666666667, ans=0.2 2023-12-22 00:29:14,191 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.609e+01 2.736e+01 2.893e+01 3.548e+01, threshold=5.471e+01, percent-clipped=0.0 2023-12-22 00:29:20,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=310160.0, ans=0.0 2023-12-22 00:29:27,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=310226.6666666667, ans=0.5 2023-12-22 00:29:27,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=310226.6666666667, ans=0.0 2023-12-22 00:29:29,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=310226.6666666667, ans=0.035 2023-12-22 00:29:37,948 INFO [train.py:886] (3/4) Epoch 10, batch 3650, loss[loss=0.01653, audio_tagging_loss=0.01653, over 25000.00 frames. ], tot_loss[loss=0.01503, audio_tagging_loss=0.01503, over 4958565.60 frames. 
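The grad_scale field in each train line is the fp16 dynamic loss scale: it holds at 64.0, doubles to 128.0 around batch 3400, then backs off to 64.0 by batch 3450, the signature of a scaler that periodically tries to grow and halves on overflow. A minimal sketch of that mechanism with torch.cuda.amp.GradScaler follows; the growth settings are guesses, not the values train.py uses.

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=64.0,       # the steady value seen in the log
    growth_factor=2.0,     # periodically try doubling (64.0 -> 128.0) ...
    backoff_factor=0.5,    # ... and halve again if grads overflow to inf/nan
    growth_interval=2000,  # steps between growth attempts (a guess)
)

if torch.cuda.is_available():
    model = torch.nn.Linear(80, 527).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    x = torch.randn(8, 80, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(opt)               # unscales, skips the step on inf/nan
    scaler.update()                # grow or back off the scale
    print(scaler.get_scale())      # presumably what train.py logs as grad_scale
```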
], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:29:47,544 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0 2023-12-22 00:30:15,997 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.41 vs. limit=12.0 2023-12-22 00:30:18,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=310560.0, ans=0.125 2023-12-22 00:30:27,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=310560.0, ans=0.02 2023-12-22 00:30:28,845 INFO [train.py:886] (3/4) Epoch 10, batch 3700, loss[loss=0.01724, audio_tagging_loss=0.01724, over 24750.00 frames. ], tot_loss[loss=0.01498, audio_tagging_loss=0.01498, over 4954155.45 frames. ], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:30:36,791 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.14 vs. limit=15.0 2023-12-22 00:30:43,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=310693.3333333333, ans=0.0 2023-12-22 00:30:55,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=310760.0, ans=0.0 2023-12-22 00:30:57,719 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.651e+01 2.800e+01 2.999e+01 3.516e+01, threshold=5.600e+01, percent-clipped=0.0 2023-12-22 00:30:58,276 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.55 vs. limit=10.0 2023-12-22 00:30:58,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=310760.0, ans=0.07 2023-12-22 00:31:09,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=310826.6666666667, ans=0.0 2023-12-22 00:31:15,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=310893.3333333333, ans=0.2 2023-12-22 00:31:22,332 INFO [train.py:886] (3/4) Epoch 10, batch 3750, loss[loss=0.01688, audio_tagging_loss=0.01688, over 24750.00 frames. ], tot_loss[loss=0.01512, audio_tagging_loss=0.01512, over 4947913.17 frames. ], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:31:32,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=311026.6666666667, ans=0.035 2023-12-22 00:31:37,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=311026.6666666667, ans=0.0 2023-12-22 00:31:48,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=311093.3333333333, ans=0.125 2023-12-22 00:31:55,014 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.88 vs. 
limit=15.0 2023-12-22 00:31:58,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=311160.0, ans=0.2 2023-12-22 00:31:58,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=311160.0, ans=0.1 2023-12-22 00:31:59,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=311160.0, ans=0.07 2023-12-22 00:32:13,551 INFO [train.py:886] (3/4) Epoch 10, batch 3800, loss[loss=0.01507, audio_tagging_loss=0.01507, over 24750.00 frames. ], tot_loss[loss=0.01515, audio_tagging_loss=0.01515, over 4941669.87 frames. ], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:32:26,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=311360.0, ans=0.1 2023-12-22 00:32:30,657 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0 2023-12-22 00:32:36,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=311426.6666666667, ans=0.125 2023-12-22 00:32:41,151 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.397e+01 2.644e+01 2.770e+01 2.975e+01 3.634e+01, threshold=5.540e+01, percent-clipped=0.0 2023-12-22 00:33:05,079 INFO [train.py:886] (3/4) Epoch 10, batch 3850, loss[loss=0.01496, audio_tagging_loss=0.01496, over 24750.00 frames. ], tot_loss[loss=0.0151, audio_tagging_loss=0.0151, over 4943004.57 frames. ], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:33:09,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.81 vs. limit=6.0 2023-12-22 00:33:11,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=311626.6666666667, ans=0.125 2023-12-22 00:33:13,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=311626.6666666667, ans=0.2 2023-12-22 00:33:16,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=311693.3333333333, ans=0.125 2023-12-22 00:33:26,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=311760.0, ans=0.0 2023-12-22 00:33:58,142 INFO [train.py:886] (3/4) Epoch 10, batch 3900, loss[loss=0.01787, audio_tagging_loss=0.01787, over 24750.00 frames. ], tot_loss[loss=0.01501, audio_tagging_loss=0.01501, over 4946547.72 frames. ], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:34:05,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=311960.0, ans=0.04949747468305833 2023-12-22 00:34:10,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0 2023-12-22 00:34:17,389 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.78 vs. 
limit=15.0 2023-12-22 00:34:20,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=312093.3333333333, ans=0.125 2023-12-22 00:34:25,852 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.307e+01 2.618e+01 2.786e+01 2.971e+01 3.570e+01, threshold=5.572e+01, percent-clipped=0.0 2023-12-22 00:34:40,283 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.45 vs. limit=15.0 2023-12-22 00:34:49,000 INFO [train.py:886] (3/4) Epoch 10, batch 3950, loss[loss=0.01583, audio_tagging_loss=0.01583, over 25000.00 frames. ], tot_loss[loss=0.01489, audio_tagging_loss=0.01489, over 4948198.43 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:34:52,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=312293.3333333333, ans=0.0 2023-12-22 00:34:56,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=312293.3333333333, ans=0.125 2023-12-22 00:35:09,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=312426.6666666667, ans=0.0 2023-12-22 00:35:34,616 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.05 vs. limit=12.0 2023-12-22 00:35:40,661 INFO [train.py:886] (3/4) Epoch 10, batch 4000, loss[loss=0.01194, audio_tagging_loss=0.01194, over 25000.00 frames. ], tot_loss[loss=0.01497, audio_tagging_loss=0.01497, over 4954257.10 frames. ], batch size: 100, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:35:56,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=312693.3333333333, ans=0.035 2023-12-22 00:35:59,993 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=15.0 2023-12-22 00:36:01,335 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.522e+00 2023-12-22 00:36:07,827 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=15.40 vs. limit=15.0 2023-12-22 00:36:09,003 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.378e+01 2.644e+01 2.805e+01 2.924e+01 3.481e+01, threshold=5.611e+01, percent-clipped=0.0 2023-12-22 00:36:14,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=312826.6666666667, ans=0.0 2023-12-22 00:36:26,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.28 vs. limit=6.0 2023-12-22 00:36:31,678 INFO [train.py:886] (3/4) Epoch 10, batch 4050, loss[loss=0.01543, audio_tagging_loss=0.01543, over 24750.00 frames. ], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 4951777.74 frames. 
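Each train.py:886 line pairs the current batch's loss with a tot_loss accumulated over roughly 4.95e6 frames. That behavior matches a frame-weighted exponential average: with ~25k frames per batch, a decay of 0.995 saturates near 25000 / 0.005 = 5e6 frames, and after an epoch restart it rebuilds through ~1.13e6 frames by batch 50 and ~1.98e6 by batch 100, which is what the epoch 11 lines later in the log show. The decay constant is inferred from those totals, not read from train.py.

```python
class RunningLoss:
    """Frame-weighted exponential average of the training loss."""

    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.weighted_loss = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.weighted_loss = self.decay * self.weighted_loss + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def tot_loss(self) -> float:
        return self.weighted_loss / max(self.frames, 1.0)

avg = RunningLoss()
for _ in range(3000):
    avg.update(batch_loss=0.015, batch_frames=25000.0)
print(f"tot_loss={avg.tot_loss:.5f} over {avg.frames:.2f} frames")  # ~5e6 frames
```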
], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:36:47,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=313026.6666666667, ans=0.2 2023-12-22 00:36:48,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=313026.6666666667, ans=0.04949747468305833 2023-12-22 00:36:51,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=313093.3333333333, ans=0.1 2023-12-22 00:36:58,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=313093.3333333333, ans=0.07 2023-12-22 00:37:23,796 INFO [train.py:886] (3/4) Epoch 10, batch 4100, loss[loss=0.01155, audio_tagging_loss=0.01155, over 24750.00 frames. ], tot_loss[loss=0.01514, audio_tagging_loss=0.01514, over 4945716.67 frames. ], batch size: 99, lr: 1.08e-02, grad_scale: 64.0 2023-12-22 00:37:24,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=313293.3333333333, ans=0.025 2023-12-22 00:37:32,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=313293.3333333333, ans=0.1 2023-12-22 00:37:38,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=313360.0, ans=0.125 2023-12-22 00:37:51,841 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.404e+01 2.742e+01 2.860e+01 3.064e+01 3.458e+01, threshold=5.720e+01, percent-clipped=0.0 2023-12-22 00:37:55,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=313493.3333333333, ans=0.125 2023-12-22 00:38:14,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=313560.0, ans=0.125 2023-12-22 00:38:16,631 INFO [train.py:886] (3/4) Epoch 10, batch 4150, loss[loss=0.0155, audio_tagging_loss=0.0155, over 24750.00 frames. ], tot_loss[loss=0.01522, audio_tagging_loss=0.01522, over 4944871.95 frames. ], batch size: 99, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:38:22,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=313626.6666666667, ans=0.5 2023-12-22 00:38:24,787 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0 2023-12-22 00:38:30,188 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.96 vs. limit=22.5 2023-12-22 00:38:32,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=313693.3333333333, ans=0.0 2023-12-22 00:39:05,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=313893.3333333333, ans=0.125 2023-12-22 00:39:07,568 INFO [train.py:886] (3/4) Epoch 10, batch 4200, loss[loss=0.01429, audio_tagging_loss=0.01429, over 24750.00 frames. ], tot_loss[loss=0.01516, audio_tagging_loss=0.01516, over 4944897.45 frames. 
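The quantity logged as audio_tagging_loss is a multi-label sound-event objective. A standard formulation, assumed here since train.py's exact loss is not shown in this log, is binary cross-entropy between clip-level logits and multi-hot targets over the AudioSet ontology's 527 event classes.

```python
import torch
import torch.nn.functional as F

def audio_tagging_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """logits: (batch, num_classes) clip-level scores; targets: multi-hot in {0, 1}."""
    return F.binary_cross_entropy_with_logits(logits, targets, reduction="mean")

num_classes = 527                                   # assuming the AudioSet ontology
logits = torch.randn(100, num_classes)              # batch size 100, as in the log
targets = (torch.rand(100, num_classes) < 0.01).float()  # clips have few positive labels
print(audio_tagging_loss(logits, targets))
```

Because the mean runs over all classes and most targets are zero, a trained model's value ends up small, which helps explain the ~1.5e-2 losses above.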
], batch size: 99, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:39:13,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=313960.0, ans=0.125 2023-12-22 00:39:22,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=314026.6666666667, ans=0.125 2023-12-22 00:39:35,998 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.283e+01 2.632e+01 2.766e+01 2.962e+01 3.622e+01, threshold=5.532e+01, percent-clipped=0.0 2023-12-22 00:39:38,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=314160.0, ans=0.0 2023-12-22 00:39:47,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=314160.0, ans=0.125 2023-12-22 00:39:57,678 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0 2023-12-22 00:39:58,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=314226.6666666667, ans=0.125 2023-12-22 00:40:00,004 INFO [train.py:886] (3/4) Epoch 10, batch 4250, loss[loss=0.0129, audio_tagging_loss=0.0129, over 25000.00 frames. ], tot_loss[loss=0.01513, audio_tagging_loss=0.01513, over 4944872.32 frames. ], batch size: 100, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:40:00,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=314293.3333333333, ans=0.125 2023-12-22 00:40:13,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=314360.0, ans=0.2 2023-12-22 00:40:15,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=314360.0, ans=0.0 2023-12-22 00:40:18,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=314360.0, ans=0.125 2023-12-22 00:40:22,291 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.51 vs. limit=12.0 2023-12-22 00:40:29,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=314426.6666666667, ans=0.95 2023-12-22 00:40:38,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=314493.3333333333, ans=0.125 2023-12-22 00:40:39,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=314493.3333333333, ans=0.025 2023-12-22 00:40:47,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=314560.0, ans=0.125 2023-12-22 00:40:51,770 INFO [train.py:886] (3/4) Epoch 10, batch 4300, loss[loss=0.014, audio_tagging_loss=0.014, over 25000.00 frames. ], tot_loss[loss=0.01511, audio_tagging_loss=0.01511, over 4953384.67 frames. 
], batch size: 100, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:40:55,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=314626.6666666667, ans=0.2 2023-12-22 00:41:10,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=314693.3333333333, ans=0.125 2023-12-22 00:41:11,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=314760.0, ans=0.0 2023-12-22 00:41:19,554 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+01 2.660e+01 2.835e+01 2.994e+01 3.565e+01, threshold=5.669e+01, percent-clipped=0.0 2023-12-22 00:41:19,941 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=15.0 2023-12-22 00:41:20,996 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.17 vs. limit=22.5 2023-12-22 00:41:21,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=314826.6666666667, ans=0.125 2023-12-22 00:41:43,493 INFO [train.py:886] (3/4) Epoch 10, batch 4350, loss[loss=0.01423, audio_tagging_loss=0.01423, over 24750.00 frames. ], tot_loss[loss=0.01515, audio_tagging_loss=0.01515, over 4957288.22 frames. ], batch size: 99, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:41:57,505 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.88 vs. limit=15.0 2023-12-22 00:42:02,798 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2023-12-22 00:42:21,337 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.29 vs. limit=15.0 2023-12-22 00:42:23,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=315226.6666666667, ans=0.0 2023-12-22 00:42:27,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=315226.6666666667, ans=0.125 2023-12-22 00:42:35,506 INFO [train.py:886] (3/4) Epoch 10, batch 4400, loss[loss=0.01521, audio_tagging_loss=0.01521, over 24750.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 4956439.82 frames. ], batch size: 99, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:42:44,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=315360.0, ans=0.0 2023-12-22 00:42:50,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=315360.0, ans=0.0 2023-12-22 00:42:52,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=315360.0, ans=0.0 2023-12-22 00:43:03,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.88 vs. 
limit=15.0 2023-12-22 00:43:04,212 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 2.662e+01 2.809e+01 2.979e+01 4.012e+01, threshold=5.619e+01, percent-clipped=0.0 2023-12-22 00:43:04,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=315426.6666666667, ans=0.0 2023-12-22 00:43:13,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=315493.3333333333, ans=0.0 2023-12-22 00:43:16,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=315560.0, ans=0.1 2023-12-22 00:43:27,635 INFO [train.py:886] (3/4) Epoch 10, batch 4450, loss[loss=0.01573, audio_tagging_loss=0.01573, over 24750.00 frames. ], tot_loss[loss=0.01518, audio_tagging_loss=0.01518, over 4951043.65 frames. ], batch size: 99, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:43:29,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=315626.6666666667, ans=0.125 2023-12-22 00:43:32,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.62 vs. limit=22.5 2023-12-22 00:43:35,902 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 00:43:39,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=315693.3333333333, ans=0.125 2023-12-22 00:43:41,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=315693.3333333333, ans=0.0 2023-12-22 00:43:51,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=315760.0, ans=0.2 2023-12-22 00:44:01,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=315826.6666666667, ans=0.0 2023-12-22 00:44:03,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=315826.6666666667, ans=0.125 2023-12-22 00:44:06,331 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.93 vs. limit=15.0 2023-12-22 00:44:13,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=315893.3333333333, ans=0.125 2023-12-22 00:44:16,751 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=15.0 2023-12-22 00:44:19,732 INFO [train.py:886] (3/4) Epoch 10, batch 4500, loss[loss=0.01359, audio_tagging_loss=0.01359, over 25000.00 frames. ], tot_loss[loss=0.01511, audio_tagging_loss=0.01511, over 4948854.97 frames. 
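The scaling.py:1118 entries (WithLoss: name=..., loss-sum=...) report the magnitude of a small auxiliary loss attached to particular attention-weight tensors; above it ranges from 0.000e+00 up to ~1.5e+00. A hedged sketch of the pattern follows: a pass-through module whose penalty the training loop adds to the main loss. The quadratic penalty is an arbitrary stand-in for whatever scaling.py actually penalizes.

```python
import torch

class WithLoss(torch.nn.Module):
    """Identity on the forward pass; exposes an auxiliary loss on its input."""

    def __init__(self, scale: float = 1e-4):
        super().__init__()
        self.scale = scale
        self.aux_loss = None  # summed into the total loss, logged as loss-sum

    def forward(self, attn_weights: torch.Tensor) -> torch.Tensor:
        if self.training:
            # stand-in penalty; the real criterion is not shown in this log
            self.aux_loss = self.scale * attn_weights.pow(2).sum()
        return attn_weights

mod = WithLoss()
w = torch.softmax(torch.randn(8, 50, 50), dim=-1)
out = mod(w)                              # output identical to the input
print(f"loss-sum={mod.aux_loss.item():.3e}")
```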
], batch size: 100, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:44:28,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=315960.0, ans=0.2 2023-12-22 00:44:30,169 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.284e-02 2023-12-22 00:44:40,156 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=15.0 2023-12-22 00:44:40,427 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.71 vs. limit=15.0 2023-12-22 00:44:40,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=316093.3333333333, ans=0.1 2023-12-22 00:44:42,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=316093.3333333333, ans=0.125 2023-12-22 00:44:47,475 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.380e+01 2.672e+01 2.824e+01 2.974e+01 3.593e+01, threshold=5.647e+01, percent-clipped=0.0 2023-12-22 00:44:47,836 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=12.0 2023-12-22 00:45:12,138 INFO [train.py:886] (3/4) Epoch 10, batch 4550, loss[loss=0.01742, audio_tagging_loss=0.01742, over 25000.00 frames. ], tot_loss[loss=0.01506, audio_tagging_loss=0.01506, over 4946596.74 frames. ], batch size: 100, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:45:17,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=316293.3333333333, ans=0.2 2023-12-22 00:45:19,107 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.87 vs. limit=10.0 2023-12-22 00:45:41,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=316426.6666666667, ans=0.125 2023-12-22 00:45:47,085 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.66 vs. limit=15.0 2023-12-22 00:45:48,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=316493.3333333333, ans=10.0 2023-12-22 00:45:51,582 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=12.0 2023-12-22 00:46:04,000 INFO [train.py:886] (3/4) Epoch 10, batch 4600, loss[loss=0.01612, audio_tagging_loss=0.01612, over 25000.00 frames. ], tot_loss[loss=0.01511, audio_tagging_loss=0.01511, over 4948594.68 frames. ], batch size: 100, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:46:05,459 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.82 vs. 
limit=22.5 2023-12-22 00:46:16,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=316693.3333333333, ans=0.0 2023-12-22 00:46:16,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=316693.3333333333, ans=0.1 2023-12-22 00:46:30,857 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.288e+01 2.537e+01 2.757e+01 2.967e+01 3.317e+01, threshold=5.514e+01, percent-clipped=0.0 2023-12-22 00:46:34,891 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.50 vs. limit=15.0 2023-12-22 00:46:54,394 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.62 vs. limit=15.0 2023-12-22 00:46:55,675 INFO [train.py:886] (3/4) Epoch 10, batch 4650, loss[loss=0.01601, audio_tagging_loss=0.01601, over 21814.00 frames. ], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 4952595.35 frames. ], batch size: 107, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:47:03,780 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=15.0 2023-12-22 00:47:09,846 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=15.0 2023-12-22 00:47:21,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=317093.3333333333, ans=0.0 2023-12-22 00:47:40,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=317226.6666666667, ans=0.125 2023-12-22 00:47:44,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=317226.6666666667, ans=0.0 2023-12-22 00:47:46,028 INFO [train.py:886] (3/4) Epoch 10, batch 4700, loss[loss=0.0149, audio_tagging_loss=0.0149, over 24750.00 frames. ], tot_loss[loss=0.01509, audio_tagging_loss=0.01509, over 4956237.62 frames. ], batch size: 99, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:48:10,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=317426.6666666667, ans=0.125 2023-12-22 00:48:12,608 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.394e+01 2.720e+01 2.864e+01 3.016e+01 3.730e+01, threshold=5.728e+01, percent-clipped=0.0 2023-12-22 00:48:31,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=317560.0, ans=0.125 2023-12-22 00:48:33,401 INFO [train.py:886] (3/4) Epoch 10, batch 4750, loss[loss=0.01682, audio_tagging_loss=0.01682, over 24750.00 frames. ], tot_loss[loss=0.01529, audio_tagging_loss=0.01529, over 4955828.89 frames. ], batch size: 99, lr: 1.07e-02, grad_scale: 64.0 2023-12-22 00:48:35,669 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.80 vs. 
limit=12.0 2023-12-22 00:48:36,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=317626.6666666667, ans=0.0 2023-12-22 00:48:40,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=317626.6666666667, ans=0.1 2023-12-22 00:49:10,272 INFO [train.py:886] (3/4) Epoch 11, batch 0, loss[loss=0.032, audio_tagging_loss=0.032, over 24038.00 frames. ], tot_loss[loss=0.032, audio_tagging_loss=0.032, over 24038.00 frames. ], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:49:10,272 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 00:49:23,905 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6551, 2.9069, 3.4538, 3.3658], device='cuda:3') 2023-12-22 00:49:30,815 INFO [train.py:917] (3/4) Epoch 11, validation: loss=0.03405, audio_tagging_loss=0.03405, over 3737520.00 frames. 2023-12-22 00:49:30,816 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 00:49:50,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=317866.6666666667, ans=0.0 2023-12-22 00:49:52,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=317866.6666666667, ans=0.125 2023-12-22 00:49:52,885 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.70 vs. limit=15.0 2023-12-22 00:49:58,544 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.53 vs. limit=15.0 2023-12-22 00:49:59,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=317866.6666666667, ans=0.125 2023-12-22 00:50:05,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=317933.3333333333, ans=0.1 2023-12-22 00:50:07,743 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.69 vs. limit=22.5 2023-12-22 00:50:12,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=318000.0, ans=0.125 2023-12-22 00:50:22,919 INFO [train.py:886] (3/4) Epoch 11, batch 50, loss[loss=0.02034, audio_tagging_loss=0.02034, over 25000.00 frames. ], tot_loss[loss=0.02327, audio_tagging_loss=0.02327, over 1125316.55 frames. ], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:50:26,059 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.91 vs. 
limit=15.0 2023-12-22 00:50:27,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=318066.6666666667, ans=0.2 2023-12-22 00:50:29,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=318066.6666666667, ans=0.125 2023-12-22 00:50:33,997 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.550e+01 2.913e+01 3.271e+01 4.041e+01 1.011e+02, threshold=6.542e+01, percent-clipped=6.0 2023-12-22 00:50:40,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=318133.3333333333, ans=6.0 2023-12-22 00:50:44,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.87 vs. limit=10.0 2023-12-22 00:51:04,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=318333.3333333333, ans=0.2 2023-12-22 00:51:14,431 INFO [train.py:886] (3/4) Epoch 11, batch 100, loss[loss=0.01556, audio_tagging_loss=0.01556, over 25000.00 frames. ], tot_loss[loss=0.02042, audio_tagging_loss=0.02042, over 1980542.13 frames. ], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:52:00,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=318666.6666666667, ans=0.1 2023-12-22 00:52:06,785 INFO [train.py:886] (3/4) Epoch 11, batch 150, loss[loss=0.01504, audio_tagging_loss=0.01504, over 25000.00 frames. ], tot_loss[loss=0.01841, audio_tagging_loss=0.01841, over 2643921.01 frames. ], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:52:17,108 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+01 2.830e+01 2.997e+01 3.217e+01 3.667e+01, threshold=5.994e+01, percent-clipped=0.0 2023-12-22 00:52:29,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.05 vs. limit=22.5 2023-12-22 00:52:31,624 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 00:52:40,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=318933.3333333333, ans=0.05 2023-12-22 00:52:45,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=318933.3333333333, ans=0.2 2023-12-22 00:52:51,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=319000.0, ans=0.1 2023-12-22 00:52:56,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=319000.0, ans=0.125 2023-12-22 00:52:58,336 INFO [train.py:886] (3/4) Epoch 11, batch 200, loss[loss=0.01773, audio_tagging_loss=0.01773, over 25000.00 frames. ], tot_loss[loss=0.01738, audio_tagging_loss=0.01738, over 3163132.97 frames. 
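The lr column steps from 1.07e-02 in the last epoch 10 batches down to 1.02e-02 once Epoch 11, batch 0 begins. That is consistent with an Eden-style schedule, assumed here from the Zipformer recipe family, which decays smoothly in both the global batch index and the epoch; all constants below are illustrative.

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style schedule: ~x^-0.5 decay in both the batch and epoch dimensions."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# the epoch boundary alone lowers the lr by ~4-5%, matching the logged step:
print(eden_lr(0.045, batch=46000, epoch=10.0))  # ~1.04e-02
print(eden_lr(0.045, batch=46000, epoch=11.0))  # ~0.99e-02
```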
], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:52:58,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=319066.6666666667, ans=0.125 2023-12-22 00:53:24,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=319200.0, ans=0.125 2023-12-22 00:53:31,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=319266.6666666667, ans=0.125 2023-12-22 00:53:50,000 INFO [train.py:886] (3/4) Epoch 11, batch 250, loss[loss=0.01442, audio_tagging_loss=0.01442, over 25000.00 frames. ], tot_loss[loss=0.01684, audio_tagging_loss=0.01684, over 3564217.98 frames. ], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:53:53,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=319400.0, ans=0.125 2023-12-22 00:53:58,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=319400.0, ans=0.0 2023-12-22 00:54:01,135 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 2.668e+01 2.780e+01 2.958e+01 3.295e+01, threshold=5.560e+01, percent-clipped=0.0 2023-12-22 00:54:06,033 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.35 vs. limit=15.0 2023-12-22 00:54:11,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=319533.3333333333, ans=0.1 2023-12-22 00:54:17,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=319533.3333333333, ans=0.07 2023-12-22 00:54:20,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=319600.0, ans=0.2 2023-12-22 00:54:24,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=319600.0, ans=0.0 2023-12-22 00:54:37,213 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.19 vs. limit=15.0 2023-12-22 00:54:42,154 INFO [train.py:886] (3/4) Epoch 11, batch 300, loss[loss=0.01507, audio_tagging_loss=0.01507, over 24750.00 frames. ], tot_loss[loss=0.01645, audio_tagging_loss=0.01645, over 3873234.90 frames. ], batch size: 99, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:54:43,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=319733.3333333333, ans=0.125 2023-12-22 00:55:05,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=319866.6666666667, ans=0.1 2023-12-22 00:55:14,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=319933.3333333333, ans=0.2 2023-12-22 00:55:36,162 INFO [train.py:886] (3/4) Epoch 11, batch 350, loss[loss=0.01169, audio_tagging_loss=0.01169, over 25000.00 frames. ], tot_loss[loss=0.01632, audio_tagging_loss=0.01632, over 4108250.15 frames. 
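The validation blocks above also report peak GPU memory (Maximum memory allocated so far is 14873MB). In PyTorch that figure presumably comes from the CUDA caching allocator's high-water mark:

```python
import torch

if torch.cuda.is_available():
    peak_bytes = torch.cuda.max_memory_allocated(torch.device("cuda:3"))
    print(f"Maximum memory allocated so far is {peak_bytes // 2 ** 20}MB")
```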
], batch size: 100, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:55:36,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.78 vs. limit=15.0 2023-12-22 00:55:39,426 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.72 vs. limit=15.0 2023-12-22 00:55:42,928 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.40 vs. limit=22.5 2023-12-22 00:55:43,905 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.35 vs. limit=12.0 2023-12-22 00:55:48,028 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.298e+01 2.599e+01 2.795e+01 2.968e+01 3.574e+01, threshold=5.590e+01, percent-clipped=0.0 2023-12-22 00:55:54,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=320133.3333333333, ans=0.2 2023-12-22 00:56:01,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=320200.0, ans=0.125 2023-12-22 00:56:10,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=320266.6666666667, ans=0.0 2023-12-22 00:56:23,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.12 vs. limit=6.0 2023-12-22 00:56:28,437 INFO [train.py:886] (3/4) Epoch 11, batch 400, loss[loss=0.0147, audio_tagging_loss=0.0147, over 24750.00 frames. ], tot_loss[loss=0.01589, audio_tagging_loss=0.01589, over 4291097.35 frames. ], batch size: 99, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:56:29,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=320400.0, ans=0.0 2023-12-22 00:56:40,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=320466.6666666667, ans=0.1 2023-12-22 00:56:49,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=320533.3333333333, ans=0.02 2023-12-22 00:56:58,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=320533.3333333333, ans=0.2 2023-12-22 00:57:02,392 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2023-12-22 00:57:16,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=320666.6666666667, ans=0.2 2023-12-22 00:57:20,464 INFO [train.py:886] (3/4) Epoch 11, batch 450, loss[loss=0.01391, audio_tagging_loss=0.01391, over 24750.00 frames. ], tot_loss[loss=0.01553, audio_tagging_loss=0.01553, over 4437527.01 frames. 
], batch size: 99, lr: 1.02e-02, grad_scale: 64.0 2023-12-22 00:57:32,227 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.268e+01 2.613e+01 2.762e+01 2.922e+01 3.563e+01, threshold=5.524e+01, percent-clipped=0.0 2023-12-22 00:57:44,799 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.70 vs. limit=22.5 2023-12-22 00:57:46,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=320866.6666666667, ans=0.125 2023-12-22 00:58:00,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=320933.3333333333, ans=0.125 2023-12-22 00:58:09,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=321000.0, ans=0.0 2023-12-22 00:58:12,052 INFO [train.py:886] (3/4) Epoch 11, batch 500, loss[loss=0.01657, audio_tagging_loss=0.01657, over 25000.00 frames. ], tot_loss[loss=0.01536, audio_tagging_loss=0.01536, over 4557535.84 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 00:58:19,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=321066.6666666667, ans=0.05 2023-12-22 00:58:40,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=321200.0, ans=0.09899494936611666 2023-12-22 00:59:04,098 INFO [train.py:886] (3/4) Epoch 11, batch 550, loss[loss=0.01579, audio_tagging_loss=0.01579, over 25000.00 frames. ], tot_loss[loss=0.01532, audio_tagging_loss=0.01532, over 4644331.15 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 00:59:10,881 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.81 vs. limit=15.0 2023-12-22 00:59:11,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=321400.0, ans=0.5 2023-12-22 00:59:15,291 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 2.605e+01 2.796e+01 2.937e+01 3.436e+01, threshold=5.591e+01, percent-clipped=0.0 2023-12-22 00:59:15,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=321466.6666666667, ans=0.015 2023-12-22 00:59:33,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=321533.3333333333, ans=0.2 2023-12-22 00:59:42,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=321600.0, ans=0.1 2023-12-22 00:59:45,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=321666.6666666667, ans=0.0 2023-12-22 00:59:50,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=321666.6666666667, ans=0.125 2023-12-22 00:59:55,400 INFO [train.py:886] (3/4) Epoch 11, batch 600, loss[loss=0.01712, audio_tagging_loss=0.01712, over 24750.00 frames. ], tot_loss[loss=0.01534, audio_tagging_loss=0.01534, over 4712009.71 frames. 
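[annotation] The Whitening lines compare a per-module statistic against a limit (e.g. `metric=22.70 vs. limit=22.5` just above), flagging modules whose channel covariance has drifted from white. The exact statistic is defined in scaling.py; the sketch below shows one standard whiteness measure with the same qualitative behaviour (equal to 1.0 for perfectly white features, growing as the covariance spectrum becomes lopsided) and is an assumption, not the code's actual formula.

    # Hedged sketch of a whiteness statistic over (num_frames, num_channels) features.
    import torch

    def whiteness_metric(feats: torch.Tensor) -> float:
        x = feats - feats.mean(dim=0, keepdim=True)   # zero-mean the channels
        cov = (x.T @ x) / x.shape[0]                  # (C, C) channel covariance
        eigs = torch.linalg.eigvalsh(cov)             # eigenvalues of the symmetric cov
        # 1.0 when all eigenvalues are equal ("white"); grows as the spectrum
        # becomes uneven, which is what a whitening limit would guard against.
        return float((eigs ** 2).mean() / eigs.mean() ** 2)

    x = torch.randn(1000, 384)        # roughly white features -> metric near 1.0
    print(whiteness_metric(x))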
], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:00:23,379 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.86 vs. limit=6.0 2023-12-22 01:00:36,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=322000.0, ans=0.0 2023-12-22 01:00:38,888 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2023-12-22 01:00:47,580 INFO [train.py:886] (3/4) Epoch 11, batch 650, loss[loss=0.01553, audio_tagging_loss=0.01553, over 22495.00 frames. ], tot_loss[loss=0.01534, audio_tagging_loss=0.01534, over 4754663.60 frames. ], batch size: 107, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:00:58,779 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.252e+01 2.635e+01 2.798e+01 2.937e+01 3.276e+01, threshold=5.597e+01, percent-clipped=0.0 2023-12-22 01:01:01,259 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.26 vs. limit=15.0 2023-12-22 01:01:15,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=322200.0, ans=0.125 2023-12-22 01:01:31,633 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=15.0 2023-12-22 01:01:34,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=322333.3333333333, ans=0.0 2023-12-22 01:01:38,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=322400.0, ans=0.125 2023-12-22 01:01:39,233 INFO [train.py:886] (3/4) Epoch 11, batch 700, loss[loss=0.01415, audio_tagging_loss=0.01415, over 24750.00 frames. ], tot_loss[loss=0.01531, audio_tagging_loss=0.01531, over 4796026.65 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:01:45,717 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 01:02:11,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=322600.0, ans=0.0 2023-12-22 01:02:17,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=322600.0, ans=0.1 2023-12-22 01:02:18,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=322600.0, ans=0.125 2023-12-22 01:02:24,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=322666.6666666667, ans=0.125 2023-12-22 01:02:30,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=322733.3333333333, ans=0.0 2023-12-22 01:02:31,626 INFO [train.py:886] (3/4) Epoch 11, batch 750, loss[loss=0.01241, audio_tagging_loss=0.01241, over 21905.00 frames. ], tot_loss[loss=0.01519, audio_tagging_loss=0.01519, over 4822025.90 frames. 
], batch size: 107, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:02:40,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=322733.3333333333, ans=0.05 2023-12-22 01:02:44,364 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.310e+01 2.637e+01 2.770e+01 2.926e+01 3.754e+01, threshold=5.541e+01, percent-clipped=0.0 2023-12-22 01:03:09,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=322933.3333333333, ans=0.0 2023-12-22 01:03:13,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=323000.0, ans=0.2 2023-12-22 01:03:20,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=323000.0, ans=0.125 2023-12-22 01:03:21,326 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.51 vs. limit=15.0 2023-12-22 01:03:24,093 INFO [train.py:886] (3/4) Epoch 11, batch 800, loss[loss=0.0141, audio_tagging_loss=0.0141, over 25000.00 frames. ], tot_loss[loss=0.01515, audio_tagging_loss=0.01515, over 4852903.37 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:03:24,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=323066.6666666667, ans=0.125 2023-12-22 01:03:36,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=323133.3333333333, ans=0.2 2023-12-22 01:03:49,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=323200.0, ans=0.125 2023-12-22 01:03:53,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=323200.0, ans=0.0 2023-12-22 01:03:55,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=323266.6666666667, ans=0.0 2023-12-22 01:04:01,048 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.52 vs. limit=10.0 2023-12-22 01:04:02,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=323266.6666666667, ans=0.125 2023-12-22 01:04:15,543 INFO [train.py:886] (3/4) Epoch 11, batch 850, loss[loss=0.01844, audio_tagging_loss=0.01844, over 25000.00 frames. ], tot_loss[loss=0.01513, audio_tagging_loss=0.01513, over 4874438.17 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:04:19,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=323400.0, ans=0.0 2023-12-22 01:04:28,273 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 2.659e+01 2.776e+01 2.937e+01 3.524e+01, threshold=5.552e+01, percent-clipped=0.0 2023-12-22 01:04:54,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=323600.0, ans=0.0 2023-12-22 01:04:59,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.94 vs. 
limit=22.5 2023-12-22 01:05:00,992 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0 2023-12-22 01:05:01,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=323666.6666666667, ans=0.07 2023-12-22 01:05:02,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=323666.6666666667, ans=0.125 2023-12-22 01:05:05,771 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=12.0 2023-12-22 01:05:07,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=323733.3333333333, ans=0.125 2023-12-22 01:05:08,004 INFO [train.py:886] (3/4) Epoch 11, batch 900, loss[loss=0.01809, audio_tagging_loss=0.01809, over 24750.00 frames. ], tot_loss[loss=0.01511, audio_tagging_loss=0.01511, over 4891970.23 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:05:14,340 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.48 vs. limit=22.5 2023-12-22 01:05:16,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=323800.0, ans=0.05 2023-12-22 01:05:27,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.82 vs. limit=22.5 2023-12-22 01:05:32,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=323866.6666666667, ans=0.0 2023-12-22 01:05:36,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=323866.6666666667, ans=0.125 2023-12-22 01:05:57,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=324000.0, ans=0.07 2023-12-22 01:06:00,101 INFO [train.py:886] (3/4) Epoch 11, batch 950, loss[loss=0.01633, audio_tagging_loss=0.01633, over 24750.00 frames. ], tot_loss[loss=0.01517, audio_tagging_loss=0.01517, over 4902826.61 frames. 
], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:06:05,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=324066.6666666667, ans=0.0 2023-12-22 01:06:11,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=324133.3333333333, ans=0.125 2023-12-22 01:06:12,722 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.426e+01 2.727e+01 2.870e+01 3.011e+01 3.522e+01, threshold=5.740e+01, percent-clipped=0.0 2023-12-22 01:06:13,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=324133.3333333333, ans=0.0 2023-12-22 01:06:15,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=324133.3333333333, ans=0.125 2023-12-22 01:06:39,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=324266.6666666667, ans=0.1 2023-12-22 01:06:39,904 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.92 vs. limit=15.0 2023-12-22 01:06:51,530 INFO [train.py:886] (3/4) Epoch 11, batch 1000, loss[loss=0.01545, audio_tagging_loss=0.01545, over 24750.00 frames. ], tot_loss[loss=0.01512, audio_tagging_loss=0.01512, over 4911061.27 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:07:01,976 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.44 vs. limit=15.0 2023-12-22 01:07:31,626 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.75 vs. limit=15.0 2023-12-22 01:07:35,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=324666.6666666667, ans=0.0 2023-12-22 01:07:44,399 INFO [train.py:886] (3/4) Epoch 11, batch 1050, loss[loss=0.01422, audio_tagging_loss=0.01422, over 24750.00 frames. ], tot_loss[loss=0.015, audio_tagging_loss=0.015, over 4921335.56 frames. 
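[annotation] In every Clipping_scale warning in this section, the reported threshold is exactly `Clipping_scale` times the logged median grad-norm: 2.0 × 2.870e+01 = 5.740e+01 just above, and 2.0 × 2.739e+01 ≈ 5.477e+01 a little further on. A minimal sketch of that bookkeeping, with hypothetical function and variable names:

    # Sketch: threshold = clipping_scale * median of recent gradient norms,
    # reporting the same five quantiles (min, 25%, median, 75%, max) the log prints.
    import torch

    def grad_norm_stats(grad_norms, clipping_scale: float = 2.0):
        t = torch.tensor(grad_norms, dtype=torch.float32)
        quartiles = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * quartiles[2].item()        # 2.0 x median
        percent_clipped = (t > threshold).float().mean().item() * 100.0
        return quartiles.tolist(), threshold, percent_clipped

    qs, thr, pc = grad_norm_stats([24.3, 27.3, 28.7, 30.1, 35.2])
    print(qs, thr, pc)   # threshold = 2.0 * 28.7 = 57.4, percent-clipped = 0.0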
], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:07:53,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=324800.0, ans=0.125 2023-12-22 01:07:56,579 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.315e+01 2.626e+01 2.739e+01 2.887e+01 3.384e+01, threshold=5.477e+01, percent-clipped=0.0 2023-12-22 01:07:58,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=324800.0, ans=0.1 2023-12-22 01:07:59,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=324800.0, ans=0.1 2023-12-22 01:08:14,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=324866.6666666667, ans=0.125 2023-12-22 01:08:17,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=324933.3333333333, ans=0.125 2023-12-22 01:08:24,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=324933.3333333333, ans=0.0 2023-12-22 01:08:28,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=325000.0, ans=0.125 2023-12-22 01:08:31,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=325000.0, ans=0.125 2023-12-22 01:08:33,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=325000.0, ans=0.2 2023-12-22 01:08:35,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=325066.6666666667, ans=0.125 2023-12-22 01:08:36,757 INFO [train.py:886] (3/4) Epoch 11, batch 1100, loss[loss=0.01627, audio_tagging_loss=0.01627, over 25000.00 frames. ], tot_loss[loss=0.01495, audio_tagging_loss=0.01495, over 4925939.97 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:08:50,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=325133.3333333333, ans=0.1 2023-12-22 01:08:59,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=325200.0, ans=0.0 2023-12-22 01:09:02,885 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.59 vs. limit=6.0 2023-12-22 01:09:11,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=325266.6666666667, ans=0.035 2023-12-22 01:09:20,390 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=8.052e-01 2023-12-22 01:09:24,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=325333.3333333333, ans=0.125 2023-12-22 01:09:27,738 INFO [train.py:886] (3/4) Epoch 11, batch 1150, loss[loss=0.01046, audio_tagging_loss=0.01046, over 23993.00 frames. ], tot_loss[loss=0.01488, audio_tagging_loss=0.01488, over 4938562.33 frames. 
], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:09:34,601 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.80 vs. limit=15.0 2023-12-22 01:09:35,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=325400.0, ans=0.09899494936611666 2023-12-22 01:09:38,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=325466.6666666667, ans=0.2 2023-12-22 01:09:41,020 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.264e+01 2.635e+01 2.806e+01 2.956e+01 3.723e+01, threshold=5.612e+01, percent-clipped=0.0 2023-12-22 01:09:42,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=325466.6666666667, ans=0.125 2023-12-22 01:09:43,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=325466.6666666667, ans=0.0 2023-12-22 01:10:01,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=325600.0, ans=0.125 2023-12-22 01:10:17,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=325666.6666666667, ans=0.125 2023-12-22 01:10:19,964 INFO [train.py:886] (3/4) Epoch 11, batch 1200, loss[loss=0.01609, audio_tagging_loss=0.01609, over 25000.00 frames. ], tot_loss[loss=0.01489, audio_tagging_loss=0.01489, over 4941124.15 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:10:20,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=325733.3333333333, ans=0.07 2023-12-22 01:11:00,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=326000.0, ans=0.0 2023-12-22 01:11:03,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=326000.0, ans=0.0 2023-12-22 01:11:12,397 INFO [train.py:886] (3/4) Epoch 11, batch 1250, loss[loss=0.01312, audio_tagging_loss=0.01312, over 22137.00 frames. ], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 4937328.51 frames. ], batch size: 107, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:11:13,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.81 vs. limit=22.5 2023-12-22 01:11:15,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=326066.6666666667, ans=0.1 2023-12-22 01:11:25,126 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.715e+01 2.889e+01 3.133e+01 4.404e+01, threshold=5.779e+01, percent-clipped=0.0 2023-12-22 01:11:26,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=326133.3333333333, ans=0.125 2023-12-22 01:11:33,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=326200.0, ans=0.95 2023-12-22 01:12:03,784 INFO [train.py:886] (3/4) Epoch 11, batch 1300, loss[loss=0.01358, audio_tagging_loss=0.01358, over 24750.00 frames. 
], tot_loss[loss=0.01519, audio_tagging_loss=0.01519, over 4930896.67 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:12:17,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=12.0 2023-12-22 01:12:31,443 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.82 vs. limit=15.0 2023-12-22 01:12:36,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=326600.0, ans=0.1 2023-12-22 01:12:49,836 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 01:12:54,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=326666.6666666667, ans=0.125 2023-12-22 01:12:56,091 INFO [train.py:886] (3/4) Epoch 11, batch 1350, loss[loss=0.01662, audio_tagging_loss=0.01662, over 24094.00 frames. ], tot_loss[loss=0.01509, audio_tagging_loss=0.01509, over 4931881.96 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:13:03,834 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.01 vs. limit=12.0 2023-12-22 01:13:04,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=326733.3333333333, ans=0.125 2023-12-22 01:13:07,973 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+01 2.643e+01 2.800e+01 2.940e+01 3.448e+01, threshold=5.600e+01, percent-clipped=0.0 2023-12-22 01:13:12,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=326800.0, ans=0.0 2023-12-22 01:13:20,586 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.66 vs. limit=15.0 2023-12-22 01:13:25,854 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.87 vs. limit=10.0 2023-12-22 01:13:46,077 INFO [train.py:886] (3/4) Epoch 11, batch 1400, loss[loss=0.01571, audio_tagging_loss=0.01571, over 24750.00 frames. ], tot_loss[loss=0.01499, audio_tagging_loss=0.01499, over 4937203.92 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 64.0 2023-12-22 01:13:48,310 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0 2023-12-22 01:13:55,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=327066.6666666667, ans=0.125 2023-12-22 01:13:57,724 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 01:14:03,586 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.94 vs. limit=15.0 2023-12-22 01:14:19,855 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.74 vs. 
limit=15.0 2023-12-22 01:14:22,943 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2023-12-22 01:14:38,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=327400.0, ans=0.2 2023-12-22 01:14:38,941 INFO [train.py:886] (3/4) Epoch 11, batch 1450, loss[loss=0.01623, audio_tagging_loss=0.01623, over 25000.00 frames. ], tot_loss[loss=0.01492, audio_tagging_loss=0.01492, over 4944298.98 frames. ], batch size: 100, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:14:40,387 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.46 vs. limit=22.5 2023-12-22 01:14:44,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.70 vs. limit=10.0 2023-12-22 01:14:48,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=327466.6666666667, ans=0.0 2023-12-22 01:14:50,217 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.361e+01 2.629e+01 2.758e+01 2.923e+01 4.200e+01, threshold=5.515e+01, percent-clipped=0.0 2023-12-22 01:14:58,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=327533.3333333333, ans=0.125 2023-12-22 01:15:05,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=327533.3333333333, ans=0.125 2023-12-22 01:15:12,314 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.672e-02 2023-12-22 01:15:13,536 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0 2023-12-22 01:15:15,150 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.749e-01 2023-12-22 01:15:29,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=327733.3333333333, ans=0.125 2023-12-22 01:15:29,686 INFO [train.py:886] (3/4) Epoch 11, batch 1500, loss[loss=0.01609, audio_tagging_loss=0.01609, over 25000.00 frames. ], tot_loss[loss=0.01497, audio_tagging_loss=0.01497, over 4954941.09 frames. ], batch size: 100, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:15:37,380 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.05 vs. limit=15.0 2023-12-22 01:15:59,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=327866.6666666667, ans=0.0 2023-12-22 01:15:59,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=327866.6666666667, ans=0.2 2023-12-22 01:16:16,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=328000.0, ans=0.125 2023-12-22 01:16:21,547 INFO [train.py:886] (3/4) Epoch 11, batch 1550, loss[loss=0.01692, audio_tagging_loss=0.01692, over 24947.00 frames. ], tot_loss[loss=0.01509, audio_tagging_loss=0.01509, over 4948870.70 frames. 
], batch size: 100, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:16:33,806 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.382e+01 2.672e+01 2.837e+01 3.019e+01 3.569e+01, threshold=5.673e+01, percent-clipped=0.0 2023-12-22 01:16:37,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=328133.3333333333, ans=0.1 2023-12-22 01:16:38,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=328133.3333333333, ans=0.025 2023-12-22 01:16:54,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.51 vs. limit=10.0 2023-12-22 01:16:56,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=328266.6666666667, ans=0.015 2023-12-22 01:16:59,040 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.91 vs. limit=12.0 2023-12-22 01:17:04,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=328333.3333333333, ans=0.125 2023-12-22 01:17:14,540 INFO [train.py:886] (3/4) Epoch 11, batch 1600, loss[loss=0.01574, audio_tagging_loss=0.01574, over 24750.00 frames. ], tot_loss[loss=0.01518, audio_tagging_loss=0.01518, over 4945393.79 frames. ], batch size: 99, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:17:51,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=328600.0, ans=0.0 2023-12-22 01:17:59,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=328666.6666666667, ans=0.5 2023-12-22 01:18:01,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=328666.6666666667, ans=0.0 2023-12-22 01:18:05,219 INFO [train.py:886] (3/4) Epoch 11, batch 1650, loss[loss=0.01409, audio_tagging_loss=0.01409, over 24750.00 frames. ], tot_loss[loss=0.01515, audio_tagging_loss=0.01515, over 4946731.87 frames. ], batch size: 99, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:18:18,586 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+01 2.598e+01 2.768e+01 2.940e+01 3.602e+01, threshold=5.536e+01, percent-clipped=0.0 2023-12-22 01:18:22,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.60 vs. limit=22.5 2023-12-22 01:18:32,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=328866.6666666667, ans=10.0 2023-12-22 01:18:53,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=329000.0, ans=0.0 2023-12-22 01:18:57,178 INFO [train.py:886] (3/4) Epoch 11, batch 1700, loss[loss=0.01657, audio_tagging_loss=0.01657, over 24750.00 frames. ], tot_loss[loss=0.01501, audio_tagging_loss=0.01501, over 4952286.64 frames. ], batch size: 99, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:19:10,121 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.58 vs. 
limit=15.0 2023-12-22 01:19:11,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.71 vs. limit=10.0 2023-12-22 01:19:12,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=329133.3333333333, ans=0.2 2023-12-22 01:19:21,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=329200.0, ans=0.1 2023-12-22 01:19:48,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=329400.0, ans=0.1 2023-12-22 01:19:49,207 INFO [train.py:886] (3/4) Epoch 11, batch 1750, loss[loss=0.01327, audio_tagging_loss=0.01327, over 25000.00 frames. ], tot_loss[loss=0.01493, audio_tagging_loss=0.01493, over 4957182.21 frames. ], batch size: 100, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:19:58,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=329466.6666666667, ans=0.0 2023-12-22 01:20:01,330 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.366e+01 2.668e+01 2.806e+01 3.006e+01 3.774e+01, threshold=5.613e+01, percent-clipped=0.0 2023-12-22 01:20:39,318 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.85 vs. limit=15.0 2023-12-22 01:20:40,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=329733.3333333333, ans=0.2 2023-12-22 01:20:40,637 INFO [train.py:886] (3/4) Epoch 11, batch 1800, loss[loss=0.01319, audio_tagging_loss=0.01319, over 22050.00 frames. ], tot_loss[loss=0.01491, audio_tagging_loss=0.01491, over 4957477.24 frames. ], batch size: 107, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:20:40,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=329733.3333333333, ans=0.2 2023-12-22 01:20:50,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=329800.0, ans=0.125 2023-12-22 01:20:56,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=329800.0, ans=0.0 2023-12-22 01:21:02,757 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0 2023-12-22 01:21:31,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=330000.0, ans=0.1 2023-12-22 01:21:33,105 INFO [train.py:886] (3/4) Epoch 11, batch 1850, loss[loss=0.01634, audio_tagging_loss=0.01634, over 24750.00 frames. ], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 4953801.98 frames. 
], batch size: 99, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:21:35,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=330066.6666666667, ans=0.0 2023-12-22 01:21:36,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=330066.6666666667, ans=0.125 2023-12-22 01:21:45,282 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+01 2.664e+01 2.766e+01 2.942e+01 3.434e+01, threshold=5.531e+01, percent-clipped=0.0 2023-12-22 01:21:47,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=330133.3333333333, ans=0.04949747468305833 2023-12-22 01:21:51,073 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.07 vs. limit=22.5 2023-12-22 01:21:58,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=330200.0, ans=0.125 2023-12-22 01:22:00,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=330200.0, ans=0.125 2023-12-22 01:22:01,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=330200.0, ans=0.0 2023-12-22 01:22:01,458 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.71 vs. limit=10.0 2023-12-22 01:22:24,764 INFO [train.py:886] (3/4) Epoch 11, batch 1900, loss[loss=0.01457, audio_tagging_loss=0.01457, over 24750.00 frames. ], tot_loss[loss=0.01506, audio_tagging_loss=0.01506, over 4942282.38 frames. ], batch size: 99, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:22:43,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=330466.6666666667, ans=0.0 2023-12-22 01:22:55,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=330600.0, ans=0.05 2023-12-22 01:23:02,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=330600.0, ans=0.0 2023-12-22 01:23:08,463 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.26 vs. limit=8.0 2023-12-22 01:23:10,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.23 vs. limit=15.0 2023-12-22 01:23:16,956 INFO [train.py:886] (3/4) Epoch 11, batch 1950, loss[loss=0.01818, audio_tagging_loss=0.01818, over 24750.00 frames. ], tot_loss[loss=0.01493, audio_tagging_loss=0.01493, over 4938182.70 frames. ], batch size: 99, lr: 1.00e-02, grad_scale: 64.0 2023-12-22 01:23:24,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=330733.3333333333, ans=0.05 2023-12-22 01:23:29,047 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.379e+01 2.676e+01 2.806e+01 2.987e+01 3.356e+01, threshold=5.613e+01, percent-clipped=0.0 2023-12-22 01:23:30,567 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.07 vs. 
limit=22.5 2023-12-22 01:23:43,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=330866.6666666667, ans=0.1 2023-12-22 01:23:44,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=330866.6666666667, ans=0.1 2023-12-22 01:24:09,397 INFO [train.py:886] (3/4) Epoch 11, batch 2000, loss[loss=0.01414, audio_tagging_loss=0.01414, over 24750.00 frames. ], tot_loss[loss=0.01488, audio_tagging_loss=0.01488, over 4938200.12 frames. ], batch size: 99, lr: 9.99e-03, grad_scale: 64.0 2023-12-22 01:24:26,946 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.59 vs. limit=15.0 2023-12-22 01:24:45,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=331266.6666666667, ans=0.125 2023-12-22 01:24:48,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=331266.6666666667, ans=0.0 2023-12-22 01:24:52,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.64 vs. limit=22.5 2023-12-22 01:24:54,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=331333.3333333333, ans=0.2 2023-12-22 01:24:56,159 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.34 vs. limit=22.5 2023-12-22 01:25:00,493 INFO [train.py:886] (3/4) Epoch 11, batch 2050, loss[loss=0.0136, audio_tagging_loss=0.0136, over 25000.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4944031.32 frames. ], batch size: 100, lr: 9.99e-03, grad_scale: 64.0 2023-12-22 01:25:13,298 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.247e+01 2.569e+01 2.759e+01 2.903e+01 3.847e+01, threshold=5.517e+01, percent-clipped=0.0 2023-12-22 01:25:29,578 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.65 vs. limit=10.0 2023-12-22 01:25:33,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=331600.0, ans=0.0 2023-12-22 01:25:39,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=331600.0, ans=0.125 2023-12-22 01:25:45,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=331666.6666666667, ans=0.125 2023-12-22 01:25:53,373 INFO [train.py:886] (3/4) Epoch 11, batch 2100, loss[loss=0.01371, audio_tagging_loss=0.01371, over 25000.00 frames. ], tot_loss[loss=0.01472, audio_tagging_loss=0.01472, over 4953299.34 frames. ], batch size: 100, lr: 9.98e-03, grad_scale: 64.0 2023-12-22 01:26:09,755 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0 2023-12-22 01:26:10,781 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.93 vs. 
limit=15.0 2023-12-22 01:26:14,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=331866.6666666667, ans=0.2 2023-12-22 01:26:21,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.34 vs. limit=22.5 2023-12-22 01:26:27,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=331933.3333333333, ans=0.015 2023-12-22 01:26:28,183 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.21 vs. limit=10.0 2023-12-22 01:26:43,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=332000.0, ans=0.125 2023-12-22 01:26:43,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=332000.0, ans=0.1 2023-12-22 01:26:44,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=332066.6666666667, ans=0.125 2023-12-22 01:26:45,349 INFO [train.py:886] (3/4) Epoch 11, batch 2150, loss[loss=0.01345, audio_tagging_loss=0.01345, over 24750.00 frames. ], tot_loss[loss=0.01476, audio_tagging_loss=0.01476, over 4950875.73 frames. ], batch size: 99, lr: 9.98e-03, grad_scale: 64.0 2023-12-22 01:26:58,038 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.407e+01 2.658e+01 2.791e+01 2.942e+01 3.654e+01, threshold=5.583e+01, percent-clipped=0.0 2023-12-22 01:26:59,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=332133.3333333333, ans=0.125 2023-12-22 01:26:59,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=332133.3333333333, ans=0.1 2023-12-22 01:27:02,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=332133.3333333333, ans=0.1 2023-12-22 01:27:05,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=332200.0, ans=0.125 2023-12-22 01:27:08,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=332200.0, ans=0.125 2023-12-22 01:27:11,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=332200.0, ans=0.09899494936611666 2023-12-22 01:27:14,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=332200.0, ans=0.0 2023-12-22 01:27:15,225 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.49 vs. limit=15.0 2023-12-22 01:27:17,199 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.24 vs. 
limit=15.0 2023-12-22 01:27:25,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=332266.6666666667, ans=0.0 2023-12-22 01:27:29,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=332333.3333333333, ans=0.125 2023-12-22 01:27:37,463 INFO [train.py:886] (3/4) Epoch 11, batch 2200, loss[loss=0.01729, audio_tagging_loss=0.01729, over 24750.00 frames. ], tot_loss[loss=0.01488, audio_tagging_loss=0.01488, over 4947647.60 frames. ], batch size: 99, lr: 9.98e-03, grad_scale: 64.0 2023-12-22 01:27:41,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=332400.0, ans=0.125 2023-12-22 01:27:44,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=332400.0, ans=0.125 2023-12-22 01:27:46,253 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.55 vs. limit=15.0 2023-12-22 01:28:22,870 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.87 vs. limit=5.0 2023-12-22 01:28:29,515 INFO [train.py:886] (3/4) Epoch 11, batch 2250, loss[loss=0.01398, audio_tagging_loss=0.01398, over 24750.00 frames. ], tot_loss[loss=0.01489, audio_tagging_loss=0.01489, over 4944113.32 frames. ], batch size: 99, lr: 9.97e-03, grad_scale: 64.0 2023-12-22 01:28:34,677 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.69 vs. limit=15.0 2023-12-22 01:28:35,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=332733.3333333333, ans=0.0 2023-12-22 01:28:41,537 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 2.628e+01 2.793e+01 2.962e+01 3.343e+01, threshold=5.585e+01, percent-clipped=0.0 2023-12-22 01:28:47,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=332800.0, ans=0.125 2023-12-22 01:28:51,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=332866.6666666667, ans=0.125 2023-12-22 01:29:08,089 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.04 vs. limit=22.5 2023-12-22 01:29:09,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=333000.0, ans=10.0 2023-12-22 01:29:15,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=333000.0, ans=0.0 2023-12-22 01:29:21,162 INFO [train.py:886] (3/4) Epoch 11, batch 2300, loss[loss=0.01416, audio_tagging_loss=0.01416, over 24750.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4945889.90 frames. 
], batch size: 99, lr: 9.97e-03, grad_scale: 64.0 2023-12-22 01:29:33,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=333133.3333333333, ans=0.125 2023-12-22 01:30:12,973 INFO [train.py:886] (3/4) Epoch 11, batch 2350, loss[loss=0.01681, audio_tagging_loss=0.01681, over 22220.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4942110.47 frames. ], batch size: 107, lr: 9.96e-03, grad_scale: 64.0 2023-12-22 01:30:16,458 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.20 vs. limit=22.5 2023-12-22 01:30:17,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=333400.0, ans=0.125 2023-12-22 01:30:25,751 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+01 2.666e+01 2.800e+01 2.976e+01 3.525e+01, threshold=5.600e+01, percent-clipped=0.0 2023-12-22 01:30:27,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=333466.6666666667, ans=0.125 2023-12-22 01:30:28,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=333466.6666666667, ans=0.125 2023-12-22 01:30:52,384 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=15.0 2023-12-22 01:30:57,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=333666.6666666667, ans=0.125 2023-12-22 01:31:02,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=333666.6666666667, ans=0.0 2023-12-22 01:31:02,401 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=26.06 vs. limit=22.5 2023-12-22 01:31:05,347 INFO [train.py:886] (3/4) Epoch 11, batch 2400, loss[loss=0.01522, audio_tagging_loss=0.01522, over 25000.00 frames. ], tot_loss[loss=0.0148, audio_tagging_loss=0.0148, over 4941128.86 frames. ], batch size: 100, lr: 9.96e-03, grad_scale: 64.0 2023-12-22 01:31:17,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=333800.0, ans=0.125 2023-12-22 01:31:34,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=333866.6666666667, ans=0.0 2023-12-22 01:31:37,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=333933.3333333333, ans=0.0 2023-12-22 01:31:51,297 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0 2023-12-22 01:31:56,639 INFO [train.py:886] (3/4) Epoch 11, batch 2450, loss[loss=0.01417, audio_tagging_loss=0.01417, over 25000.00 frames. ], tot_loss[loss=0.01478, audio_tagging_loss=0.01478, over 4944788.25 frames. 
], batch size: 100, lr: 9.95e-03, grad_scale: 64.0 2023-12-22 01:31:59,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=334066.6666666667, ans=0.0 2023-12-22 01:32:01,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=334066.6666666667, ans=0.125 2023-12-22 01:32:09,952 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.647e+01 2.778e+01 2.930e+01 3.813e+01, threshold=5.556e+01, percent-clipped=0.0 2023-12-22 01:32:19,810 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0 2023-12-22 01:32:24,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=334200.0, ans=0.125 2023-12-22 01:32:29,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=334266.6666666667, ans=0.0 2023-12-22 01:32:29,686 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 01:32:30,219 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2023-12-22 01:32:37,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=334333.3333333333, ans=0.125 2023-12-22 01:32:39,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.60 vs. limit=6.0 2023-12-22 01:32:49,591 INFO [train.py:886] (3/4) Epoch 11, batch 2500, loss[loss=0.0119, audio_tagging_loss=0.0119, over 24750.00 frames. ], tot_loss[loss=0.01493, audio_tagging_loss=0.01493, over 4942855.30 frames. ], batch size: 99, lr: 9.95e-03, grad_scale: 64.0 2023-12-22 01:32:49,937 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. 
2023-12-22 01:32:51,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=334400.0, ans=0.0
2023-12-22 01:32:53,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=334400.0, ans=0.0
2023-12-22 01:32:53,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=334400.0, ans=0.125
2023-12-22 01:33:10,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=334533.3333333333, ans=0.0
2023-12-22 01:33:17,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=334533.3333333333, ans=0.125
2023-12-22 01:33:22,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=334600.0, ans=0.0
2023-12-22 01:33:25,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=334600.0, ans=0.0
2023-12-22 01:33:28,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=334600.0, ans=0.125
2023-12-22 01:33:41,391 INFO [train.py:886] (3/4) Epoch 11, batch 2550, loss[loss=0.01476, audio_tagging_loss=0.01476, over 24750.00 frames. ], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 4940986.24 frames. ], batch size: 99, lr: 9.94e-03, grad_scale: 64.0
2023-12-22 01:33:54,292 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.329e+01 2.667e+01 2.808e+01 2.946e+01 3.351e+01, threshold=5.616e+01, percent-clipped=0.0
2023-12-22 01:33:57,513 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.13 vs. limit=22.5
2023-12-22 01:34:13,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=334933.3333333333, ans=0.1
2023-12-22 01:34:14,047 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.90 vs. limit=22.5
2023-12-22 01:34:33,207 INFO [train.py:886] (3/4) Epoch 11, batch 2600, loss[loss=0.01502, audio_tagging_loss=0.01502, over 25000.00 frames. ], tot_loss[loss=0.01495, audio_tagging_loss=0.01495, over 4944935.80 frames. ], batch size: 100, lr: 9.94e-03, grad_scale: 64.0
2023-12-22 01:35:05,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=335266.6666666667, ans=0.125
2023-12-22 01:35:06,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=335266.6666666667, ans=0.95
2023-12-22 01:35:25,030 INFO [train.py:886] (3/4) Epoch 11, batch 2650, loss[loss=0.0149, audio_tagging_loss=0.0149, over 25000.00 frames. ], tot_loss[loss=0.01494, audio_tagging_loss=0.01494, over 4951794.20 frames. ], batch size: 100, lr: 9.93e-03, grad_scale: 64.0
2023-12-22 01:35:29,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=335400.0, ans=0.07
2023-12-22 01:35:36,557 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.354e+01 2.628e+01 2.801e+01 2.924e+01 4.214e+01, threshold=5.602e+01, percent-clipped=0.0
2023-12-22 01:35:44,472 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.38 vs. limit=15.0
2023-12-22 01:35:49,169 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0
2023-12-22 01:35:51,640 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.30 vs. limit=15.0
2023-12-22 01:36:08,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=335666.6666666667, ans=0.125
2023-12-22 01:36:10,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=335666.6666666667, ans=0.0
2023-12-22 01:36:15,982 INFO [train.py:886] (3/4) Epoch 11, batch 2700, loss[loss=0.01445, audio_tagging_loss=0.01445, over 25000.00 frames. ], tot_loss[loss=0.0149, audio_tagging_loss=0.0149, over 4957004.73 frames. ], batch size: 100, lr: 9.93e-03, grad_scale: 128.0
2023-12-22 01:36:22,253 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0
2023-12-22 01:36:48,859 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.53 vs. limit=15.0
2023-12-22 01:36:49,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=335933.3333333333, ans=0.125
2023-12-22 01:37:08,121 INFO [train.py:886] (3/4) Epoch 11, batch 2750, loss[loss=0.0113, audio_tagging_loss=0.0113, over 25000.00 frames. ], tot_loss[loss=0.01483, audio_tagging_loss=0.01483, over 4962972.20 frames. ], batch size: 100, lr: 9.92e-03, grad_scale: 64.0
2023-12-22 01:37:10,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=336066.6666666667, ans=0.125
2023-12-22 01:37:21,137 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+01 2.681e+01 2.814e+01 2.995e+01 3.460e+01, threshold=5.627e+01, percent-clipped=0.0
2023-12-22 01:37:24,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=336133.3333333333, ans=0.125
2023-12-22 01:37:37,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=336200.0, ans=0.0
2023-12-22 01:37:47,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=336266.6666666667, ans=0.2
2023-12-22 01:37:52,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=336333.3333333333, ans=0.125
2023-12-22 01:37:59,151 INFO [train.py:886] (3/4) Epoch 11, batch 2800, loss[loss=0.01424, audio_tagging_loss=0.01424, over 24750.00 frames. ], tot_loss[loss=0.0149, audio_tagging_loss=0.0149, over 4960782.15 frames. ], batch size: 99, lr: 9.92e-03, grad_scale: 64.0
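
The WARNING lines from optim.py summarize gradient clipping: the quartiles of recently observed gradient norms, the threshold currently in effect, and the fraction of steps clipped. A hedged sketch of that bookkeeping, assuming a sliding window of norms and a threshold tied to the median times Clipping_scale (the exact rule in icefall's optimizer differs in detail):

    import torch

    def clip_by_quartiles(params, norm_history, clipping_scale=2.0, window=200):
        """Track recent gradient norms, report their quartiles, and scale
        gradients down when the current norm exceeds clipping_scale times
        the running median."""
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
        norm_history.append(float(norm))
        del norm_history[:-window]              # keep a sliding window
        hist = sorted(norm_history)
        q = [hist[int(f * (len(hist) - 1))] for f in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = clipping_scale * q[2]       # e.g. 2.0 x median
        print("grad-norm quartiles "
              + " ".join(f"{v:.3e}" for v in q)
              + f", threshold={threshold:.3e}")
        if norm > threshold:
            for g in grads:
                g.mul_(threshold / norm)

With healthy training the current norm rarely crosses twice the median, which is why percent-clipped stays at 0.0 through this stretch of the log.
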
2023-12-22 01:37:59,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=336400.0, ans=0.125
2023-12-22 01:38:13,719 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 01:38:31,254 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.97 vs. limit=15.0
2023-12-22 01:38:37,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=336600.0, ans=0.125
2023-12-22 01:38:41,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=336666.6666666667, ans=0.0
2023-12-22 01:38:43,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=336666.6666666667, ans=0.1
2023-12-22 01:38:52,009 INFO [train.py:886] (3/4) Epoch 11, batch 2850, loss[loss=0.01286, audio_tagging_loss=0.01286, over 24750.00 frames. ], tot_loss[loss=0.01499, audio_tagging_loss=0.01499, over 4954437.33 frames. ], batch size: 99, lr: 9.91e-03, grad_scale: 64.0
2023-12-22 01:38:53,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=336733.3333333333, ans=0.05
2023-12-22 01:38:59,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=336733.3333333333, ans=0.0
2023-12-22 01:39:05,710 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+01 2.677e+01 2.844e+01 3.022e+01 3.570e+01, threshold=5.688e+01, percent-clipped=0.0
2023-12-22 01:39:13,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=336866.6666666667, ans=0.5
2023-12-22 01:39:14,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=336866.6666666667, ans=0.125
2023-12-22 01:39:37,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0
2023-12-22 01:39:40,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=337000.0, ans=0.125
2023-12-22 01:39:45,141 INFO [train.py:886] (3/4) Epoch 11, batch 2900, loss[loss=0.01365, audio_tagging_loss=0.01365, over 24750.00 frames. ], tot_loss[loss=0.01498, audio_tagging_loss=0.01498, over 4945812.82 frames. ], batch size: 99, lr: 9.91e-03, grad_scale: 64.0
2023-12-22 01:39:49,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=337066.6666666667, ans=0.1
2023-12-22 01:39:49,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=337066.6666666667, ans=0.125
2023-12-22 01:39:58,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=337133.3333333333, ans=6.0
2023-12-22 01:40:04,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=337200.0, ans=0.0
2023-12-22 01:40:20,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=337266.6666666667, ans=0.0
2023-12-22 01:40:36,306 INFO [train.py:886] (3/4) Epoch 11, batch 2950, loss[loss=0.01641, audio_tagging_loss=0.01641, over 25000.00 frames. ], tot_loss[loss=0.01487, audio_tagging_loss=0.01487, over 4948503.05 frames. ], batch size: 100, lr: 9.90e-03, grad_scale: 64.0
2023-12-22 01:40:48,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=337466.6666666667, ans=0.07
2023-12-22 01:40:50,443 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 2.642e+01 2.776e+01 2.943e+01 5.115e+01, threshold=5.551e+01, percent-clipped=0.0
2023-12-22 01:40:57,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=337533.3333333333, ans=0.05
2023-12-22 01:41:08,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=337600.0, ans=0.2
2023-12-22 01:41:08,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=337600.0, ans=0.125
2023-12-22 01:41:09,178 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.67 vs. limit=22.5
2023-12-22 01:41:14,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=337600.0, ans=0.0
2023-12-22 01:41:28,790 INFO [train.py:886] (3/4) Epoch 11, batch 3000, loss[loss=0.01544, audio_tagging_loss=0.01544, over 25000.00 frames. ], tot_loss[loss=0.0149, audio_tagging_loss=0.0149, over 4951355.98 frames. ], batch size: 100, lr: 9.90e-03, grad_scale: 64.0
2023-12-22 01:41:28,790 INFO [train.py:909] (3/4) Computing validation loss
2023-12-22 01:41:49,992 INFO [train.py:917] (3/4) Epoch 11, validation: loss=0.03489, audio_tagging_loss=0.03489, over 3737520.00 frames.
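
"Computing validation loss" marks a periodic full pass over the dev set; the reported value is a frame-weighted average, which is why it is always quoted "over 3737520.00 frames." A minimal sketch of that bookkeeping, assuming the model returns a summed loss and a frame count per batch (the names here are illustrative, not the recipe's API):

    import torch

    def compute_validation_loss(model, valid_loader, device):
        """Frame-weighted average loss over the whole dev set."""
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for features, targets, num_frames in valid_loader:
                loss = model(features.to(device), targets.to(device))
                tot_loss += float(loss)         # summed, not averaged, loss
                tot_frames += num_frames
        model.train()
        return tot_loss / tot_frames            # e.g. 0.03489 over 3737520 frames

The "Maximum memory allocated" line that follows each validation pass is standard torch.cuda.max_memory_allocated() output, reported in MB.
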
2023-12-22 01:41:49,993 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-22 01:42:00,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=337800.0, ans=0.015
2023-12-22 01:42:01,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=337800.0, ans=0.125
2023-12-22 01:42:01,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=337800.0, ans=0.1
2023-12-22 01:42:14,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=337866.6666666667, ans=0.125
2023-12-22 01:42:18,697 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 01:42:19,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=337933.3333333333, ans=0.125
2023-12-22 01:42:20,065 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.72 vs. limit=15.0
2023-12-22 01:42:28,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=337933.3333333333, ans=0.1
2023-12-22 01:42:36,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=338000.0, ans=0.125
2023-12-22 01:42:41,458 INFO [train.py:886] (3/4) Epoch 11, batch 3050, loss[loss=0.01522, audio_tagging_loss=0.01522, over 24750.00 frames. ], tot_loss[loss=0.01487, audio_tagging_loss=0.01487, over 4956856.39 frames. ], batch size: 99, lr: 9.89e-03, grad_scale: 64.0
2023-12-22 01:42:45,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=338066.6666666667, ans=0.0
2023-12-22 01:42:55,340 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+01 2.694e+01 2.782e+01 2.959e+01 3.737e+01, threshold=5.564e+01, percent-clipped=0.0
2023-12-22 01:43:05,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=338200.0, ans=0.0
2023-12-22 01:43:12,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=338266.6666666667, ans=0.5
2023-12-22 01:43:18,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=338266.6666666667, ans=0.125
2023-12-22 01:43:26,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=338333.3333333333, ans=0.125
2023-12-22 01:43:27,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=338333.3333333333, ans=0.04949747468305833
2023-12-22 01:43:33,636 INFO [train.py:886] (3/4) Epoch 11, batch 3100, loss[loss=0.0157, audio_tagging_loss=0.0157, over 25000.00 frames. ], tot_loss[loss=0.01489, audio_tagging_loss=0.01489, over 4960603.09 frames. ], batch size: 100, lr: 9.89e-03, grad_scale: 64.0
2023-12-22 01:43:35,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.14 vs. limit=10.0
2023-12-22 01:43:43,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=338466.6666666667, ans=0.1
2023-12-22 01:43:46,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=338466.6666666667, ans=0.125
2023-12-22 01:43:55,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=338533.3333333333, ans=0.0
2023-12-22 01:44:07,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=338600.0, ans=0.125
2023-12-22 01:44:12,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=338600.0, ans=0.125
2023-12-22 01:44:15,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=338666.6666666667, ans=0.125
2023-12-22 01:44:19,047 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=15.0
2023-12-22 01:44:19,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=338666.6666666667, ans=0.5
2023-12-22 01:44:24,524 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.96 vs. limit=15.0
2023-12-22 01:44:24,956 INFO [train.py:886] (3/4) Epoch 11, batch 3150, loss[loss=0.01668, audio_tagging_loss=0.01668, over 24750.00 frames. ], tot_loss[loss=0.015, audio_tagging_loss=0.015, over 4958780.97 frames. ], batch size: 99, lr: 9.88e-03, grad_scale: 64.0
2023-12-22 01:44:33,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=338733.3333333333, ans=0.125
2023-12-22 01:44:33,732 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.38 vs. limit=15.0
2023-12-22 01:44:38,752 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.475e+01 2.710e+01 2.835e+01 3.003e+01 4.249e+01, threshold=5.670e+01, percent-clipped=0.0
2023-12-22 01:44:48,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=338866.6666666667, ans=0.1
2023-12-22 01:44:58,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=338933.3333333333, ans=0.0
2023-12-22 01:44:58,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.55 vs. limit=22.5
2023-12-22 01:45:17,425 INFO [train.py:886] (3/4) Epoch 11, batch 3200, loss[loss=0.01246, audio_tagging_loss=0.01246, over 24750.00 frames. ], tot_loss[loss=0.01494, audio_tagging_loss=0.01494, over 4953918.48 frames. ], batch size: 99, lr: 9.88e-03, grad_scale: 64.0
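
Each Whitening line compares a metric against a limit. The metric measures how far the covariance of a module's activations is from a multiple of the identity: it equals 1.0 for perfectly decorrelated ("white") channels and grows as a few directions dominate. A rough re-derivation of such a metric, not icefall's exact implementation:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        """x: (num_frames, num_channels). Returns >= 1.0; equals 1.0 when
        each group's covariance is proportional to the identity matrix."""
        n, c = x.shape
        cpg = c // num_groups                                # channels per group
        x = x.reshape(n, num_groups, cpg).transpose(0, 1)    # (groups, n, cpg)
        covar = torch.matmul(x.transpose(1, 2), x) / n       # (groups, cpg, cpg)
        mean_diag = covar.diagonal(dim1=1, dim2=2).mean()    # average eigenvalue
        mean_sq = (covar ** 2).sum() / (num_groups * cpg)    # spread of eigenvalues
        return mean_sq / (mean_diag ** 2 + 1e-20)

A corrective gradient is applied only when the metric exceeds the limit, which is why the logged lines cluster near or just above their limits.
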
2023-12-22 01:45:21,535 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.30 vs. limit=22.5
2023-12-22 01:45:23,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=339066.6666666667, ans=0.07
2023-12-22 01:45:23,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=339066.6666666667, ans=0.125
2023-12-22 01:45:23,784 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.74 vs. limit=22.5
2023-12-22 01:45:26,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=339133.3333333333, ans=0.125
2023-12-22 01:45:34,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=339133.3333333333, ans=0.125
2023-12-22 01:45:42,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=339200.0, ans=0.125
2023-12-22 01:45:42,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=339200.0, ans=0.0
2023-12-22 01:45:43,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=339200.0, ans=0.07
2023-12-22 01:45:50,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=339266.6666666667, ans=0.125
2023-12-22 01:45:53,192 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.33 vs. limit=12.0
2023-12-22 01:45:53,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=339266.6666666667, ans=0.0
2023-12-22 01:46:00,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=339333.3333333333, ans=0.0
2023-12-22 01:46:07,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=339333.3333333333, ans=0.125
2023-12-22 01:46:09,463 INFO [train.py:886] (3/4) Epoch 11, batch 3250, loss[loss=0.01467, audio_tagging_loss=0.01467, over 24750.00 frames. ], tot_loss[loss=0.01492, audio_tagging_loss=0.01492, over 4955712.99 frames. ], batch size: 99, lr: 9.87e-03, grad_scale: 64.0
2023-12-22 01:46:16,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=339400.0, ans=0.125
2023-12-22 01:46:16,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=339400.0, ans=0.1
2023-12-22 01:46:23,058 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.400e+01 2.628e+01 2.776e+01 2.963e+01 3.661e+01, threshold=5.553e+01, percent-clipped=0.0
2023-12-22 01:46:32,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=339533.3333333333, ans=0.125
2023-12-22 01:46:46,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=339600.0, ans=0.0
2023-12-22 01:46:55,708 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.00 vs. limit=15.0
2023-12-22 01:47:01,036 INFO [train.py:886] (3/4) Epoch 11, batch 3300, loss[loss=0.01405, audio_tagging_loss=0.01405, over 24750.00 frames. ], tot_loss[loss=0.01482, audio_tagging_loss=0.01482, over 4954976.36 frames. ], batch size: 99, lr: 9.87e-03, grad_scale: 64.0
2023-12-22 01:47:02,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=339733.3333333333, ans=0.125
2023-12-22 01:47:03,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=339733.3333333333, ans=0.125
2023-12-22 01:47:05,344 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.60 vs. limit=22.5
2023-12-22 01:47:14,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=339800.0, ans=0.125
2023-12-22 01:47:48,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=340000.0, ans=0.05
2023-12-22 01:47:52,827 INFO [train.py:886] (3/4) Epoch 11, batch 3350, loss[loss=0.01461, audio_tagging_loss=0.01461, over 25000.00 frames. ], tot_loss[loss=0.01488, audio_tagging_loss=0.01488, over 4950194.92 frames. ], batch size: 100, lr: 9.86e-03, grad_scale: 64.0
2023-12-22 01:48:06,419 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.342e+01 2.626e+01 2.784e+01 2.981e+01 3.630e+01, threshold=5.569e+01, percent-clipped=0.0
2023-12-22 01:48:14,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=340200.0, ans=0.125
2023-12-22 01:48:16,843 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.23 vs. limit=15.0
2023-12-22 01:48:39,212 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.51 vs. limit=12.0
2023-12-22 01:48:45,041 INFO [train.py:886] (3/4) Epoch 11, batch 3400, loss[loss=0.01781, audio_tagging_loss=0.01781, over 25000.00 frames. ], tot_loss[loss=0.01489, audio_tagging_loss=0.01489, over 4952324.47 frames. ], batch size: 100, lr: 9.86e-03, grad_scale: 64.0
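
In the train.py:886 records, loss[...] is the current batch while tot_loss[...] is a decayed, frame-weighted running average; the fractional frame totals (e.g. "over 4952324.47 frames.") betray the exponential decay. A sketch of that kind of tracker; the 0.995 decay below is inferred from the roughly 5M-frame plateau, not read from the code:

    class RunningLoss:
        """Frame-weighted, exponentially decayed running loss average."""

        def __init__(self, decay: float = 0.995):
            self.decay = decay
            self.loss_sum = 0.0
            self.frame_sum = 0.0

        def update(self, batch_loss: float, num_frames: float) -> float:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * num_frames
            self.frame_sum = self.decay * self.frame_sum + num_frames
            return self.loss_sum / self.frame_sum   # logged as tot_loss

    tracker = RunningLoss()
    for _ in range(2000):
        tot = tracker.update(0.0149, 25000.0)
    # frame_sum plateaus near 25000 / (1 - 0.995) = 5.0e6, matching the
    # ~4.95e6-frame totals quoted in the records above.
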
2023-12-22 01:48:55,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=340466.6666666667, ans=0.0
2023-12-22 01:48:58,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=340466.6666666667, ans=0.0
2023-12-22 01:49:05,899 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.314e-01
2023-12-22 01:49:11,918 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.14 vs. limit=10.0
2023-12-22 01:49:11,922 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.21 vs. limit=15.0
2023-12-22 01:49:14,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=340533.3333333333, ans=0.125
2023-12-22 01:49:30,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=340666.6666666667, ans=0.125
2023-12-22 01:49:36,291 INFO [train.py:886] (3/4) Epoch 11, batch 3450, loss[loss=0.01571, audio_tagging_loss=0.01571, over 24750.00 frames. ], tot_loss[loss=0.01497, audio_tagging_loss=0.01497, over 4951424.65 frames. ], batch size: 99, lr: 9.86e-03, grad_scale: 64.0
2023-12-22 01:49:39,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=340733.3333333333, ans=0.0
2023-12-22 01:49:49,460 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.658e+01 2.786e+01 2.914e+01 3.460e+01, threshold=5.572e+01, percent-clipped=0.0
2023-12-22 01:50:06,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=340933.3333333333, ans=10.0
2023-12-22 01:50:15,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=340933.3333333333, ans=0.2
2023-12-22 01:50:23,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=341000.0, ans=0.125
2023-12-22 01:50:27,740 INFO [train.py:886] (3/4) Epoch 11, batch 3500, loss[loss=0.01665, audio_tagging_loss=0.01665, over 25000.00 frames. ], tot_loss[loss=0.01496, audio_tagging_loss=0.01496, over 4946930.95 frames. ], batch size: 100, lr: 9.85e-03, grad_scale: 64.0
2023-12-22 01:50:37,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=341133.3333333333, ans=0.0
2023-12-22 01:50:37,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=341133.3333333333, ans=0.125
2023-12-22 01:50:40,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=341133.3333333333, ans=0.0
2023-12-22 01:50:54,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.32 vs. limit=22.5
2023-12-22 01:51:01,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=341266.6666666667, ans=0.125
2023-12-22 01:51:10,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0
2023-12-22 01:51:11,528 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0
2023-12-22 01:51:18,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=341333.3333333333, ans=0.125
2023-12-22 01:51:20,406 INFO [train.py:886] (3/4) Epoch 11, batch 3550, loss[loss=0.01201, audio_tagging_loss=0.01201, over 25000.00 frames. ], tot_loss[loss=0.01489, audio_tagging_loss=0.01489, over 4948349.36 frames. ], batch size: 100, lr: 9.85e-03, grad_scale: 64.0
2023-12-22 01:51:27,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=341400.0, ans=0.125
2023-12-22 01:51:30,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=341466.6666666667, ans=0.2
2023-12-22 01:51:32,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=341466.6666666667, ans=0.2
2023-12-22 01:51:32,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=341466.6666666667, ans=0.09899494936611666
2023-12-22 01:51:33,441 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+01 2.655e+01 2.795e+01 3.045e+01 3.842e+01, threshold=5.591e+01, percent-clipped=0.0
2023-12-22 01:52:02,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=341666.6666666667, ans=0.1
2023-12-22 01:52:03,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=341666.6666666667, ans=0.125
2023-12-22 01:52:04,811 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.363e-01
2023-12-22 01:52:11,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=341733.3333333333, ans=0.125
2023-12-22 01:52:12,057 INFO [train.py:886] (3/4) Epoch 11, batch 3600, loss[loss=0.01577, audio_tagging_loss=0.01577, over 24750.00 frames. ], tot_loss[loss=0.01485, audio_tagging_loss=0.01485, over 4947528.12 frames. ], batch size: 99, lr: 9.84e-03, grad_scale: 64.0
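
The learning rate in these records creeps down within the epoch (9.86e-03 to 9.84e-03 over a few hundred batches) and drops more sharply at the epoch boundary near the end of this log (9.73e-03 to 9.32e-03). That shape is consistent with icefall's Eden schedule, which decays as a power law in both batch and epoch count; the flag values below are illustrative assumptions, not necessarily this run's configuration:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        """Eden-style schedule: smooth power-law decay in batch and epoch."""
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

With these constants the epoch factor alone predicts a drop of roughly 4% from epoch 11 to epoch 12, close to the step the log shows; the absolute level also depends on optimizer details not sketched here.
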
2023-12-22 01:52:18,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=341733.3333333333, ans=0.125
2023-12-22 01:52:21,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=341800.0, ans=0.125
2023-12-22 01:52:32,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=341866.6666666667, ans=0.125
2023-12-22 01:52:36,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=341866.6666666667, ans=0.0
2023-12-22 01:52:51,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=341933.3333333333, ans=0.125
2023-12-22 01:52:51,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=341933.3333333333, ans=0.125
2023-12-22 01:53:03,725 INFO [train.py:886] (3/4) Epoch 11, batch 3650, loss[loss=0.01473, audio_tagging_loss=0.01473, over 22533.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4953225.91 frames. ], batch size: 107, lr: 9.84e-03, grad_scale: 64.0
2023-12-22 01:53:12,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=342066.6666666667, ans=0.125
2023-12-22 01:53:13,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=342133.3333333333, ans=0.125
2023-12-22 01:53:13,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=342133.3333333333, ans=0.2
2023-12-22 01:53:13,306 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.537e-03
2023-12-22 01:53:17,498 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.382e+01 2.641e+01 2.779e+01 2.904e+01 3.423e+01, threshold=5.558e+01, percent-clipped=0.0
2023-12-22 01:53:33,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=342200.0, ans=0.125
2023-12-22 01:53:34,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=342266.6666666667, ans=0.0
2023-12-22 01:53:55,474 INFO [train.py:886] (3/4) Epoch 11, batch 3700, loss[loss=0.01544, audio_tagging_loss=0.01544, over 25000.00 frames. ], tot_loss[loss=0.01495, audio_tagging_loss=0.01495, over 4954542.47 frames. ], batch size: 100, lr: 9.83e-03, grad_scale: 64.0
2023-12-22 01:53:56,039 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.02 vs. limit=22.5
2023-12-22 01:54:01,973 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.20 vs. limit=6.0
2023-12-22 01:54:02,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=342400.0, ans=0.125
2023-12-22 01:54:05,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=342466.6666666667, ans=0.125
2023-12-22 01:54:08,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=342466.6666666667, ans=0.125
2023-12-22 01:54:32,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=342600.0, ans=0.125
2023-12-22 01:54:46,624 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.18 vs. limit=10.0
2023-12-22 01:54:48,113 INFO [train.py:886] (3/4) Epoch 11, batch 3750, loss[loss=0.01417, audio_tagging_loss=0.01417, over 24750.00 frames. ], tot_loss[loss=0.01503, audio_tagging_loss=0.01503, over 4946151.07 frames. ], batch size: 99, lr: 9.83e-03, grad_scale: 64.0
2023-12-22 01:54:50,234 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.464e-02
2023-12-22 01:54:51,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0
2023-12-22 01:55:01,195 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.410e+01 2.750e+01 2.894e+01 3.049e+01 3.560e+01, threshold=5.788e+01, percent-clipped=0.0
2023-12-22 01:55:12,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=342866.6666666667, ans=0.0
2023-12-22 01:55:22,721 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.16 vs. limit=22.5
2023-12-22 01:55:24,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=342933.3333333333, ans=0.125
2023-12-22 01:55:32,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=343000.0, ans=0.0
2023-12-22 01:55:37,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=343000.0, ans=0.0
2023-12-22 01:55:39,607 INFO [train.py:886] (3/4) Epoch 11, batch 3800, loss[loss=0.01432, audio_tagging_loss=0.01432, over 24750.00 frames. ], tot_loss[loss=0.01508, audio_tagging_loss=0.01508, over 4948319.50 frames. ], batch size: 99, lr: 9.82e-03, grad_scale: 64.0
2023-12-22 01:55:46,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=343066.6666666667, ans=0.125
2023-12-22 01:55:51,335 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.97 vs. limit=15.0
2023-12-22 01:56:04,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=343200.0, ans=0.125
2023-12-22 01:56:31,192 INFO [train.py:886] (3/4) Epoch 11, batch 3850, loss[loss=0.01628, audio_tagging_loss=0.01628, over 25000.00 frames. ], tot_loss[loss=0.01497, audio_tagging_loss=0.01497, over 4941001.18 frames. ], batch size: 100, lr: 9.82e-03, grad_scale: 64.0
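
The grad_scale in the loss records is the dynamic loss scale of fp16 training: it doubled to 128.0 briefly around batch 2700 earlier in this log and settled back to 64.0, the signature of dynamic loss scaling (grow after a run of overflow-free steps, halve on overflow). The standard PyTorch pattern, stock torch.cuda.amp usage rather than code from the recipe:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=64.0, growth_interval=2000)

    def train_step(model, optimizer, features, targets, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = criterion(model(features), targets)
        scaler.scale(loss).backward()   # backward through the scaled loss
        scaler.step(optimizer)          # unscales; skips the step on overflow
        scaler.update()                 # grows the scale, or halves it on overflow
        return loss.detach(), scaler.get_scale()   # the logged grad_scale
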
2023-12-22 01:56:44,873 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+01 2.640e+01 2.758e+01 2.966e+01 3.475e+01, threshold=5.515e+01, percent-clipped=0.0
2023-12-22 01:56:57,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=343533.3333333333, ans=0.125
2023-12-22 01:56:59,388 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.75 vs. limit=12.0
2023-12-22 01:57:06,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=343600.0, ans=0.0
2023-12-22 01:57:12,318 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.26 vs. limit=12.0
2023-12-22 01:57:23,759 INFO [train.py:886] (3/4) Epoch 11, batch 3900, loss[loss=0.01709, audio_tagging_loss=0.01709, over 22500.00 frames. ], tot_loss[loss=0.01497, audio_tagging_loss=0.01497, over 4947260.45 frames. ], batch size: 107, lr: 9.81e-03, grad_scale: 64.0
2023-12-22 01:57:26,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=343733.3333333333, ans=0.07
2023-12-22 01:57:35,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=343800.0, ans=0.2
2023-12-22 01:57:40,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=343800.0, ans=0.1
2023-12-22 01:57:46,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=343866.6666666667, ans=0.125
2023-12-22 01:57:49,356 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.35 vs. limit=22.5
2023-12-22 01:58:08,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=344000.0, ans=0.125
2023-12-22 01:58:10,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=344000.0, ans=0.0
2023-12-22 01:58:15,467 INFO [train.py:886] (3/4) Epoch 11, batch 3950, loss[loss=0.0153, audio_tagging_loss=0.0153, over 24058.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4951270.36 frames. ], batch size: 100, lr: 9.81e-03, grad_scale: 64.0
2023-12-22 01:58:29,726 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 2.678e+01 2.766e+01 2.913e+01 3.341e+01, threshold=5.531e+01, percent-clipped=0.0
2023-12-22 01:58:32,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0
2023-12-22 01:58:39,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=344200.0, ans=0.0
2023-12-22 01:58:46,089 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.440e-03
2023-12-22 01:58:48,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=344266.6666666667, ans=0.125
2023-12-22 01:59:06,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=344333.3333333333, ans=0.1
2023-12-22 01:59:08,118 INFO [train.py:886] (3/4) Epoch 11, batch 4000, loss[loss=0.01339, audio_tagging_loss=0.01339, over 25000.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4953707.33 frames. ], batch size: 100, lr: 9.80e-03, grad_scale: 64.0
2023-12-22 01:59:09,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=344400.0, ans=0.125
2023-12-22 02:00:00,084 INFO [train.py:886] (3/4) Epoch 11, batch 4050, loss[loss=0.01529, audio_tagging_loss=0.01529, over 24750.00 frames. ], tot_loss[loss=0.01495, audio_tagging_loss=0.01495, over 4955039.08 frames. ], batch size: 99, lr: 9.80e-03, grad_scale: 64.0
2023-12-22 02:00:03,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=344733.3333333333, ans=0.1
2023-12-22 02:00:07,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=344733.3333333333, ans=0.0
2023-12-22 02:00:13,199 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.657e+01 2.838e+01 3.013e+01 3.411e+01, threshold=5.677e+01, percent-clipped=0.0
2023-12-22 02:00:35,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=344933.3333333333, ans=0.2
2023-12-22 02:00:46,876 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.26 vs. limit=15.0
2023-12-22 02:00:51,611 INFO [train.py:886] (3/4) Epoch 11, batch 4100, loss[loss=0.02032, audio_tagging_loss=0.02032, over 24948.00 frames. ], tot_loss[loss=0.01512, audio_tagging_loss=0.01512, over 4949013.71 frames. ], batch size: 100, lr: 9.79e-03, grad_scale: 64.0
2023-12-22 02:00:51,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=345066.6666666667, ans=0.125
2023-12-22 02:01:06,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=345133.3333333333, ans=0.125
2023-12-22 02:01:06,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=345133.3333333333, ans=0.0
2023-12-22 02:01:17,924 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.34 vs. limit=15.0
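
loss and audio_tagging_loss coincide in every record because tagging is the only training objective in this run. For multi-label audio tagging over AudioSet's 527 event classes, the natural criterion is binary cross-entropy on per-event logits; a hedged sketch (the pooling and classifier head are left out, shapes are illustrative):

    import torch
    import torch.nn.functional as F

    def audio_tagging_loss(logits: torch.Tensor,
                           targets: torch.Tensor) -> torch.Tensor:
        """logits: (batch, num_events); targets: multi-hot (batch, num_events)."""
        return F.binary_cross_entropy_with_logits(logits, targets,
                                                  reduction="sum")

    batch, num_events = 100, 527
    logits = torch.randn(batch, num_events)
    targets = (torch.rand(batch, num_events) < 0.02).float()  # sparse labels
    loss = audio_tagging_loss(logits, targets) / batch
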
2023-12-22 02:01:41,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=345333.3333333333, ans=0.04949747468305833
2023-12-22 02:01:44,239 INFO [train.py:886] (3/4) Epoch 11, batch 4150, loss[loss=0.01258, audio_tagging_loss=0.01258, over 24750.00 frames. ], tot_loss[loss=0.015, audio_tagging_loss=0.015, over 4948854.51 frames. ], batch size: 99, lr: 9.79e-03, grad_scale: 64.0
2023-12-22 02:01:49,475 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.32 vs. limit=10.0
2023-12-22 02:01:57,136 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.334e+01 2.652e+01 2.805e+01 3.053e+01 3.563e+01, threshold=5.610e+01, percent-clipped=0.0
2023-12-22 02:02:07,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=345533.3333333333, ans=0.0
2023-12-22 02:02:34,649 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.16 vs. limit=12.0
2023-12-22 02:02:35,108 INFO [train.py:886] (3/4) Epoch 11, batch 4200, loss[loss=0.01933, audio_tagging_loss=0.01933, over 25000.00 frames. ], tot_loss[loss=0.01499, audio_tagging_loss=0.01499, over 4950335.54 frames. ], batch size: 100, lr: 9.79e-03, grad_scale: 64.0
2023-12-22 02:02:38,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=345733.3333333333, ans=0.0
2023-12-22 02:02:45,519 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.66 vs. limit=15.0
2023-12-22 02:03:27,987 INFO [train.py:886] (3/4) Epoch 11, batch 4250, loss[loss=0.0151, audio_tagging_loss=0.0151, over 25000.00 frames. ], tot_loss[loss=0.015, audio_tagging_loss=0.015, over 4952212.34 frames. ], batch size: 100, lr: 9.78e-03, grad_scale: 64.0
2023-12-22 02:03:40,220 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.357e+01 2.635e+01 2.806e+01 3.008e+01 3.451e+01, threshold=5.611e+01, percent-clipped=0.0
2023-12-22 02:03:49,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=346200.0, ans=0.125
2023-12-22 02:03:58,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=346266.6666666667, ans=0.125
2023-12-22 02:04:01,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=346266.6666666667, ans=0.2
2023-12-22 02:04:16,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=346333.3333333333, ans=0.125
2023-12-22 02:04:17,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=346400.0, ans=0.1
2023-12-22 02:04:18,647 INFO [train.py:886] (3/4) Epoch 11, batch 4300, loss[loss=0.01545, audio_tagging_loss=0.01545, over 25000.00 frames. ], tot_loss[loss=0.0149, audio_tagging_loss=0.0149, over 4952099.09 frames. ], batch size: 100, lr: 9.78e-03, grad_scale: 64.0
2023-12-22 02:04:29,753 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 02:04:29,936 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.69 vs. limit=22.5
2023-12-22 02:04:34,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=346466.6666666667, ans=0.0
2023-12-22 02:04:39,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=346533.3333333333, ans=0.125
2023-12-22 02:05:03,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=346666.6666666667, ans=22.5
2023-12-22 02:05:13,317 INFO [train.py:886] (3/4) Epoch 11, batch 4350, loss[loss=0.01545, audio_tagging_loss=0.01545, over 24750.00 frames. ], tot_loss[loss=0.01494, audio_tagging_loss=0.01494, over 4951380.55 frames. ], batch size: 99, lr: 9.77e-03, grad_scale: 64.0
2023-12-22 02:05:26,215 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+01 2.594e+01 2.762e+01 2.910e+01 3.539e+01, threshold=5.524e+01, percent-clipped=0.0
2023-12-22 02:05:28,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.94 vs. limit=15.0
2023-12-22 02:05:43,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=346866.6666666667, ans=0.2
2023-12-22 02:05:45,306 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.00 vs. limit=22.5
2023-12-22 02:06:01,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=347000.0, ans=0.1
2023-12-22 02:06:04,906 INFO [train.py:886] (3/4) Epoch 11, batch 4400, loss[loss=0.01598, audio_tagging_loss=0.01598, over 24750.00 frames. ], tot_loss[loss=0.01503, audio_tagging_loss=0.01503, over 4950196.71 frames. ], batch size: 99, lr: 9.77e-03, grad_scale: 64.0
2023-12-22 02:06:07,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=347066.6666666667, ans=0.0
2023-12-22 02:06:13,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=347066.6666666667, ans=0.125
2023-12-22 02:06:14,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=347133.3333333333, ans=0.0
2023-12-22 02:06:26,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=347200.0, ans=0.125
2023-12-22 02:06:33,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.05 vs. limit=22.5
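
One record above shows that even a whitening limit is itself scheduled (...self_attn2.whiten.whitening_limit ... ans=22.5), and the penalty engages only above the limit; below it the module is inert. A simplified rendering as an auxiliary loss, reusing whitening_metric from the earlier sketch (the actual module applies the push-back in the backward pass without changing the forward value, so treat this as an approximation):

    import torch

    def whitening_loss(x: torch.Tensor, num_groups: int, limit: float,
                       scale: float = 0.01) -> torch.Tensor:
        """Penalize only the excess of the whitening metric over its
        (possibly scheduled) limit; exactly zero below the limit."""
        metric = whitening_metric(x, num_groups)   # from the earlier sketch
        return scale * torch.relu(metric - limit)

    # e.g. the limit 22.5 logged above for a self-attention whitener:
    # penalty = whitening_loss(activations, num_groups=1, limit=22.5)
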
2023-12-22 02:06:43,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=347266.6666666667, ans=0.125
2023-12-22 02:06:44,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=347266.6666666667, ans=0.125
2023-12-22 02:06:44,856 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0
2023-12-22 02:06:57,387 INFO [train.py:886] (3/4) Epoch 11, batch 4450, loss[loss=0.01712, audio_tagging_loss=0.01712, over 24750.00 frames. ], tot_loss[loss=0.01512, audio_tagging_loss=0.01512, over 4951417.47 frames. ], batch size: 99, lr: 9.76e-03, grad_scale: 64.0
2023-12-22 02:07:10,412 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.427e+01 2.637e+01 2.813e+01 2.949e+01 3.771e+01, threshold=5.626e+01, percent-clipped=0.0
2023-12-22 02:07:10,968 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.13 vs. limit=15.0
2023-12-22 02:07:20,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.12 vs. limit=22.5
2023-12-22 02:07:27,725 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.97 vs. limit=15.0
2023-12-22 02:07:35,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=347600.0, ans=0.0
2023-12-22 02:07:41,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=347666.6666666667, ans=0.125
2023-12-22 02:07:44,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=347666.6666666667, ans=0.125
2023-12-22 02:07:49,030 INFO [train.py:886] (3/4) Epoch 11, batch 4500, loss[loss=0.01463, audio_tagging_loss=0.01463, over 25000.00 frames. ], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 4951872.42 frames. ], batch size: 100, lr: 9.76e-03, grad_scale: 64.0
2023-12-22 02:08:36,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=348000.0, ans=0.1
2023-12-22 02:08:41,326 INFO [train.py:886] (3/4) Epoch 11, batch 4550, loss[loss=0.0165, audio_tagging_loss=0.0165, over 25000.00 frames. ], tot_loss[loss=0.0149, audio_tagging_loss=0.0149, over 4954557.02 frames. ], batch size: 100, lr: 9.75e-03, grad_scale: 64.0
2023-12-22 02:08:44,498 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0
2023-12-22 02:08:55,124 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.595e+01 2.747e+01 2.922e+01 3.602e+01, threshold=5.493e+01, percent-clipped=0.0
2023-12-22 02:09:04,162 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.82 vs. limit=6.0
2023-12-22 02:09:24,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=348333.3333333333, ans=0.125
2023-12-22 02:09:33,449 INFO [train.py:886] (3/4) Epoch 11, batch 4600, loss[loss=0.01475, audio_tagging_loss=0.01475, over 25000.00 frames. ], tot_loss[loss=0.01482, audio_tagging_loss=0.01482, over 4951491.21 frames. ], batch size: 100, lr: 9.75e-03, grad_scale: 64.0
2023-12-22 02:09:39,711 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.87 vs. limit=15.0
2023-12-22 02:09:41,342 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.57 vs. limit=10.0
2023-12-22 02:09:48,720 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0
2023-12-22 02:10:04,538 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.78 vs. limit=6.0
2023-12-22 02:10:20,251 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.65 vs. limit=12.0
2023-12-22 02:10:25,560 INFO [train.py:886] (3/4) Epoch 11, batch 4650, loss[loss=0.01489, audio_tagging_loss=0.01489, over 25000.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4957304.08 frames. ], batch size: 100, lr: 9.74e-03, grad_scale: 64.0
2023-12-22 02:10:25,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=348733.3333333333, ans=0.125
2023-12-22 02:10:38,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.62 vs. limit=10.0
2023-12-22 02:10:39,507 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 2.626e+01 2.815e+01 2.915e+01 3.579e+01, threshold=5.630e+01, percent-clipped=0.0
2023-12-22 02:10:51,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=348866.6666666667, ans=0.025
2023-12-22 02:11:01,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=348933.3333333333, ans=0.125
2023-12-22 02:11:04,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=348933.3333333333, ans=0.125
2023-12-22 02:11:15,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=349000.0, ans=0.0
2023-12-22 02:11:17,319 INFO [train.py:886] (3/4) Epoch 11, batch 4700, loss[loss=0.01506, audio_tagging_loss=0.01506, over 24750.00 frames. ], tot_loss[loss=0.01493, audio_tagging_loss=0.01493, over 4948079.36 frames. ], batch size: 99, lr: 9.74e-03, grad_scale: 64.0
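
The many *_skip_rate schedules in this stretch (attention_skip_rate, conv_skip_rate, ff2_skip_rate, bypass.skip_rate, mostly at or near 0.0 this late in training) control stochastic depth: early in training a sub-module's output is randomly dropped with the scheduled probability and the residual stream passes through untouched, and the rate anneals toward zero. A minimal sketch of that mechanism:

    import torch

    def maybe_skip(module_out: torch.Tensor, residual: torch.Tensor,
                   skip_rate: float, training: bool) -> torch.Tensor:
        """Stochastic depth: with probability skip_rate, bypass the
        sub-module entirely and let the residual pass unchanged."""
        if training and float(torch.rand(())) < skip_rate:
            return residual
        return residual + module_out
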
], batch size: 99, lr: 9.74e-03, grad_scale: 64.0 2023-12-22 02:11:22,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=349066.6666666667, ans=0.125 2023-12-22 02:11:34,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=349133.3333333333, ans=0.125 2023-12-22 02:11:38,527 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0 2023-12-22 02:11:50,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=349266.6666666667, ans=0.125 2023-12-22 02:12:00,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=349333.3333333333, ans=0.125 2023-12-22 02:12:04,806 INFO [train.py:886] (3/4) Epoch 11, batch 4750, loss[loss=0.01616, audio_tagging_loss=0.01616, over 24750.00 frames. ], tot_loss[loss=0.01499, audio_tagging_loss=0.01499, over 4946504.91 frames. ], batch size: 99, lr: 9.73e-03, grad_scale: 64.0 2023-12-22 02:12:07,103 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.48 vs. limit=15.0 2023-12-22 02:12:17,674 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.414e+01 2.685e+01 2.810e+01 2.973e+01 3.471e+01, threshold=5.619e+01, percent-clipped=0.0 2023-12-22 02:12:40,994 INFO [train.py:886] (3/4) Epoch 12, batch 0, loss[loss=0.03228, audio_tagging_loss=0.03228, over 23989.00 frames. ], tot_loss[loss=0.03228, audio_tagging_loss=0.03228, over 23989.00 frames. ], batch size: 100, lr: 9.32e-03, grad_scale: 64.0 2023-12-22 02:12:40,995 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 02:12:54,748 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([0.9778, 2.0049, 2.7308, 2.8219], device='cuda:3') 2023-12-22 02:13:02,308 INFO [train.py:917] (3/4) Epoch 12, validation: loss=0.03393, audio_tagging_loss=0.03393, over 3737520.00 frames. 2023-12-22 02:13:02,309 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 02:13:39,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=349706.6666666667, ans=0.125 2023-12-22 02:13:40,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=349706.6666666667, ans=0.125 2023-12-22 02:13:52,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=349840.0, ans=0.125 2023-12-22 02:13:53,152 INFO [train.py:886] (3/4) Epoch 12, batch 50, loss[loss=0.01936, audio_tagging_loss=0.01936, over 25000.00 frames. ], tot_loss[loss=0.02344, audio_tagging_loss=0.02344, over 1113929.23 frames. 
], batch size: 100, lr: 9.32e-03, grad_scale: 64.0 2023-12-22 02:13:57,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=349840.0, ans=0.2 2023-12-22 02:14:07,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=349906.6666666667, ans=0.125 2023-12-22 02:14:08,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=349906.6666666667, ans=0.09899494936611666 2023-12-22 02:14:22,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=349973.3333333333, ans=0.125 2023-12-22 02:14:23,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=350040.0, ans=0.0 2023-12-22 02:14:39,497 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5 2023-12-22 02:14:42,744 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.586e+01 3.132e+01 3.484e+01 4.021e+01 8.947e+01, threshold=6.968e+01, percent-clipped=8.0 2023-12-22 02:14:45,344 INFO [train.py:886] (3/4) Epoch 12, batch 100, loss[loss=0.01613, audio_tagging_loss=0.01613, over 25000.00 frames. ], tot_loss[loss=0.02031, audio_tagging_loss=0.02031, over 1972122.96 frames. ], batch size: 100, lr: 9.32e-03, grad_scale: 64.0 2023-12-22 02:14:47,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=350173.3333333333, ans=0.5 2023-12-22 02:14:53,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=350173.3333333333, ans=0.1 2023-12-22 02:14:58,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=350240.0, ans=0.1 2023-12-22 02:15:02,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=350240.0, ans=0.2 2023-12-22 02:15:27,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=350440.0, ans=0.125 2023-12-22 02:15:36,295 INFO [train.py:886] (3/4) Epoch 12, batch 150, loss[loss=0.01535, audio_tagging_loss=0.01535, over 25000.00 frames. ], tot_loss[loss=0.01845, audio_tagging_loss=0.01845, over 2637715.52 frames. ], batch size: 100, lr: 9.31e-03, grad_scale: 64.0 2023-12-22 02:16:02,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=350640.0, ans=0.125 2023-12-22 02:16:15,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=350706.6666666667, ans=0.125 2023-12-22 02:16:21,259 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=15.0 2023-12-22 02:16:27,251 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.398e+01 2.700e+01 2.881e+01 3.001e+01 3.518e+01, threshold=5.761e+01, percent-clipped=0.0 2023-12-22 02:16:29,884 INFO [train.py:886] (3/4) Epoch 12, batch 200, loss[loss=0.01816, audio_tagging_loss=0.01816, over 25000.00 frames. ], tot_loss[loss=0.01725, audio_tagging_loss=0.01725, over 3157086.02 frames. 
], batch size: 100, lr: 9.31e-03, grad_scale: 64.0 2023-12-22 02:16:37,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=350840.0, ans=0.0 2023-12-22 02:16:40,566 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 02:16:42,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=350906.6666666667, ans=0.125 2023-12-22 02:16:46,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=350906.6666666667, ans=0.0 2023-12-22 02:17:10,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=351106.6666666667, ans=0.0 2023-12-22 02:17:20,970 INFO [train.py:886] (3/4) Epoch 12, batch 250, loss[loss=0.01501, audio_tagging_loss=0.01501, over 25000.00 frames. ], tot_loss[loss=0.0167, audio_tagging_loss=0.0167, over 3559172.24 frames. ], batch size: 100, lr: 9.30e-03, grad_scale: 64.0 2023-12-22 02:17:23,841 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=15.0 2023-12-22 02:17:24,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=351173.3333333333, ans=0.1 2023-12-22 02:17:36,639 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.92 vs. limit=6.0 2023-12-22 02:17:36,667 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.86 vs. limit=15.0 2023-12-22 02:17:53,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=351373.3333333333, ans=0.0 2023-12-22 02:17:53,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=351373.3333333333, ans=0.0 2023-12-22 02:18:00,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=351373.3333333333, ans=0.125 2023-12-22 02:18:08,928 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.33 vs. limit=22.5 2023-12-22 02:18:10,375 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.670e+01 2.789e+01 2.930e+01 3.431e+01, threshold=5.578e+01, percent-clipped=0.0 2023-12-22 02:18:12,281 INFO [train.py:886] (3/4) Epoch 12, batch 300, loss[loss=0.0163, audio_tagging_loss=0.0163, over 24750.00 frames. ], tot_loss[loss=0.01622, audio_tagging_loss=0.01622, over 3870832.22 frames. 
], batch size: 99, lr: 9.30e-03, grad_scale: 64.0 2023-12-22 02:18:12,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=351506.6666666667, ans=22.5 2023-12-22 02:18:18,158 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 02:18:25,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=351573.3333333333, ans=0.125 2023-12-22 02:18:51,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=351706.6666666667, ans=0.125 2023-12-22 02:18:56,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.15 vs. limit=15.0 2023-12-22 02:18:58,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=351773.3333333333, ans=0.02 2023-12-22 02:19:01,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=351773.3333333333, ans=0.1 2023-12-22 02:19:03,709 INFO [train.py:886] (3/4) Epoch 12, batch 350, loss[loss=0.01413, audio_tagging_loss=0.01413, over 24750.00 frames. ], tot_loss[loss=0.01599, audio_tagging_loss=0.01599, over 4105656.74 frames. ], batch size: 99, lr: 9.29e-03, grad_scale: 64.0 2023-12-22 02:19:03,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=351840.0, ans=0.0 2023-12-22 02:19:12,519 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.13 vs. limit=12.0 2023-12-22 02:19:17,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.27 vs. limit=15.0 2023-12-22 02:19:45,758 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.85 vs. limit=15.0 2023-12-22 02:19:50,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=352106.6666666667, ans=0.125 2023-12-22 02:19:52,869 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 2.603e+01 2.805e+01 2.915e+01 3.693e+01, threshold=5.611e+01, percent-clipped=0.0 2023-12-22 02:19:55,521 INFO [train.py:886] (3/4) Epoch 12, batch 400, loss[loss=0.01377, audio_tagging_loss=0.01377, over 25000.00 frames. ], tot_loss[loss=0.0156, audio_tagging_loss=0.0156, over 4288840.17 frames. 
], batch size: 100, lr: 9.29e-03, grad_scale: 64.0 2023-12-22 02:19:55,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=352173.3333333333, ans=0.125 2023-12-22 02:19:56,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=352173.3333333333, ans=0.04949747468305833 2023-12-22 02:19:58,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=352173.3333333333, ans=0.2 2023-12-22 02:20:11,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=352240.0, ans=0.1 2023-12-22 02:20:44,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=352440.0, ans=0.125 2023-12-22 02:20:48,156 INFO [train.py:886] (3/4) Epoch 12, batch 450, loss[loss=0.01665, audio_tagging_loss=0.01665, over 25000.00 frames. ], tot_loss[loss=0.01535, audio_tagging_loss=0.01535, over 4439110.88 frames. ], batch size: 100, lr: 9.29e-03, grad_scale: 64.0 2023-12-22 02:20:52,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=352506.6666666667, ans=0.0 2023-12-22 02:21:01,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.28 vs. limit=12.0 2023-12-22 02:21:26,453 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.227e-01 2023-12-22 02:21:37,257 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.343e+01 2.603e+01 2.721e+01 2.857e+01 3.643e+01, threshold=5.441e+01, percent-clipped=0.0 2023-12-22 02:21:39,860 INFO [train.py:886] (3/4) Epoch 12, batch 500, loss[loss=0.01424, audio_tagging_loss=0.01424, over 25000.00 frames. ], tot_loss[loss=0.01514, audio_tagging_loss=0.01514, over 4555066.40 frames. ], batch size: 100, lr: 9.28e-03, grad_scale: 64.0 2023-12-22 02:21:58,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=352906.6666666667, ans=0.0 2023-12-22 02:22:04,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=352973.3333333333, ans=0.1 2023-12-22 02:22:12,790 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.46 vs. limit=22.5 2023-12-22 02:22:31,415 INFO [train.py:886] (3/4) Epoch 12, batch 550, loss[loss=0.01317, audio_tagging_loss=0.01317, over 25000.00 frames. ], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 4642443.85 frames. ], batch size: 100, lr: 9.28e-03, grad_scale: 64.0 2023-12-22 02:22:36,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=353173.3333333333, ans=0.1 2023-12-22 02:23:04,191 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 02:23:06,424 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.03 vs. 
limit=15.0 2023-12-22 02:23:20,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=353440.0, ans=0.0 2023-12-22 02:23:21,418 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+01 2.657e+01 2.754e+01 2.932e+01 3.860e+01, threshold=5.508e+01, percent-clipped=0.0 2023-12-22 02:23:23,350 INFO [train.py:886] (3/4) Epoch 12, batch 600, loss[loss=0.01485, audio_tagging_loss=0.01485, over 23984.00 frames. ], tot_loss[loss=0.01513, audio_tagging_loss=0.01513, over 4706104.42 frames. ], batch size: 100, lr: 9.27e-03, grad_scale: 64.0 2023-12-22 02:23:42,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=353573.3333333333, ans=0.09899494936611666 2023-12-22 02:23:45,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=353640.0, ans=0.5 2023-12-22 02:23:55,028 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.36 vs. limit=15.0 2023-12-22 02:24:06,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=353773.3333333333, ans=0.1 2023-12-22 02:24:15,638 INFO [train.py:886] (3/4) Epoch 12, batch 650, loss[loss=0.01264, audio_tagging_loss=0.01264, over 24750.00 frames. ], tot_loss[loss=0.01516, audio_tagging_loss=0.01516, over 4749616.75 frames. ], batch size: 99, lr: 9.27e-03, grad_scale: 64.0 2023-12-22 02:24:26,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=353906.6666666667, ans=0.125 2023-12-22 02:24:31,931 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.62 vs. limit=6.0 2023-12-22 02:24:33,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=353906.6666666667, ans=0.125 2023-12-22 02:24:38,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=353973.3333333333, ans=0.1 2023-12-22 02:24:45,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=353973.3333333333, ans=0.02 2023-12-22 02:24:47,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=354040.0, ans=0.125 2023-12-22 02:24:48,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=354040.0, ans=0.0 2023-12-22 02:24:52,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=354040.0, ans=0.015 2023-12-22 02:24:59,158 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.90 vs. limit=22.5 2023-12-22 02:25:05,086 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+01 2.660e+01 2.826e+01 2.982e+01 3.638e+01, threshold=5.651e+01, percent-clipped=0.0 2023-12-22 02:25:07,025 INFO [train.py:886] (3/4) Epoch 12, batch 700, loss[loss=0.01802, audio_tagging_loss=0.01802, over 24750.00 frames. 
], tot_loss[loss=0.01509, audio_tagging_loss=0.01509, over 4796379.21 frames. ], batch size: 99, lr: 9.26e-03, grad_scale: 64.0 2023-12-22 02:25:12,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=354173.3333333333, ans=0.0 2023-12-22 02:25:15,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=354173.3333333333, ans=0.05 2023-12-22 02:25:20,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=354240.0, ans=0.125 2023-12-22 02:25:28,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=354306.6666666667, ans=0.125 2023-12-22 02:25:29,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=354306.6666666667, ans=0.125 2023-12-22 02:25:32,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=354306.6666666667, ans=0.125 2023-12-22 02:25:34,954 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=15.0 2023-12-22 02:25:37,718 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.08 vs. limit=15.0 2023-12-22 02:25:47,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=354373.3333333333, ans=15.0 2023-12-22 02:25:49,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=354440.0, ans=0.125 2023-12-22 02:25:59,238 INFO [train.py:886] (3/4) Epoch 12, batch 750, loss[loss=0.01383, audio_tagging_loss=0.01383, over 24750.00 frames. ], tot_loss[loss=0.0149, audio_tagging_loss=0.0149, over 4828172.69 frames. ], batch size: 99, lr: 9.26e-03, grad_scale: 64.0 2023-12-22 02:26:02,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=354506.6666666667, ans=0.125 2023-12-22 02:26:16,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=354573.3333333333, ans=0.0 2023-12-22 02:26:21,041 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0 2023-12-22 02:26:22,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=354640.0, ans=0.125 2023-12-22 02:26:25,860 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.86 vs. 
limit=15.0 2023-12-22 02:26:34,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=354706.6666666667, ans=0.125 2023-12-22 02:26:38,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=354706.6666666667, ans=0.125 2023-12-22 02:26:39,957 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.22 vs. limit=15.0 2023-12-22 02:26:44,355 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.58 vs. limit=12.0 2023-12-22 02:26:45,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=354773.3333333333, ans=0.0 2023-12-22 02:26:47,761 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.279e+01 2.606e+01 2.795e+01 2.916e+01 3.346e+01, threshold=5.591e+01, percent-clipped=0.0 2023-12-22 02:26:50,422 INFO [train.py:886] (3/4) Epoch 12, batch 800, loss[loss=0.0131, audio_tagging_loss=0.0131, over 25000.00 frames. ], tot_loss[loss=0.01487, audio_tagging_loss=0.01487, over 4859298.99 frames. ], batch size: 100, lr: 9.26e-03, grad_scale: 64.0 2023-12-22 02:26:50,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=354840.0, ans=0.125 2023-12-22 02:26:53,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=354840.0, ans=0.1 2023-12-22 02:26:58,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=354840.0, ans=0.0 2023-12-22 02:27:18,206 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-22 02:27:26,418 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.11 vs. limit=15.0 2023-12-22 02:27:27,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=355040.0, ans=0.125 2023-12-22 02:27:27,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=355040.0, ans=0.95 2023-12-22 02:27:28,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=355040.0, ans=0.125 2023-12-22 02:27:40,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=355106.6666666667, ans=0.125 2023-12-22 02:27:42,025 INFO [train.py:886] (3/4) Epoch 12, batch 850, loss[loss=0.01404, audio_tagging_loss=0.01404, over 25000.00 frames. ], tot_loss[loss=0.01488, audio_tagging_loss=0.01488, over 4878067.10 frames. ], batch size: 100, lr: 9.25e-03, grad_scale: 64.0 2023-12-22 02:27:42,633 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.18 vs. 
limit=22.5 2023-12-22 02:28:14,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=355373.3333333333, ans=0.1 2023-12-22 02:28:32,635 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 2.693e+01 2.795e+01 2.929e+01 3.864e+01, threshold=5.590e+01, percent-clipped=0.0 2023-12-22 02:28:34,551 INFO [train.py:886] (3/4) Epoch 12, batch 900, loss[loss=0.01819, audio_tagging_loss=0.01819, over 25000.00 frames. ], tot_loss[loss=0.01493, audio_tagging_loss=0.01493, over 4897086.99 frames. ], batch size: 100, lr: 9.25e-03, grad_scale: 64.0 2023-12-22 02:28:43,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=355506.6666666667, ans=0.05 2023-12-22 02:28:44,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=355573.3333333333, ans=0.125 2023-12-22 02:28:46,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=355573.3333333333, ans=0.1 2023-12-22 02:28:58,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=355640.0, ans=0.125 2023-12-22 02:29:07,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=355706.6666666667, ans=0.125 2023-12-22 02:29:26,314 INFO [train.py:886] (3/4) Epoch 12, batch 950, loss[loss=0.01735, audio_tagging_loss=0.01735, over 24750.00 frames. ], tot_loss[loss=0.01497, audio_tagging_loss=0.01497, over 4899052.12 frames. ], batch size: 99, lr: 9.24e-03, grad_scale: 64.0 2023-12-22 02:29:36,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=355906.6666666667, ans=0.2 2023-12-22 02:29:48,720 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=15.0 2023-12-22 02:29:58,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=356040.0, ans=0.0 2023-12-22 02:30:06,232 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.778e-01 2023-12-22 02:30:09,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=356106.6666666667, ans=0.2 2023-12-22 02:30:16,916 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 2.679e+01 2.792e+01 2.958e+01 3.407e+01, threshold=5.583e+01, percent-clipped=0.0 2023-12-22 02:30:18,813 INFO [train.py:886] (3/4) Epoch 12, batch 1000, loss[loss=0.01463, audio_tagging_loss=0.01463, over 24750.00 frames. ], tot_loss[loss=0.01493, audio_tagging_loss=0.01493, over 4908634.85 frames. 
], batch size: 99, lr: 9.24e-03, grad_scale: 64.0 2023-12-22 02:30:21,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=356173.3333333333, ans=0.125 2023-12-22 02:30:36,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=356240.0, ans=0.0 2023-12-22 02:30:38,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=356306.6666666667, ans=0.035 2023-12-22 02:30:52,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.51 vs. limit=15.0 2023-12-22 02:31:03,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=356440.0, ans=0.0 2023-12-22 02:31:10,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=356506.6666666667, ans=0.1 2023-12-22 02:31:10,658 INFO [train.py:886] (3/4) Epoch 12, batch 1050, loss[loss=0.01586, audio_tagging_loss=0.01586, over 25000.00 frames. ], tot_loss[loss=0.0148, audio_tagging_loss=0.0148, over 4917339.06 frames. ], batch size: 100, lr: 9.23e-03, grad_scale: 64.0 2023-12-22 02:31:12,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=356506.6666666667, ans=0.04949747468305833 2023-12-22 02:31:14,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=356506.6666666667, ans=0.125 2023-12-22 02:31:35,572 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.29 vs. limit=15.0 2023-12-22 02:31:49,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=356706.6666666667, ans=0.125 2023-12-22 02:31:56,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=356773.3333333333, ans=0.125 2023-12-22 02:32:00,446 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+01 2.676e+01 2.799e+01 2.940e+01 3.260e+01, threshold=5.599e+01, percent-clipped=0.0 2023-12-22 02:32:02,365 INFO [train.py:886] (3/4) Epoch 12, batch 1100, loss[loss=0.01429, audio_tagging_loss=0.01429, over 23031.00 frames. ], tot_loss[loss=0.01486, audio_tagging_loss=0.01486, over 4923642.51 frames. ], batch size: 107, lr: 9.23e-03, grad_scale: 64.0 2023-12-22 02:32:02,958 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. 
limit=12.0 2023-12-22 02:32:08,378 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 02:32:11,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=356906.6666666667, ans=0.0 2023-12-22 02:32:14,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=356906.6666666667, ans=0.1 2023-12-22 02:32:32,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=356973.3333333333, ans=0.1 2023-12-22 02:32:36,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=357040.0, ans=0.125 2023-12-22 02:32:46,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=357106.6666666667, ans=0.2 2023-12-22 02:32:54,160 INFO [train.py:886] (3/4) Epoch 12, batch 1150, loss[loss=0.01751, audio_tagging_loss=0.01751, over 25000.00 frames. ], tot_loss[loss=0.01487, audio_tagging_loss=0.01487, over 4935635.43 frames. ], batch size: 100, lr: 9.23e-03, grad_scale: 64.0 2023-12-22 02:32:55,457 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2023-12-22 02:32:58,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=357173.3333333333, ans=0.0 2023-12-22 02:33:02,266 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.71 vs. limit=22.5 2023-12-22 02:33:10,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.14 vs. limit=22.5 2023-12-22 02:33:13,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=357306.6666666667, ans=0.125 2023-12-22 02:33:15,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=357306.6666666667, ans=0.2 2023-12-22 02:33:16,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=357306.6666666667, ans=0.125 2023-12-22 02:33:20,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=357306.6666666667, ans=0.125 2023-12-22 02:33:21,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=357306.6666666667, ans=0.125 2023-12-22 02:33:22,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.82 vs. 
limit=22.5 2023-12-22 02:33:24,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=357373.3333333333, ans=0.0 2023-12-22 02:33:35,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=357440.0, ans=0.125 2023-12-22 02:33:40,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=357440.0, ans=0.125 2023-12-22 02:33:44,375 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.648e+01 2.751e+01 2.936e+01 3.446e+01, threshold=5.502e+01, percent-clipped=0.0 2023-12-22 02:33:46,302 INFO [train.py:886] (3/4) Epoch 12, batch 1200, loss[loss=0.01532, audio_tagging_loss=0.01532, over 25000.00 frames. ], tot_loss[loss=0.01489, audio_tagging_loss=0.01489, over 4937600.66 frames. ], batch size: 100, lr: 9.22e-03, grad_scale: 64.0 2023-12-22 02:33:47,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=357506.6666666667, ans=0.125 2023-12-22 02:33:49,686 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.85 vs. limit=10.0 2023-12-22 02:33:51,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=15.0 2023-12-22 02:33:59,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=357573.3333333333, ans=0.125 2023-12-22 02:34:01,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=357573.3333333333, ans=0.0 2023-12-22 02:34:01,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=357573.3333333333, ans=0.125 2023-12-22 02:34:02,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=357573.3333333333, ans=0.04949747468305833 2023-12-22 02:34:26,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=357706.6666666667, ans=0.015 2023-12-22 02:34:38,777 INFO [train.py:886] (3/4) Epoch 12, batch 1250, loss[loss=0.01252, audio_tagging_loss=0.01252, over 24012.00 frames. ], tot_loss[loss=0.01503, audio_tagging_loss=0.01503, over 4937186.62 frames. ], batch size: 100, lr: 9.22e-03, grad_scale: 64.0 2023-12-22 02:34:44,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=15.0 2023-12-22 02:34:46,519 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.56 vs. limit=15.0 2023-12-22 02:34:56,684 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.83 vs. 
limit=15.0 2023-12-22 02:34:57,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=357906.6666666667, ans=0.0 2023-12-22 02:35:02,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=357973.3333333333, ans=0.2 2023-12-22 02:35:05,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=357973.3333333333, ans=0.1 2023-12-22 02:35:28,216 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.325e+01 2.720e+01 2.848e+01 2.999e+01 3.607e+01, threshold=5.697e+01, percent-clipped=0.0 2023-12-22 02:35:29,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=358173.3333333333, ans=0.125 2023-12-22 02:35:30,137 INFO [train.py:886] (3/4) Epoch 12, batch 1300, loss[loss=0.01605, audio_tagging_loss=0.01605, over 24036.00 frames. ], tot_loss[loss=0.01515, audio_tagging_loss=0.01515, over 4935805.47 frames. ], batch size: 100, lr: 9.21e-03, grad_scale: 64.0 2023-12-22 02:35:43,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=358240.0, ans=0.1 2023-12-22 02:36:22,379 INFO [train.py:886] (3/4) Epoch 12, batch 1350, loss[loss=0.01483, audio_tagging_loss=0.01483, over 24750.00 frames. ], tot_loss[loss=0.01511, audio_tagging_loss=0.01511, over 4943151.27 frames. ], batch size: 99, lr: 9.21e-03, grad_scale: 64.0 2023-12-22 02:36:56,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=358706.6666666667, ans=0.0 2023-12-22 02:37:12,420 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.310e+01 2.686e+01 2.846e+01 3.039e+01 3.537e+01, threshold=5.691e+01, percent-clipped=0.0 2023-12-22 02:37:14,362 INFO [train.py:886] (3/4) Epoch 12, batch 1400, loss[loss=0.01499, audio_tagging_loss=0.01499, over 25000.00 frames. ], tot_loss[loss=0.01495, audio_tagging_loss=0.01495, over 4950494.16 frames. 
], batch size: 100, lr: 9.20e-03, grad_scale: 64.0 2023-12-22 02:37:27,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=358906.6666666667, ans=0.125 2023-12-22 02:37:31,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=358906.6666666667, ans=0.0 2023-12-22 02:37:33,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=358973.3333333333, ans=0.0 2023-12-22 02:37:33,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=358973.3333333333, ans=0.07 2023-12-22 02:37:36,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=358973.3333333333, ans=0.2 2023-12-22 02:37:45,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=359040.0, ans=0.125 2023-12-22 02:37:54,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=359106.6666666667, ans=0.125 2023-12-22 02:37:58,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=359106.6666666667, ans=0.125 2023-12-22 02:38:03,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=359173.3333333333, ans=0.0 2023-12-22 02:38:04,761 INFO [train.py:886] (3/4) Epoch 12, batch 1450, loss[loss=0.01478, audio_tagging_loss=0.01478, over 25000.00 frames. ], tot_loss[loss=0.01482, audio_tagging_loss=0.01482, over 4951662.52 frames. ], batch size: 100, lr: 9.20e-03, grad_scale: 64.0 2023-12-22 02:38:17,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.94 vs. limit=15.0 2023-12-22 02:38:31,968 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. limit=6.0 2023-12-22 02:38:42,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.34 vs. limit=22.5 2023-12-22 02:38:44,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=359373.3333333333, ans=0.07 2023-12-22 02:38:54,578 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+01 2.648e+01 2.789e+01 2.949e+01 3.520e+01, threshold=5.579e+01, percent-clipped=0.0 2023-12-22 02:38:56,515 INFO [train.py:886] (3/4) Epoch 12, batch 1500, loss[loss=0.01396, audio_tagging_loss=0.01396, over 25000.00 frames. ], tot_loss[loss=0.0148, audio_tagging_loss=0.0148, over 4955960.51 frames. ], batch size: 100, lr: 9.20e-03, grad_scale: 64.0 2023-12-22 02:39:37,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=359706.6666666667, ans=0.125 2023-12-22 02:39:38,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=359773.3333333333, ans=0.0 2023-12-22 02:39:50,012 INFO [train.py:886] (3/4) Epoch 12, batch 1550, loss[loss=0.01553, audio_tagging_loss=0.01553, over 24750.00 frames. 
], tot_loss[loss=0.01496, audio_tagging_loss=0.01496, over 4950109.79 frames. ], batch size: 99, lr: 9.19e-03, grad_scale: 64.0 2023-12-22 02:39:53,413 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=15.0 2023-12-22 02:40:00,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=359906.6666666667, ans=0.1 2023-12-22 02:40:09,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=359973.3333333333, ans=0.0 2023-12-22 02:40:10,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=359973.3333333333, ans=0.125 2023-12-22 02:40:15,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=359973.3333333333, ans=0.1 2023-12-22 02:40:17,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.57 vs. limit=15.0 2023-12-22 02:40:39,640 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.401e+01 2.713e+01 2.839e+01 3.034e+01 3.584e+01, threshold=5.677e+01, percent-clipped=0.0 2023-12-22 02:40:41,601 INFO [train.py:886] (3/4) Epoch 12, batch 1600, loss[loss=0.01391, audio_tagging_loss=0.01391, over 24750.00 frames. ], tot_loss[loss=0.01501, audio_tagging_loss=0.01501, over 4946505.55 frames. ], batch size: 99, lr: 9.19e-03, grad_scale: 64.0 2023-12-22 02:41:12,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=360373.3333333333, ans=0.0 2023-12-22 02:41:16,177 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2023-12-22 02:41:27,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=360440.0, ans=0.1 2023-12-22 02:41:32,826 INFO [train.py:886] (3/4) Epoch 12, batch 1650, loss[loss=0.01284, audio_tagging_loss=0.01284, over 25000.00 frames. ], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 4943631.41 frames. ], batch size: 100, lr: 9.18e-03, grad_scale: 64.0 2023-12-22 02:41:39,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=360506.6666666667, ans=0.025 2023-12-22 02:42:12,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=360706.6666666667, ans=0.0 2023-12-22 02:42:17,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=360773.3333333333, ans=0.1 2023-12-22 02:42:22,670 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.365e+01 2.644e+01 2.817e+01 2.973e+01 3.664e+01, threshold=5.633e+01, percent-clipped=0.0 2023-12-22 02:42:25,262 INFO [train.py:886] (3/4) Epoch 12, batch 1700, loss[loss=0.01335, audio_tagging_loss=0.01335, over 25000.00 frames. ], tot_loss[loss=0.01488, audio_tagging_loss=0.01488, over 4939961.42 frames. 
], batch size: 100, lr: 9.18e-03, grad_scale: 64.0 2023-12-22 02:42:31,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=360840.0, ans=0.0 2023-12-22 02:42:31,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=360840.0, ans=0.125 2023-12-22 02:42:33,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=360840.0, ans=0.07 2023-12-22 02:42:45,654 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.13 vs. limit=22.5 2023-12-22 02:42:48,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=360973.3333333333, ans=0.1 2023-12-22 02:42:50,946 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.63 vs. limit=22.5 2023-12-22 02:42:53,845 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.17 vs. limit=12.0 2023-12-22 02:42:58,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=361040.0, ans=0.04949747468305833 2023-12-22 02:43:03,960 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.00 vs. limit=12.0 2023-12-22 02:43:05,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=361106.6666666667, ans=0.2 2023-12-22 02:43:13,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=361106.6666666667, ans=0.2 2023-12-22 02:43:16,375 INFO [train.py:886] (3/4) Epoch 12, batch 1750, loss[loss=0.01417, audio_tagging_loss=0.01417, over 25000.00 frames. ], tot_loss[loss=0.01476, audio_tagging_loss=0.01476, over 4938952.61 frames. ], batch size: 100, lr: 9.18e-03, grad_scale: 64.0 2023-12-22 02:43:24,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=361173.3333333333, ans=0.1 2023-12-22 02:43:24,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=361173.3333333333, ans=0.0 2023-12-22 02:43:34,594 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.84 vs. limit=15.0 2023-12-22 02:43:40,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=361306.6666666667, ans=0.0 2023-12-22 02:43:46,969 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.26 vs. limit=22.5 2023-12-22 02:44:06,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=361440.0, ans=0.2 2023-12-22 02:44:07,342 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.350e+01 2.657e+01 2.802e+01 2.979e+01 3.545e+01, threshold=5.603e+01, percent-clipped=0.0 2023-12-22 02:44:09,256 INFO [train.py:886] (3/4) Epoch 12, batch 1800, loss[loss=0.01538, audio_tagging_loss=0.01538, over 25000.00 frames. 
], tot_loss[loss=0.01477, audio_tagging_loss=0.01477, over 4946243.25 frames. ], batch size: 100, lr: 9.17e-03, grad_scale: 64.0 2023-12-22 02:44:38,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=361640.0, ans=0.0 2023-12-22 02:45:00,405 INFO [train.py:886] (3/4) Epoch 12, batch 1850, loss[loss=0.01827, audio_tagging_loss=0.01827, over 24750.00 frames. ], tot_loss[loss=0.01475, audio_tagging_loss=0.01475, over 4946398.75 frames. ], batch size: 99, lr: 9.17e-03, grad_scale: 64.0 2023-12-22 02:45:04,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=361840.0, ans=0.125 2023-12-22 02:45:15,344 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.68 vs. limit=22.5 2023-12-22 02:45:22,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=361973.3333333333, ans=0.125 2023-12-22 02:45:23,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=361973.3333333333, ans=0.1 2023-12-22 02:45:40,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=362040.0, ans=0.0 2023-12-22 02:45:48,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=362106.6666666667, ans=10.0 2023-12-22 02:45:50,662 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.408e+01 2.739e+01 2.901e+01 3.056e+01 4.118e+01, threshold=5.802e+01, percent-clipped=0.0 2023-12-22 02:45:53,269 INFO [train.py:886] (3/4) Epoch 12, batch 1900, loss[loss=0.01652, audio_tagging_loss=0.01652, over 24750.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4939712.73 frames. ], batch size: 99, lr: 9.16e-03, grad_scale: 64.0 2023-12-22 02:45:57,413 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0 2023-12-22 02:46:02,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=362240.0, ans=0.125 2023-12-22 02:46:15,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=362306.6666666667, ans=0.2 2023-12-22 02:46:17,610 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.39 vs. limit=12.0 2023-12-22 02:46:18,734 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.01 vs. limit=22.5 2023-12-22 02:46:26,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=362373.3333333333, ans=0.125 2023-12-22 02:46:27,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=362373.3333333333, ans=0.2 2023-12-22 02:46:34,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=362440.0, ans=0.2 2023-12-22 02:46:45,593 INFO [train.py:886] (3/4) Epoch 12, batch 1950, loss[loss=0.01473, audio_tagging_loss=0.01473, over 25000.00 frames. 
], tot_loss[loss=0.01477, audio_tagging_loss=0.01477, over 4935501.99 frames. ], batch size: 100, lr: 9.16e-03, grad_scale: 64.0 2023-12-22 02:47:25,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=362773.3333333333, ans=0.0 2023-12-22 02:47:29,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=362773.3333333333, ans=0.125 2023-12-22 02:47:33,918 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 2.661e+01 2.847e+01 3.018e+01 3.786e+01, threshold=5.694e+01, percent-clipped=0.0 2023-12-22 02:47:35,870 INFO [train.py:886] (3/4) Epoch 12, batch 2000, loss[loss=0.01423, audio_tagging_loss=0.01423, over 25000.00 frames. ], tot_loss[loss=0.01461, audio_tagging_loss=0.01461, over 4942955.68 frames. ], batch size: 100, lr: 9.16e-03, grad_scale: 128.0 2023-12-22 02:47:57,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=362973.3333333333, ans=0.0 2023-12-22 02:47:58,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=362973.3333333333, ans=0.125 2023-12-22 02:48:02,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=362973.3333333333, ans=0.025 2023-12-22 02:48:06,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=363040.0, ans=0.2 2023-12-22 02:48:06,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=363040.0, ans=0.125 2023-12-22 02:48:15,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=363040.0, ans=0.125 2023-12-22 02:48:19,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=363106.6666666667, ans=0.0 2023-12-22 02:48:26,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.06 vs. limit=15.0 2023-12-22 02:48:28,423 INFO [train.py:886] (3/4) Epoch 12, batch 2050, loss[loss=0.01355, audio_tagging_loss=0.01355, over 25000.00 frames. ], tot_loss[loss=0.01456, audio_tagging_loss=0.01456, over 4944775.57 frames. ], batch size: 100, lr: 9.15e-03, grad_scale: 64.0 2023-12-22 02:48:54,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=363306.6666666667, ans=0.0 2023-12-22 02:49:15,244 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.45 vs. limit=22.5 2023-12-22 02:49:17,499 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.419e+01 2.662e+01 2.833e+01 2.958e+01 3.467e+01, threshold=5.665e+01, percent-clipped=0.0 2023-12-22 02:49:18,479 INFO [train.py:886] (3/4) Epoch 12, batch 2100, loss[loss=0.01364, audio_tagging_loss=0.01364, over 25000.00 frames. ], tot_loss[loss=0.01452, audio_tagging_loss=0.01452, over 4945984.06 frames. 
], batch size: 100, lr: 9.15e-03, grad_scale: 64.0 2023-12-22 02:49:36,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=363573.3333333333, ans=0.0 2023-12-22 02:49:46,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=363640.0, ans=0.2 2023-12-22 02:49:50,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=363706.6666666667, ans=0.0 2023-12-22 02:50:00,976 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.16 vs. limit=15.0 2023-12-22 02:50:03,043 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.80 vs. limit=15.0 2023-12-22 02:50:05,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=363773.3333333333, ans=0.125 2023-12-22 02:50:11,415 INFO [train.py:886] (3/4) Epoch 12, batch 2150, loss[loss=0.01353, audio_tagging_loss=0.01353, over 25000.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4945662.46 frames. ], batch size: 100, lr: 9.14e-03, grad_scale: 64.0 2023-12-22 02:50:20,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=363906.6666666667, ans=0.125 2023-12-22 02:50:22,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=363906.6666666667, ans=0.125 2023-12-22 02:50:33,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=363973.3333333333, ans=0.0 2023-12-22 02:50:53,161 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.49 vs. limit=15.0 2023-12-22 02:50:59,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=364106.6666666667, ans=0.2 2023-12-22 02:51:01,186 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.69 vs. limit=15.0 2023-12-22 02:51:02,676 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.357e+01 2.698e+01 2.881e+01 3.027e+01 3.417e+01, threshold=5.762e+01, percent-clipped=0.0 2023-12-22 02:51:03,651 INFO [train.py:886] (3/4) Epoch 12, batch 2200, loss[loss=0.01194, audio_tagging_loss=0.01194, over 24043.00 frames. ], tot_loss[loss=0.0147, audio_tagging_loss=0.0147, over 4941549.63 frames. 
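Each optim.py:484 warning gives a five-number summary (min, 25%, median, 75%, max) of recent gradient norms together with the active clipping threshold, and in every report here the threshold is exactly Clipping_scale times the median: in the warning just above, 2.0 x 2.881e+01 = 5.762e+01. percent-clipped then says what fraction of recent steps exceeded that threshold. A sketch of the bookkeeping, with illustrative norms:

    import torch

    def clipping_report(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
        q = torch.quantile(recent_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]              # scale times the median
        pct = 100.0 * (recent_norms > threshold).float().mean()
        return q, threshold, pct

    norms = torch.tensor([23.6, 26.9, 28.8, 30.3, 34.2])  # illustrative values
    q, thr, pct = clipping_report(norms)
    print(q.tolist(), float(thr), float(pct))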
], batch size: 100, lr: 9.14e-03, grad_scale: 64.0 2023-12-22 02:51:16,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=364240.0, ans=0.0 2023-12-22 02:51:27,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=364306.6666666667, ans=0.04949747468305833 2023-12-22 02:51:28,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=364306.6666666667, ans=0.125 2023-12-22 02:51:41,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=364373.3333333333, ans=0.125 2023-12-22 02:51:55,235 INFO [train.py:886] (3/4) Epoch 12, batch 2250, loss[loss=0.01587, audio_tagging_loss=0.01587, over 24750.00 frames. ], tot_loss[loss=0.01479, audio_tagging_loss=0.01479, over 4941933.91 frames. ], batch size: 99, lr: 9.13e-03, grad_scale: 64.0 2023-12-22 02:52:22,109 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.66 vs. limit=22.5 2023-12-22 02:52:23,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364640.0, ans=0.1 2023-12-22 02:52:36,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=364773.3333333333, ans=0.125 2023-12-22 02:52:46,425 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.374e+01 2.658e+01 2.783e+01 2.953e+01 5.165e+01, threshold=5.566e+01, percent-clipped=0.0 2023-12-22 02:52:47,410 INFO [train.py:886] (3/4) Epoch 12, batch 2300, loss[loss=0.01324, audio_tagging_loss=0.01324, over 24750.00 frames. ], tot_loss[loss=0.01472, audio_tagging_loss=0.01472, over 4946320.50 frames. ], batch size: 99, lr: 9.13e-03, grad_scale: 64.0 2023-12-22 02:52:54,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=364840.0, ans=0.0 2023-12-22 02:52:56,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=364906.6666666667, ans=0.2 2023-12-22 02:53:08,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=364973.3333333333, ans=0.0 2023-12-22 02:53:12,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364973.3333333333, ans=0.1 2023-12-22 02:53:30,344 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.20 vs. limit=15.0 2023-12-22 02:53:39,769 INFO [train.py:886] (3/4) Epoch 12, batch 2350, loss[loss=0.01442, audio_tagging_loss=0.01442, over 25000.00 frames. ], tot_loss[loss=0.01462, audio_tagging_loss=0.01462, over 4940700.20 frames. ], batch size: 100, lr: 9.13e-03, grad_scale: 64.0 2023-12-22 02:53:46,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=365173.3333333333, ans=0.0 2023-12-22 02:53:54,988 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.29 vs. 
limit=22.5 2023-12-22 02:54:05,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=365306.6666666667, ans=0.025 2023-12-22 02:54:18,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=365373.3333333333, ans=0.09899494936611666 2023-12-22 02:54:30,883 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.340e+01 2.667e+01 2.832e+01 3.022e+01 3.621e+01, threshold=5.664e+01, percent-clipped=0.0 2023-12-22 02:54:31,875 INFO [train.py:886] (3/4) Epoch 12, batch 2400, loss[loss=0.01186, audio_tagging_loss=0.01186, over 25000.00 frames. ], tot_loss[loss=0.01456, audio_tagging_loss=0.01456, over 4945242.27 frames. ], batch size: 100, lr: 9.12e-03, grad_scale: 64.0 2023-12-22 02:54:36,088 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.84 vs. limit=15.0 2023-12-22 02:54:57,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=365640.0, ans=0.125 2023-12-22 02:55:03,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=365706.6666666667, ans=0.0 2023-12-22 02:55:03,950 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.54 vs. limit=15.0 2023-12-22 02:55:11,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=365773.3333333333, ans=0.125 2023-12-22 02:55:23,537 INFO [train.py:886] (3/4) Epoch 12, batch 2450, loss[loss=0.01696, audio_tagging_loss=0.01696, over 24750.00 frames. ], tot_loss[loss=0.0146, audio_tagging_loss=0.0146, over 4949770.13 frames. ], batch size: 99, lr: 9.12e-03, grad_scale: 64.0 2023-12-22 02:55:23,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=365840.0, ans=10.0 2023-12-22 02:55:39,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.65 vs. limit=22.5 2023-12-22 02:55:45,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=365973.3333333333, ans=0.1 2023-12-22 02:55:51,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=365973.3333333333, ans=0.2 2023-12-22 02:55:57,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=366040.0, ans=0.0 2023-12-22 02:55:57,484 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.97 vs. limit=22.5 2023-12-22 02:56:14,275 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+01 2.716e+01 2.829e+01 2.975e+01 3.465e+01, threshold=5.658e+01, percent-clipped=0.0 2023-12-22 02:56:15,284 INFO [train.py:886] (3/4) Epoch 12, batch 2500, loss[loss=0.01398, audio_tagging_loss=0.01398, over 22130.00 frames. ], tot_loss[loss=0.01474, audio_tagging_loss=0.01474, over 4951979.70 frames. 
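The per-batch loss[...] figures bounce between roughly 0.012 and 0.018 while tot_loss[...] drifts smoothly near 0.0146, and tot_loss is always reported "over" about 4.94M frames, which at ~25000 frames per batch is some 200 batches' worth. That is consistent with a geometrically decayed accumulator rather than a plain epoch average; a sketch under that assumption:

    class DecayedAverage:
        """Frame-weighted loss average with geometric decay per batch."""
        def __init__(self, horizon: int = 200):
            self.decay = 1.0 - 1.0 / horizon
            self.loss_sum = 0.0   # decayed sum of loss * frames
            self.frames = 0.0     # decayed sum of frames

        def update(self, loss: float, frames: float):
            self.loss_sum = self.loss_sum * self.decay + loss * frames
            self.frames = self.frames * self.decay + frames
            return self.loss_sum / self.frames, self.frames

    avg = DecayedAverage()
    for _ in range(1000):
        tot, n = avg.update(0.0147, 25000.0)
    print(round(tot, 5), round(n))  # frames settle near 200 * 25000 = 5.0e6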
], batch size: 107, lr: 9.11e-03, grad_scale: 64.0 2023-12-22 02:56:34,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=366306.6666666667, ans=0.125 2023-12-22 02:56:46,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=366373.3333333333, ans=0.1 2023-12-22 02:56:49,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=366373.3333333333, ans=0.125 2023-12-22 02:56:53,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=366373.3333333333, ans=0.1 2023-12-22 02:56:54,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=366373.3333333333, ans=0.125 2023-12-22 02:57:06,187 INFO [train.py:886] (3/4) Epoch 12, batch 2550, loss[loss=0.0129, audio_tagging_loss=0.0129, over 25000.00 frames. ], tot_loss[loss=0.01487, audio_tagging_loss=0.01487, over 4949334.04 frames. ], batch size: 100, lr: 9.11e-03, grad_scale: 64.0 2023-12-22 02:57:22,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=366573.3333333333, ans=0.1 2023-12-22 02:57:29,722 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.12 vs. limit=15.0 2023-12-22 02:57:47,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.45 vs. limit=15.0 2023-12-22 02:57:56,934 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.401e+01 2.648e+01 2.817e+01 3.039e+01 3.435e+01, threshold=5.634e+01, percent-clipped=0.0 2023-12-22 02:57:57,904 INFO [train.py:886] (3/4) Epoch 12, batch 2600, loss[loss=0.01599, audio_tagging_loss=0.01599, over 25000.00 frames. ], tot_loss[loss=0.01482, audio_tagging_loss=0.01482, over 4951214.99 frames. ], batch size: 100, lr: 9.11e-03, grad_scale: 64.0 2023-12-22 02:57:58,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=366840.0, ans=0.125 2023-12-22 02:58:00,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=366840.0, ans=0.0 2023-12-22 02:58:01,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=366840.0, ans=0.04949747468305833 2023-12-22 02:58:03,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=366840.0, ans=0.125 2023-12-22 02:58:38,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=367106.6666666667, ans=0.125 2023-12-22 02:58:38,692 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.60 vs. limit=15.0 2023-12-22 02:58:47,989 INFO [train.py:886] (3/4) Epoch 12, batch 2650, loss[loss=0.0121, audio_tagging_loss=0.0121, over 25000.00 frames. ], tot_loss[loss=0.01474, audio_tagging_loss=0.01474, over 4956180.78 frames. 
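The scaling.py:1022 Whitening records compare a per-module statistic against a limit (metric=14.60 vs. limit=15.0 a few records up) and are presumably the trigger for a corrective penalty once the limit is exceeded. One measure consistent with these numbers is the ratio of the mean squared eigenvalue of the channel covariance to its squared mean eigenvalue, which equals 1.0 exactly when the covariance is a multiple of the identity (fully "white") and grows as a few directions dominate. A sketch of that metric:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations for one whitening group
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        d = cov.shape[0]
        mean_eig = torch.diagonal(cov).mean()     # trace(C) / d
        mean_sq_eig = (cov * cov).sum() / d       # trace(C @ C) / d, C symmetric
        return float(mean_sq_eig / (mean_eig ** 2 + 1e-20))

    x = torch.randn(4000, 256)
    print(whitening_metric(x))  # ~1.06 for white noise; large when lopsided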
], batch size: 100, lr: 9.10e-03, grad_scale: 64.0 2023-12-22 02:58:50,375 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.58 vs. limit=15.0 2023-12-22 02:59:10,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=367306.6666666667, ans=0.2 2023-12-22 02:59:22,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=367373.3333333333, ans=0.2 2023-12-22 02:59:32,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=367440.0, ans=0.0 2023-12-22 02:59:33,541 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=12.0 2023-12-22 02:59:36,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=367440.0, ans=0.125 2023-12-22 02:59:38,847 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+01 2.659e+01 2.800e+01 2.953e+01 3.305e+01, threshold=5.601e+01, percent-clipped=0.0 2023-12-22 02:59:39,831 INFO [train.py:886] (3/4) Epoch 12, batch 2700, loss[loss=0.01452, audio_tagging_loss=0.01452, over 25000.00 frames. ], tot_loss[loss=0.01465, audio_tagging_loss=0.01465, over 4952197.23 frames. ], batch size: 100, lr: 9.10e-03, grad_scale: 64.0 2023-12-22 02:59:41,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=367506.6666666667, ans=0.09899494936611666 2023-12-22 03:00:03,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=367640.0, ans=0.0 2023-12-22 03:00:03,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=367640.0, ans=0.0 2023-12-22 03:00:05,696 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.47 vs. limit=12.0 2023-12-22 03:00:08,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=367640.0, ans=0.2 2023-12-22 03:00:17,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.77 vs. limit=22.5 2023-12-22 03:00:30,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=367840.0, ans=0.125 2023-12-22 03:00:31,439 INFO [train.py:886] (3/4) Epoch 12, batch 2750, loss[loss=0.01361, audio_tagging_loss=0.01361, over 24910.00 frames. ], tot_loss[loss=0.01457, audio_tagging_loss=0.01457, over 4955268.44 frames. 
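The lr column decays very slowly inside the epoch: 9.17e-03 at batch 1850, 9.10e-03 by batch 2700 here, and it keeps shrinking through the rest of the section. An Eden-style schedule, in which the rate is damped by both the global step count and the epoch number, produces exactly this shape; base_lr, lr_batches and lr_epochs below are illustrative constants, not values read from this run:

    def eden_lr(step: int, epoch: float, base_lr: float = 0.045,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(f"{eden_lr(step=53_000, epoch=12):.2e}")  # ~8.9e-03, the scale seen here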
], batch size: 100, lr: 9.09e-03, grad_scale: 64.0 2023-12-22 03:00:52,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=367973.3333333333, ans=12.0 2023-12-22 03:01:00,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=367973.3333333333, ans=0.2 2023-12-22 03:01:16,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=368106.6666666667, ans=0.2 2023-12-22 03:01:17,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=368106.6666666667, ans=0.125 2023-12-22 03:01:20,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=368106.6666666667, ans=0.125 2023-12-22 03:01:22,366 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.314e+01 2.734e+01 2.863e+01 2.984e+01 3.983e+01, threshold=5.726e+01, percent-clipped=0.0 2023-12-22 03:01:23,008 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.73 vs. limit=15.0 2023-12-22 03:01:23,371 INFO [train.py:886] (3/4) Epoch 12, batch 2800, loss[loss=0.01697, audio_tagging_loss=0.01697, over 24750.00 frames. ], tot_loss[loss=0.0147, audio_tagging_loss=0.0147, over 4954128.37 frames. ], batch size: 99, lr: 9.09e-03, grad_scale: 64.0 2023-12-22 03:01:24,871 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2023-12-22 03:01:34,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.41 vs. limit=15.0 2023-12-22 03:01:39,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=368240.0, ans=0.2 2023-12-22 03:01:42,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=368240.0, ans=0.0 2023-12-22 03:01:44,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=368306.6666666667, ans=0.125 2023-12-22 03:02:06,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=368440.0, ans=0.125 2023-12-22 03:02:11,234 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.33 vs. limit=15.0 2023-12-22 03:02:16,467 INFO [train.py:886] (3/4) Epoch 12, batch 2850, loss[loss=0.0148, audio_tagging_loss=0.0148, over 24750.00 frames. ], tot_loss[loss=0.0147, audio_tagging_loss=0.0147, over 4947760.09 frames. ], batch size: 99, lr: 9.09e-03, grad_scale: 64.0 2023-12-22 03:02:30,812 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.97 vs. 
limit=12.0 2023-12-22 03:02:41,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=368640.0, ans=0.125 2023-12-22 03:02:45,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=368640.0, ans=0.2 2023-12-22 03:02:47,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=368706.6666666667, ans=15.0 2023-12-22 03:02:52,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=368706.6666666667, ans=0.125 2023-12-22 03:02:54,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=368706.6666666667, ans=0.0 2023-12-22 03:03:06,049 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 2.672e+01 2.791e+01 2.941e+01 3.403e+01, threshold=5.583e+01, percent-clipped=0.0 2023-12-22 03:03:07,732 INFO [train.py:886] (3/4) Epoch 12, batch 2900, loss[loss=0.01334, audio_tagging_loss=0.01334, over 25000.00 frames. ], tot_loss[loss=0.01465, audio_tagging_loss=0.01465, over 4942295.31 frames. ], batch size: 100, lr: 9.08e-03, grad_scale: 64.0 2023-12-22 03:03:20,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=368906.6666666667, ans=0.125 2023-12-22 03:03:59,024 INFO [train.py:886] (3/4) Epoch 12, batch 2950, loss[loss=0.01425, audio_tagging_loss=0.01425, over 25000.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4940332.41 frames. ], batch size: 100, lr: 9.08e-03, grad_scale: 64.0 2023-12-22 03:03:59,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=369173.3333333333, ans=0.125 2023-12-22 03:04:01,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=369173.3333333333, ans=0.0 2023-12-22 03:04:11,657 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.67 vs. limit=15.0 2023-12-22 03:04:19,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=369306.6666666667, ans=0.1 2023-12-22 03:04:23,917 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.32 vs. limit=22.5 2023-12-22 03:04:50,025 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+01 2.641e+01 2.799e+01 3.004e+01 3.517e+01, threshold=5.597e+01, percent-clipped=0.0 2023-12-22 03:04:51,015 INFO [train.py:886] (3/4) Epoch 12, batch 3000, loss[loss=0.01672, audio_tagging_loss=0.01672, over 25000.00 frames. ], tot_loss[loss=0.01445, audio_tagging_loss=0.01445, over 4943965.29 frames. ], batch size: 100, lr: 9.07e-03, grad_scale: 64.0 2023-12-22 03:04:51,015 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 03:05:12,275 INFO [train.py:917] (3/4) Epoch 12, validation: loss=0.03429, audio_tagging_loss=0.03429, over 3737520.00 frames. 
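At batch 3000 the loop pauses for a full dev-set pass: "Computing validation loss", then a summary loss of 0.03429 over 3737520.00 frames (evidently the whole dev set), with the record that follows giving peak GPU memory. A sketch of such a pass; the batch layout and model interface here are assumptions:

    import torch

    @torch.no_grad()
    def compute_validation_loss(model, valid_loader, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        for batch in valid_loader:
            feats = batch["features"].to(device)    # assumed batch layout
            labels = batch["labels"].to(device)
            loss, frames = model(feats, labels)     # assumed interface
            tot_loss += float(loss) * float(frames)
            tot_frames += float(frames)
        model.train()
        mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        return tot_loss / tot_frames, tot_frames, mem_mb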
2023-12-22 03:05:12,276 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 03:05:24,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=369573.3333333333, ans=0.1 2023-12-22 03:05:34,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=369640.0, ans=0.07 2023-12-22 03:05:39,200 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.73 vs. limit=6.0 2023-12-22 03:05:42,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=369706.6666666667, ans=0.0 2023-12-22 03:05:57,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=369773.3333333333, ans=0.125 2023-12-22 03:06:03,272 INFO [train.py:886] (3/4) Epoch 12, batch 3050, loss[loss=0.01181, audio_tagging_loss=0.01181, over 25000.00 frames. ], tot_loss[loss=0.01443, audio_tagging_loss=0.01443, over 4949610.75 frames. ], batch size: 100, lr: 9.07e-03, grad_scale: 64.0 2023-12-22 03:06:08,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=369840.0, ans=0.125 2023-12-22 03:06:26,366 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2023-12-22 03:06:53,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=370106.6666666667, ans=0.0 2023-12-22 03:06:54,447 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.244e+01 2.661e+01 2.811e+01 2.974e+01 3.661e+01, threshold=5.621e+01, percent-clipped=0.0 2023-12-22 03:06:55,405 INFO [train.py:886] (3/4) Epoch 12, batch 3100, loss[loss=0.0159, audio_tagging_loss=0.0159, over 25000.00 frames. ], tot_loss[loss=0.01466, audio_tagging_loss=0.01466, over 4951770.76 frames. ], batch size: 100, lr: 9.07e-03, grad_scale: 64.0 2023-12-22 03:07:01,628 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.37 vs. limit=15.0 2023-12-22 03:07:23,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=370306.6666666667, ans=0.125 2023-12-22 03:07:29,503 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.12 vs. limit=6.0 2023-12-22 03:07:45,212 INFO [train.py:886] (3/4) Epoch 12, batch 3150, loss[loss=0.01668, audio_tagging_loss=0.01668, over 24750.00 frames. ], tot_loss[loss=0.01477, audio_tagging_loss=0.01477, over 4943777.51 frames. 
], batch size: 99, lr: 9.06e-03, grad_scale: 64.0 2023-12-22 03:07:50,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=370506.6666666667, ans=0.125 2023-12-22 03:07:50,807 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.022e-02 2023-12-22 03:07:59,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=370573.3333333333, ans=0.125 2023-12-22 03:08:37,464 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 2.651e+01 2.845e+01 3.043e+01 3.695e+01, threshold=5.689e+01, percent-clipped=0.0 2023-12-22 03:08:38,465 INFO [train.py:886] (3/4) Epoch 12, batch 3200, loss[loss=0.01598, audio_tagging_loss=0.01598, over 24750.00 frames. ], tot_loss[loss=0.01478, audio_tagging_loss=0.01478, over 4943939.92 frames. ], batch size: 99, lr: 9.06e-03, grad_scale: 64.0 2023-12-22 03:08:41,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=370840.0, ans=0.0 2023-12-22 03:09:01,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=370973.3333333333, ans=0.1 2023-12-22 03:09:11,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=371040.0, ans=0.1 2023-12-22 03:09:29,961 INFO [train.py:886] (3/4) Epoch 12, batch 3250, loss[loss=0.01501, audio_tagging_loss=0.01501, over 25000.00 frames. ], tot_loss[loss=0.01473, audio_tagging_loss=0.01473, over 4947650.05 frames. ], batch size: 100, lr: 9.05e-03, grad_scale: 64.0 2023-12-22 03:09:40,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=371240.0, ans=0.05 2023-12-22 03:10:08,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=371373.3333333333, ans=0.0 2023-12-22 03:10:12,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=371440.0, ans=0.09899494936611666 2023-12-22 03:10:13,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=371440.0, ans=0.125 2023-12-22 03:10:18,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=371440.0, ans=0.125 2023-12-22 03:10:20,354 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.685e+01 2.819e+01 2.963e+01 3.522e+01, threshold=5.637e+01, percent-clipped=0.0 2023-12-22 03:10:21,390 INFO [train.py:886] (3/4) Epoch 12, batch 3300, loss[loss=0.01477, audio_tagging_loss=0.01477, over 25000.00 frames. ], tot_loss[loss=0.01465, audio_tagging_loss=0.01465, over 4943576.88 frames. 
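The scaling.py:1118 WithLoss records report the summed value of an auxiliary loss attached to a tensor, here the attention weights (loss-sum=1.022e-02 above). The standard trick for attaching such a loss without disturbing the forward pass is a custom autograd function that returns its input unchanged but feeds gradient into the auxiliary term during backward; a sketch of that mechanism, with the actual penalty replaced by an illustrative stand-in:

    import torch

    class AttachLoss(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, aux_loss):
            ctx.aux_shape = aux_loss.shape
            return x                         # output is exactly x

        @staticmethod
        def backward(ctx, grad_out):
            # d(total)/d(aux_loss) = 1, so the auxiliary term's graph gets
            # gradient even though it never touched the main loss.
            return grad_out, torch.ones(ctx.aux_shape)

    x = torch.randn(4, 8, requires_grad=True)
    aux = (x ** 2).mean()             # illustrative auxiliary penalty
    y = AttachLoss.apply(x, aux)      # y equals x elementwise
    y.sum().backward()                # x.grad now also carries d(aux)/dx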
], batch size: 100, lr: 9.05e-03, grad_scale: 64.0 2023-12-22 03:10:32,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=371573.3333333333, ans=0.0 2023-12-22 03:10:53,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=371706.6666666667, ans=0.07 2023-12-22 03:11:00,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=371706.6666666667, ans=0.125 2023-12-22 03:11:00,664 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.36 vs. limit=12.0 2023-12-22 03:11:06,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.39 vs. limit=12.0 2023-12-22 03:11:13,993 INFO [train.py:886] (3/4) Epoch 12, batch 3350, loss[loss=0.01357, audio_tagging_loss=0.01357, over 25000.00 frames. ], tot_loss[loss=0.01461, audio_tagging_loss=0.01461, over 4950554.23 frames. ], batch size: 100, lr: 9.05e-03, grad_scale: 64.0 2023-12-22 03:11:23,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=371906.6666666667, ans=0.125 2023-12-22 03:11:26,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=371906.6666666667, ans=0.125 2023-12-22 03:11:32,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=371906.6666666667, ans=0.125 2023-12-22 03:11:33,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=371973.3333333333, ans=0.0 2023-12-22 03:11:53,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=372040.0, ans=0.125 2023-12-22 03:11:56,238 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.99 vs. limit=6.0 2023-12-22 03:12:04,924 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.439e+01 2.659e+01 2.803e+01 3.006e+01 4.806e+01, threshold=5.606e+01, percent-clipped=0.0 2023-12-22 03:12:05,919 INFO [train.py:886] (3/4) Epoch 12, batch 3400, loss[loss=0.01508, audio_tagging_loss=0.01508, over 25000.00 frames. ], tot_loss[loss=0.01467, audio_tagging_loss=0.01467, over 4953342.16 frames. 
], batch size: 100, lr: 9.04e-03, grad_scale: 64.0 2023-12-22 03:12:08,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=372173.3333333333, ans=0.125 2023-12-22 03:12:10,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=372173.3333333333, ans=0.1 2023-12-22 03:12:12,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=372173.3333333333, ans=0.1 2023-12-22 03:12:16,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=372240.0, ans=0.1 2023-12-22 03:12:20,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=372240.0, ans=0.125 2023-12-22 03:12:33,540 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.46 vs. limit=22.5 2023-12-22 03:12:37,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=372373.3333333333, ans=0.0 2023-12-22 03:12:42,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=372373.3333333333, ans=0.125 2023-12-22 03:12:52,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=372440.0, ans=0.1 2023-12-22 03:12:52,639 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.68 vs. limit=15.0 2023-12-22 03:12:58,605 INFO [train.py:886] (3/4) Epoch 12, batch 3450, loss[loss=0.01808, audio_tagging_loss=0.01808, over 24750.00 frames. ], tot_loss[loss=0.01482, audio_tagging_loss=0.01482, over 4943070.68 frames. ], batch size: 99, lr: 9.04e-03, grad_scale: 64.0 2023-12-22 03:13:15,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=372573.3333333333, ans=0.0 2023-12-22 03:13:22,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=372640.0, ans=0.1 2023-12-22 03:13:49,407 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.514e+01 2.768e+01 2.902e+01 3.061e+01 3.695e+01, threshold=5.804e+01, percent-clipped=0.0 2023-12-22 03:13:51,127 INFO [train.py:886] (3/4) Epoch 12, batch 3500, loss[loss=0.01659, audio_tagging_loss=0.01659, over 24750.00 frames. ], tot_loss[loss=0.01488, audio_tagging_loss=0.01488, over 4940721.40 frames. ], batch size: 99, lr: 9.03e-03, grad_scale: 64.0 2023-12-22 03:13:57,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=372840.0, ans=0.125 2023-12-22 03:14:05,479 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=15.0 2023-12-22 03:14:12,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=28.76 vs. 
limit=22.5 2023-12-22 03:14:17,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=372973.3333333333, ans=0.125 2023-12-22 03:14:19,388 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.73 vs. limit=15.0 2023-12-22 03:14:23,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=373040.0, ans=0.125 2023-12-22 03:14:25,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=373040.0, ans=0.125 2023-12-22 03:14:34,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=373106.6666666667, ans=0.125 2023-12-22 03:14:41,943 INFO [train.py:886] (3/4) Epoch 12, batch 3550, loss[loss=0.01391, audio_tagging_loss=0.01391, over 24750.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4936950.31 frames. ], batch size: 99, lr: 9.03e-03, grad_scale: 64.0 2023-12-22 03:14:43,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=373173.3333333333, ans=0.95 2023-12-22 03:15:06,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.84 vs. limit=12.0 2023-12-22 03:15:10,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=373306.6666666667, ans=0.125 2023-12-22 03:15:35,217 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 2.600e+01 2.744e+01 2.864e+01 3.557e+01, threshold=5.489e+01, percent-clipped=0.0 2023-12-22 03:15:36,194 INFO [train.py:886] (3/4) Epoch 12, batch 3600, loss[loss=0.01492, audio_tagging_loss=0.01492, over 25000.00 frames. ], tot_loss[loss=0.01478, audio_tagging_loss=0.01478, over 4943866.83 frames. ], batch size: 100, lr: 9.03e-03, grad_scale: 64.0 2023-12-22 03:15:52,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=373573.3333333333, ans=0.0 2023-12-22 03:16:03,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=373640.0, ans=0.2 2023-12-22 03:16:14,948 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.72 vs. limit=10.0 2023-12-22 03:16:15,836 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.26 vs. limit=22.5 2023-12-22 03:16:27,960 INFO [train.py:886] (3/4) Epoch 12, batch 3650, loss[loss=0.01481, audio_tagging_loss=0.01481, over 25000.00 frames. ], tot_loss[loss=0.01472, audio_tagging_loss=0.01472, over 4953150.63 frames. ], batch size: 100, lr: 9.02e-03, grad_scale: 64.0 2023-12-22 03:16:41,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.98 vs. 
limit=15.0 2023-12-22 03:16:43,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=373906.6666666667, ans=0.125 2023-12-22 03:16:52,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=373973.3333333333, ans=0.125 2023-12-22 03:17:04,256 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=12.0 2023-12-22 03:17:13,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=374106.6666666667, ans=0.1 2023-12-22 03:17:18,606 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.492e+01 2.722e+01 2.831e+01 2.983e+01 3.559e+01, threshold=5.662e+01, percent-clipped=0.0 2023-12-22 03:17:19,594 INFO [train.py:886] (3/4) Epoch 12, batch 3700, loss[loss=0.0143, audio_tagging_loss=0.0143, over 25000.00 frames. ], tot_loss[loss=0.01466, audio_tagging_loss=0.01466, over 4957322.55 frames. ], batch size: 100, lr: 9.02e-03, grad_scale: 64.0 2023-12-22 03:17:22,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=374173.3333333333, ans=0.125 2023-12-22 03:17:47,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.77 vs. limit=15.0 2023-12-22 03:18:06,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=374440.0, ans=0.125 2023-12-22 03:18:12,525 INFO [train.py:886] (3/4) Epoch 12, batch 3750, loss[loss=0.01338, audio_tagging_loss=0.01338, over 24750.00 frames. ], tot_loss[loss=0.01485, audio_tagging_loss=0.01485, over 4949138.57 frames. ], batch size: 99, lr: 9.01e-03, grad_scale: 64.0 2023-12-22 03:18:20,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=374506.6666666667, ans=0.125 2023-12-22 03:18:24,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=374573.3333333333, ans=0.1 2023-12-22 03:18:24,222 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.17 vs. 
limit=6.0 2023-12-22 03:18:39,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=374640.0, ans=0.0 2023-12-22 03:18:47,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=374706.6666666667, ans=0.0 2023-12-22 03:18:53,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=374773.3333333333, ans=0.125 2023-12-22 03:18:59,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=374773.3333333333, ans=0.1 2023-12-22 03:19:01,947 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 03:19:02,674 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 2.712e+01 2.843e+01 3.020e+01 3.497e+01, threshold=5.685e+01, percent-clipped=0.0 2023-12-22 03:19:03,674 INFO [train.py:886] (3/4) Epoch 12, batch 3800, loss[loss=0.01167, audio_tagging_loss=0.01167, over 22373.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4943206.48 frames. ], batch size: 107, lr: 9.01e-03, grad_scale: 64.0 2023-12-22 03:19:04,113 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.04 vs. limit=15.0 2023-12-22 03:19:06,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=374840.0, ans=10.0 2023-12-22 03:19:07,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=374840.0, ans=0.1 2023-12-22 03:19:13,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=374906.6666666667, ans=0.125 2023-12-22 03:19:13,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=374906.6666666667, ans=0.125 2023-12-22 03:19:15,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=374906.6666666667, ans=0.1 2023-12-22 03:19:20,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=374906.6666666667, ans=0.0 2023-12-22 03:19:30,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=374973.3333333333, ans=0.0 2023-12-22 03:19:34,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=375040.0, ans=0.125 2023-12-22 03:19:49,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=375106.6666666667, ans=0.2 2023-12-22 03:19:51,255 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.26 vs. limit=22.5 2023-12-22 03:19:52,211 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.31 vs. limit=6.0 2023-12-22 03:19:55,528 INFO [train.py:886] (3/4) Epoch 12, batch 3850, loss[loss=0.01634, audio_tagging_loss=0.01634, over 25000.00 frames. 
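Several of the schedules above belong to bypass modules (bypass.scale_min and bypass_mid.scale_min sit at 0.2 by this point). The idea they suggest: each layer's output is blended with its input through a learned per-channel scale clamped from below, so a layer can be damped toward an identity map early in training but never switched off entirely. A sketch with illustrative clamp bounds:

    import torch
    import torch.nn as nn

    class BypassSketch(nn.Module):
        def __init__(self, channels: int, scale_min: float = 0.2,
                     scale_max: float = 1.0):
            super().__init__()
            self.scale = nn.Parameter(torch.full((channels,), 0.5))
            self.scale_min, self.scale_max = scale_min, scale_max

        def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
            # x: layer input, y: layer output, both (..., channels)
            s = self.scale.clamp(self.scale_min, self.scale_max)
            return x + s * (y - x)   # s=0 keeps the input, s=1 keeps the output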
], tot_loss[loss=0.01479, audio_tagging_loss=0.01479, over 4940850.88 frames. ], batch size: 100, lr: 9.01e-03, grad_scale: 64.0 2023-12-22 03:20:05,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=375240.0, ans=0.0 2023-12-22 03:20:13,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=375240.0, ans=0.125 2023-12-22 03:20:15,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=375306.6666666667, ans=0.0 2023-12-22 03:20:46,915 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.427e+01 2.664e+01 2.840e+01 2.981e+01 3.799e+01, threshold=5.681e+01, percent-clipped=0.0 2023-12-22 03:20:47,897 INFO [train.py:886] (3/4) Epoch 12, batch 3900, loss[loss=0.01409, audio_tagging_loss=0.01409, over 25000.00 frames. ], tot_loss[loss=0.01471, audio_tagging_loss=0.01471, over 4944663.32 frames. ], batch size: 100, lr: 9.00e-03, grad_scale: 64.0 2023-12-22 03:20:48,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=375506.6666666667, ans=0.1 2023-12-22 03:20:52,890 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.05 vs. limit=15.0 2023-12-22 03:21:02,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=375573.3333333333, ans=0.2 2023-12-22 03:21:03,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=375573.3333333333, ans=0.125 2023-12-22 03:21:05,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=375573.3333333333, ans=0.0 2023-12-22 03:21:32,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=375773.3333333333, ans=0.125 2023-12-22 03:21:34,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=375773.3333333333, ans=0.2 2023-12-22 03:21:39,166 INFO [train.py:886] (3/4) Epoch 12, batch 3950, loss[loss=0.01354, audio_tagging_loss=0.01354, over 25000.00 frames. ], tot_loss[loss=0.01468, audio_tagging_loss=0.01468, over 4947990.00 frames. ], batch size: 100, lr: 9.00e-03, grad_scale: 64.0 2023-12-22 03:21:58,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=375906.6666666667, ans=0.125 2023-12-22 03:22:09,016 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.89 vs. 
limit=15.0 2023-12-22 03:22:11,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=376040.0, ans=0.125 2023-12-22 03:22:14,835 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 03:22:24,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=376106.6666666667, ans=0.0 2023-12-22 03:22:30,718 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.669e+01 2.811e+01 2.989e+01 3.723e+01, threshold=5.621e+01, percent-clipped=0.0 2023-12-22 03:22:31,693 INFO [train.py:886] (3/4) Epoch 12, batch 4000, loss[loss=0.01214, audio_tagging_loss=0.01214, over 24034.00 frames. ], tot_loss[loss=0.01461, audio_tagging_loss=0.01461, over 4951740.98 frames. ], batch size: 100, lr: 8.99e-03, grad_scale: 64.0 2023-12-22 03:22:35,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=376173.3333333333, ans=0.0 2023-12-22 03:22:45,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=376240.0, ans=0.125 2023-12-22 03:22:48,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=376240.0, ans=0.0 2023-12-22 03:22:50,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=376306.6666666667, ans=0.09899494936611666 2023-12-22 03:22:55,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=376306.6666666667, ans=0.0 2023-12-22 03:22:58,038 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=15.0 2023-12-22 03:23:04,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=376373.3333333333, ans=0.0 2023-12-22 03:23:04,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=376373.3333333333, ans=0.0 2023-12-22 03:23:05,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=376373.3333333333, ans=0.09899494936611666 2023-12-22 03:23:22,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=376506.6666666667, ans=0.0 2023-12-22 03:23:22,936 INFO [train.py:886] (3/4) Epoch 12, batch 4050, loss[loss=0.0167, audio_tagging_loss=0.0167, over 24953.00 frames. ], tot_loss[loss=0.01479, audio_tagging_loss=0.01479, over 4952914.08 frames. 
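The attention_skip_rate, conv_skip_rate and ff*_skip_rate schedules all read 0.0 by this stage, which fits a layer-drop style regulariser: early in training whole sub-modules are randomly suppressed at the given rate, and the rate anneals to zero. A sketch of the suppression step under that reading:

    import torch

    def maybe_skip(module_out: torch.Tensor, skip_rate: float,
                   training: bool = True) -> torch.Tensor:
        """Zero a sub-module's contribution for a random subset of sequences.
        module_out: (batch, time, channels)."""
        if not training or skip_rate == 0.0:
            return module_out
        keep = (torch.rand(module_out.shape[0], 1, 1) >= skip_rate)
        return module_out * keep.to(module_out.dtype)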
], batch size: 100, lr: 8.99e-03, grad_scale: 128.0 2023-12-22 03:23:49,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=376640.0, ans=0.125 2023-12-22 03:23:52,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=376706.6666666667, ans=0.125 2023-12-22 03:24:04,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=376773.3333333333, ans=0.125 2023-12-22 03:24:14,134 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+01 2.760e+01 2.894e+01 3.048e+01 3.546e+01, threshold=5.789e+01, percent-clipped=0.0 2023-12-22 03:24:14,158 INFO [train.py:886] (3/4) Epoch 12, batch 4100, loss[loss=0.01574, audio_tagging_loss=0.01574, over 24750.00 frames. ], tot_loss[loss=0.01492, audio_tagging_loss=0.01492, over 4945244.48 frames. ], batch size: 99, lr: 8.99e-03, grad_scale: 64.0 2023-12-22 03:24:21,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=376840.0, ans=0.125 2023-12-22 03:24:22,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0 2023-12-22 03:24:37,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=376973.3333333333, ans=0.0 2023-12-22 03:24:52,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=377040.0, ans=0.0 2023-12-22 03:25:07,062 INFO [train.py:886] (3/4) Epoch 12, batch 4150, loss[loss=0.01725, audio_tagging_loss=0.01725, over 24750.00 frames. ], tot_loss[loss=0.01487, audio_tagging_loss=0.01487, over 4948210.29 frames. ], batch size: 99, lr: 8.98e-03, grad_scale: 64.0 2023-12-22 03:25:15,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=377240.0, ans=0.0 2023-12-22 03:25:16,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=377240.0, ans=0.125 2023-12-22 03:25:22,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.84 vs. limit=15.0 2023-12-22 03:25:26,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.94 vs. limit=22.5 2023-12-22 03:25:31,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=377306.6666666667, ans=0.125 2023-12-22 03:25:58,385 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.381e+01 2.599e+01 2.785e+01 2.978e+01 3.505e+01, threshold=5.570e+01, percent-clipped=0.0 2023-12-22 03:25:58,421 INFO [train.py:886] (3/4) Epoch 12, batch 4200, loss[loss=0.01221, audio_tagging_loss=0.01221, over 25000.00 frames. ], tot_loss[loss=0.01485, audio_tagging_loss=0.01485, over 4944596.30 frames. ], batch size: 100, lr: 8.98e-03, grad_scale: 64.0 2023-12-22 03:26:00,870 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.38 vs. 
limit=22.5 2023-12-22 03:26:13,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=377573.3333333333, ans=0.0 2023-12-22 03:26:33,387 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.02 vs. limit=22.5 2023-12-22 03:26:34,069 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.605e-03 2023-12-22 03:26:34,387 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.85 vs. limit=12.0 2023-12-22 03:26:42,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=377773.3333333333, ans=0.1 2023-12-22 03:26:50,236 INFO [train.py:886] (3/4) Epoch 12, batch 4250, loss[loss=0.01413, audio_tagging_loss=0.01413, over 25000.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4944169.08 frames. ], batch size: 100, lr: 8.97e-03, grad_scale: 64.0 2023-12-22 03:26:56,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.03 vs. limit=22.5 2023-12-22 03:26:57,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=377840.0, ans=0.125 2023-12-22 03:26:57,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=377840.0, ans=0.1 2023-12-22 03:27:22,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=378040.0, ans=0.0 2023-12-22 03:27:41,791 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.354e+01 2.668e+01 2.819e+01 2.959e+01 3.495e+01, threshold=5.638e+01, percent-clipped=0.0 2023-12-22 03:27:41,815 INFO [train.py:886] (3/4) Epoch 12, batch 4300, loss[loss=0.01374, audio_tagging_loss=0.01374, over 24059.00 frames. ], tot_loss[loss=0.01473, audio_tagging_loss=0.01473, over 4948450.14 frames. ], batch size: 100, lr: 8.97e-03, grad_scale: 64.0 2023-12-22 03:27:44,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=378173.3333333333, ans=15.0 2023-12-22 03:27:53,086 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.21 vs. limit=15.0 2023-12-22 03:27:55,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=378240.0, ans=0.125 2023-12-22 03:27:59,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=378240.0, ans=0.0 2023-12-22 03:27:59,547 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0 2023-12-22 03:28:11,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.43 vs. limit=15.0 2023-12-22 03:28:20,725 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.37 vs. 
limit=15.0 2023-12-22 03:28:22,406 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0 2023-12-22 03:28:25,824 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.87 vs. limit=10.0 2023-12-22 03:28:32,931 INFO [train.py:886] (3/4) Epoch 12, batch 4350, loss[loss=0.01691, audio_tagging_loss=0.01691, over 24750.00 frames. ], tot_loss[loss=0.01476, audio_tagging_loss=0.01476, over 4954951.85 frames. ], batch size: 99, lr: 8.97e-03, grad_scale: 64.0 2023-12-22 03:28:34,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=378506.6666666667, ans=0.2 2023-12-22 03:28:46,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=378573.3333333333, ans=0.1 2023-12-22 03:28:47,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=378573.3333333333, ans=15.0 2023-12-22 03:28:49,601 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=12.0 2023-12-22 03:28:50,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=378573.3333333333, ans=0.0 2023-12-22 03:28:53,672 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.82 vs. limit=15.0 2023-12-22 03:29:15,172 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.23 vs. limit=22.5 2023-12-22 03:29:15,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=378773.3333333333, ans=0.09899494936611666 2023-12-22 03:29:25,511 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+01 2.769e+01 2.901e+01 3.056e+01 3.737e+01, threshold=5.802e+01, percent-clipped=0.0 2023-12-22 03:29:25,535 INFO [train.py:886] (3/4) Epoch 12, batch 4400, loss[loss=0.01535, audio_tagging_loss=0.01535, over 24946.00 frames. ], tot_loss[loss=0.01489, audio_tagging_loss=0.01489, over 4954866.86 frames. ], batch size: 100, lr: 8.96e-03, grad_scale: 64.0 2023-12-22 03:29:32,418 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.92 vs. limit=22.5 2023-12-22 03:29:37,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=378906.6666666667, ans=0.0 2023-12-22 03:30:14,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=379106.6666666667, ans=0.2 2023-12-22 03:30:17,495 INFO [train.py:886] (3/4) Epoch 12, batch 4450, loss[loss=0.01272, audio_tagging_loss=0.01272, over 24750.00 frames. ], tot_loss[loss=0.01488, audio_tagging_loss=0.01488, over 4955024.12 frames. ], batch size: 99, lr: 8.96e-03, grad_scale: 64.0 2023-12-22 03:30:35,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. 
limit=15.0 2023-12-22 03:31:00,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=379440.0, ans=0.125 2023-12-22 03:31:02,202 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.68 vs. limit=15.0 2023-12-22 03:31:09,035 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 2.635e+01 2.785e+01 2.931e+01 3.625e+01, threshold=5.571e+01, percent-clipped=0.0 2023-12-22 03:31:09,059 INFO [train.py:886] (3/4) Epoch 12, batch 4500, loss[loss=0.0151, audio_tagging_loss=0.0151, over 25000.00 frames. ], tot_loss[loss=0.0148, audio_tagging_loss=0.0148, over 4955886.56 frames. ], batch size: 100, lr: 8.96e-03, grad_scale: 64.0 2023-12-22 03:31:27,463 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.985e-01 2023-12-22 03:31:39,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=379706.6666666667, ans=0.125 2023-12-22 03:31:43,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=379706.6666666667, ans=0.125 2023-12-22 03:31:59,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=379840.0, ans=0.125 2023-12-22 03:32:00,479 INFO [train.py:886] (3/4) Epoch 12, batch 4550, loss[loss=0.01402, audio_tagging_loss=0.01402, over 21768.00 frames. ], tot_loss[loss=0.01471, audio_tagging_loss=0.01471, over 4958004.55 frames. ], batch size: 107, lr: 8.95e-03, grad_scale: 64.0 2023-12-22 03:32:02,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=379840.0, ans=0.125 2023-12-22 03:32:06,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.92 vs. limit=6.0 2023-12-22 03:32:37,333 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=12.0 2023-12-22 03:32:40,179 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=15.0 2023-12-22 03:32:51,249 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.389e+01 2.640e+01 2.744e+01 2.893e+01 3.340e+01, threshold=5.488e+01, percent-clipped=0.0 2023-12-22 03:32:51,273 INFO [train.py:886] (3/4) Epoch 12, batch 4600, loss[loss=0.01268, audio_tagging_loss=0.01268, over 24750.00 frames. ], tot_loss[loss=0.01466, audio_tagging_loss=0.01466, over 4958342.18 frames. ], batch size: 99, lr: 8.95e-03, grad_scale: 64.0 2023-12-22 03:32:52,746 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.38 vs. 
limit=15.0 2023-12-22 03:32:55,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=380173.3333333333, ans=0.125 2023-12-22 03:33:01,854 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 03:33:08,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=380240.0, ans=0.125 2023-12-22 03:33:40,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=380440.0, ans=0.0 2023-12-22 03:33:42,646 INFO [train.py:886] (3/4) Epoch 12, batch 4650, loss[loss=0.01283, audio_tagging_loss=0.01283, over 25000.00 frames. ], tot_loss[loss=0.01468, audio_tagging_loss=0.01468, over 4959077.35 frames. ], batch size: 100, lr: 8.94e-03, grad_scale: 64.0 2023-12-22 03:34:31,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=380840.0, ans=0.125 2023-12-22 03:34:32,595 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.367e+01 2.686e+01 2.822e+01 3.003e+01 3.524e+01, threshold=5.643e+01, percent-clipped=0.0 2023-12-22 03:34:32,619 INFO [train.py:886] (3/4) Epoch 12, batch 4700, loss[loss=0.01652, audio_tagging_loss=0.01652, over 24750.00 frames. ], tot_loss[loss=0.01486, audio_tagging_loss=0.01486, over 4953676.75 frames. ], batch size: 99, lr: 8.94e-03, grad_scale: 64.0 2023-12-22 03:34:35,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=380840.0, ans=0.0 2023-12-22 03:34:35,695 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.34 vs. limit=15.0 2023-12-22 03:34:43,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=380906.6666666667, ans=0.1 2023-12-22 03:34:44,029 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.30 vs. limit=15.0 2023-12-22 03:34:45,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=380906.6666666667, ans=0.04949747468305833 2023-12-22 03:34:48,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=380906.6666666667, ans=0.125 2023-12-22 03:34:49,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=380906.6666666667, ans=0.1 2023-12-22 03:34:54,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=380973.3333333333, ans=0.2 2023-12-22 03:35:01,253 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.80 vs. limit=15.0 2023-12-22 03:35:02,253 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.37 vs. limit=15.0 2023-12-22 03:35:20,076 INFO [train.py:886] (3/4) Epoch 12, batch 4750, loss[loss=0.0172, audio_tagging_loss=0.0172, over 24750.00 frames. 
], tot_loss[loss=0.01492, audio_tagging_loss=0.01492, over 4949501.36 frames. ], batch size: 99, lr: 8.94e-03, grad_scale: 64.0 2023-12-22 03:35:57,810 INFO [train.py:886] (3/4) Epoch 13, batch 0, loss[loss=0.02889, audio_tagging_loss=0.02889, over 25000.00 frames. ], tot_loss[loss=0.02889, audio_tagging_loss=0.02889, over 25000.00 frames. ], batch size: 100, lr: 8.59e-03, grad_scale: 32.0 2023-12-22 03:35:57,810 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 03:36:18,410 INFO [train.py:917] (3/4) Epoch 13, validation: loss=0.03383, audio_tagging_loss=0.03383, over 3737520.00 frames. 2023-12-22 03:36:18,411 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 03:36:18,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=381280.0, ans=0.05 2023-12-22 03:36:21,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=381280.0, ans=0.125 2023-12-22 03:36:24,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.72 vs. limit=22.5 2023-12-22 03:36:54,912 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.369e+01 2.829e+01 3.058e+01 3.866e+01 8.495e+01, threshold=6.115e+01, percent-clipped=6.0 2023-12-22 03:37:10,115 INFO [train.py:886] (3/4) Epoch 13, batch 50, loss[loss=0.01738, audio_tagging_loss=0.01738, over 23937.00 frames. ], tot_loss[loss=0.02314, audio_tagging_loss=0.02314, over 1122031.11 frames. ], batch size: 100, lr: 8.58e-03, grad_scale: 32.0 2023-12-22 03:37:11,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=381613.3333333333, ans=0.0 2023-12-22 03:37:27,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=381680.0, ans=0.125 2023-12-22 03:37:29,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=381680.0, ans=0.125 2023-12-22 03:37:58,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=381880.0, ans=0.1 2023-12-22 03:38:02,621 INFO [train.py:886] (3/4) Epoch 13, batch 100, loss[loss=0.01422, audio_tagging_loss=0.01422, over 25000.00 frames. ], tot_loss[loss=0.01983, audio_tagging_loss=0.01983, over 1976025.57 frames. 
], batch size: 100, lr: 8.58e-03, grad_scale: 32.0 2023-12-22 03:38:21,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=382013.3333333333, ans=0.0 2023-12-22 03:38:24,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=382080.0, ans=0.125 2023-12-22 03:38:36,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=382146.6666666667, ans=0.0 2023-12-22 03:38:39,108 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.608e+01 2.907e+01 3.118e+01 3.285e+01 3.851e+01, threshold=6.236e+01, percent-clipped=0.0 2023-12-22 03:38:42,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=382146.6666666667, ans=0.1 2023-12-22 03:38:43,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=382213.3333333333, ans=0.09899494936611666 2023-12-22 03:38:49,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=382213.3333333333, ans=0.125 2023-12-22 03:38:51,374 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 03:38:54,830 INFO [train.py:886] (3/4) Epoch 13, batch 150, loss[loss=0.01328, audio_tagging_loss=0.01328, over 24004.00 frames. ], tot_loss[loss=0.01813, audio_tagging_loss=0.01813, over 2632131.08 frames. ], batch size: 100, lr: 8.58e-03, grad_scale: 32.0 2023-12-22 03:38:58,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=382280.0, ans=0.0 2023-12-22 03:39:04,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=382346.6666666667, ans=0.125 2023-12-22 03:39:26,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=382480.0, ans=0.125 2023-12-22 03:39:28,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=382480.0, ans=0.125 2023-12-22 03:39:31,270 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.30 vs. limit=10.0 2023-12-22 03:39:33,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=382480.0, ans=0.125 2023-12-22 03:39:46,373 INFO [train.py:886] (3/4) Epoch 13, batch 200, loss[loss=0.01295, audio_tagging_loss=0.01295, over 25000.00 frames. ], tot_loss[loss=0.01712, audio_tagging_loss=0.01712, over 3146964.73 frames. 
], batch size: 100, lr: 8.57e-03, grad_scale: 32.0 2023-12-22 03:40:02,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=382680.0, ans=0.1 2023-12-22 03:40:06,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=382746.6666666667, ans=0.025 2023-12-22 03:40:13,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=382746.6666666667, ans=0.125 2023-12-22 03:40:14,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=382746.6666666667, ans=0.5 2023-12-22 03:40:22,761 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 2.728e+01 2.862e+01 2.980e+01 3.546e+01, threshold=5.723e+01, percent-clipped=0.0 2023-12-22 03:40:38,316 INFO [train.py:886] (3/4) Epoch 13, batch 250, loss[loss=0.01184, audio_tagging_loss=0.01184, over 25000.00 frames. ], tot_loss[loss=0.01642, audio_tagging_loss=0.01642, over 3549158.15 frames. ], batch size: 100, lr: 8.57e-03, grad_scale: 32.0 2023-12-22 03:40:55,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=383013.3333333333, ans=0.125 2023-12-22 03:40:56,033 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.77 vs. limit=15.0 2023-12-22 03:41:03,271 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 03:41:03,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=383080.0, ans=0.1 2023-12-22 03:41:16,322 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=7.023e-01 2023-12-22 03:41:20,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=383213.3333333333, ans=0.1 2023-12-22 03:41:21,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=383213.3333333333, ans=0.125 2023-12-22 03:41:23,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=383213.3333333333, ans=0.125 2023-12-22 03:41:29,515 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.69 vs. limit=15.0 2023-12-22 03:41:29,764 INFO [train.py:886] (3/4) Epoch 13, batch 300, loss[loss=0.01756, audio_tagging_loss=0.01756, over 24750.00 frames. ], tot_loss[loss=0.01609, audio_tagging_loss=0.01609, over 3857142.66 frames. ], batch size: 99, lr: 8.56e-03, grad_scale: 32.0 2023-12-22 03:41:44,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=383346.6666666667, ans=0.2 2023-12-22 03:42:06,351 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.359e+01 2.665e+01 2.854e+01 3.043e+01 3.614e+01, threshold=5.708e+01, percent-clipped=0.0 2023-12-22 03:42:22,003 INFO [train.py:886] (3/4) Epoch 13, batch 350, loss[loss=0.01438, audio_tagging_loss=0.01438, over 24750.00 frames. ], tot_loss[loss=0.01573, audio_tagging_loss=0.01573, over 4098646.56 frames. 
], batch size: 99, lr: 8.56e-03, grad_scale: 32.0 2023-12-22 03:42:30,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=383613.3333333333, ans=0.125 2023-12-22 03:43:07,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=383880.0, ans=0.1 2023-12-22 03:43:14,605 INFO [train.py:886] (3/4) Epoch 13, batch 400, loss[loss=0.01514, audio_tagging_loss=0.01514, over 24750.00 frames. ], tot_loss[loss=0.01544, audio_tagging_loss=0.01544, over 4287471.38 frames. ], batch size: 99, lr: 8.56e-03, grad_scale: 32.0 2023-12-22 03:43:31,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=384013.3333333333, ans=0.0 2023-12-22 03:43:36,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=384080.0, ans=0.125 2023-12-22 03:43:51,102 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.379e+01 2.676e+01 2.770e+01 2.913e+01 3.430e+01, threshold=5.541e+01, percent-clipped=0.0 2023-12-22 03:44:01,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=384213.3333333333, ans=10.0 2023-12-22 03:44:02,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=384213.3333333333, ans=0.125 2023-12-22 03:44:02,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=384213.3333333333, ans=0.125 2023-12-22 03:44:05,951 INFO [train.py:886] (3/4) Epoch 13, batch 450, loss[loss=0.01188, audio_tagging_loss=0.01188, over 25000.00 frames. ], tot_loss[loss=0.01517, audio_tagging_loss=0.01517, over 4437168.76 frames. ], batch size: 100, lr: 8.55e-03, grad_scale: 32.0 2023-12-22 03:44:06,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=384280.0, ans=0.125 2023-12-22 03:44:17,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=384346.6666666667, ans=0.0 2023-12-22 03:44:36,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=384480.0, ans=0.125 2023-12-22 03:44:48,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=384546.6666666667, ans=0.1 2023-12-22 03:44:52,764 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.119e-02 2023-12-22 03:44:58,220 INFO [train.py:886] (3/4) Epoch 13, batch 500, loss[loss=0.01425, audio_tagging_loss=0.01425, over 25000.00 frames. ], tot_loss[loss=0.01499, audio_tagging_loss=0.01499, over 4555325.26 frames. 
], batch size: 100, lr: 8.55e-03, grad_scale: 32.0 2023-12-22 03:45:22,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=384746.6666666667, ans=0.125 2023-12-22 03:45:23,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=384746.6666666667, ans=0.0 2023-12-22 03:45:31,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=384813.3333333333, ans=0.125 2023-12-22 03:45:34,429 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 2.645e+01 2.803e+01 2.992e+01 3.772e+01, threshold=5.607e+01, percent-clipped=0.0 2023-12-22 03:45:40,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=384880.0, ans=0.0 2023-12-22 03:45:40,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=384880.0, ans=0.2 2023-12-22 03:45:42,915 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.553e-02 2023-12-22 03:45:45,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=384880.0, ans=0.0 2023-12-22 03:45:50,086 INFO [train.py:886] (3/4) Epoch 13, batch 550, loss[loss=0.01366, audio_tagging_loss=0.01366, over 23907.00 frames. ], tot_loss[loss=0.01493, audio_tagging_loss=0.01493, over 4641862.09 frames. ], batch size: 100, lr: 8.55e-03, grad_scale: 32.0 2023-12-22 03:45:55,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=384946.6666666667, ans=10.0 2023-12-22 03:45:55,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=384946.6666666667, ans=0.125 2023-12-22 03:46:00,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=385013.3333333333, ans=0.0 2023-12-22 03:46:11,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=385080.0, ans=0.0 2023-12-22 03:46:41,738 INFO [train.py:886] (3/4) Epoch 13, batch 600, loss[loss=0.01675, audio_tagging_loss=0.01675, over 25000.00 frames. ], tot_loss[loss=0.01485, audio_tagging_loss=0.01485, over 4714810.73 frames. ], batch size: 100, lr: 8.54e-03, grad_scale: 32.0 2023-12-22 03:46:44,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=385280.0, ans=0.0 2023-12-22 03:46:44,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=385280.0, ans=0.2 2023-12-22 03:47:03,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=385413.3333333333, ans=0.035 2023-12-22 03:47:12,670 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.63 vs. 
limit=15.0 2023-12-22 03:47:14,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=385480.0, ans=0.125 2023-12-22 03:47:17,928 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.342e+01 2.658e+01 2.809e+01 2.976e+01 3.464e+01, threshold=5.617e+01, percent-clipped=0.0 2023-12-22 03:47:21,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=385546.6666666667, ans=0.5 2023-12-22 03:47:30,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=385546.6666666667, ans=0.125 2023-12-22 03:47:33,623 INFO [train.py:886] (3/4) Epoch 13, batch 650, loss[loss=0.0136, audio_tagging_loss=0.0136, over 24750.00 frames. ], tot_loss[loss=0.01485, audio_tagging_loss=0.01485, over 4756855.39 frames. ], batch size: 99, lr: 8.54e-03, grad_scale: 32.0 2023-12-22 03:47:34,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=385613.3333333333, ans=0.125 2023-12-22 03:47:40,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=385613.3333333333, ans=0.0 2023-12-22 03:48:05,041 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.12 vs. limit=10.0 2023-12-22 03:48:09,454 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.64 vs. limit=12.0 2023-12-22 03:48:15,623 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 03:48:20,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=385880.0, ans=0.0 2023-12-22 03:48:24,009 INFO [train.py:886] (3/4) Epoch 13, batch 700, loss[loss=0.01544, audio_tagging_loss=0.01544, over 24750.00 frames. ], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4801540.59 frames. 
], batch size: 99, lr: 8.53e-03, grad_scale: 32.0 2023-12-22 03:48:25,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=385946.6666666667, ans=0.125 2023-12-22 03:48:40,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=386013.3333333333, ans=0.125 2023-12-22 03:48:43,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=386013.3333333333, ans=0.04949747468305833 2023-12-22 03:48:48,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=386080.0, ans=0.1 2023-12-22 03:49:01,130 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.365e+01 2.667e+01 2.765e+01 2.953e+01 3.565e+01, threshold=5.530e+01, percent-clipped=0.0 2023-12-22 03:49:01,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=386146.6666666667, ans=0.125 2023-12-22 03:49:12,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=386213.3333333333, ans=0.95 2023-12-22 03:49:13,612 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.71 vs. limit=12.0 2023-12-22 03:49:15,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=386213.3333333333, ans=0.0 2023-12-22 03:49:17,655 INFO [train.py:886] (3/4) Epoch 13, batch 750, loss[loss=0.01439, audio_tagging_loss=0.01439, over 25000.00 frames. ], tot_loss[loss=0.01477, audio_tagging_loss=0.01477, over 4826993.98 frames. ], batch size: 100, lr: 8.53e-03, grad_scale: 32.0 2023-12-22 03:50:06,015 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.39 vs. limit=22.5 2023-12-22 03:50:08,444 INFO [train.py:886] (3/4) Epoch 13, batch 800, loss[loss=0.01553, audio_tagging_loss=0.01553, over 24910.00 frames. ], tot_loss[loss=0.01479, audio_tagging_loss=0.01479, over 4858924.48 frames. ], batch size: 100, lr: 8.53e-03, grad_scale: 32.0 2023-12-22 03:50:11,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=386613.3333333333, ans=0.0 2023-12-22 03:50:19,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=386680.0, ans=0.04949747468305833 2023-12-22 03:50:20,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=386680.0, ans=0.2 2023-12-22 03:50:23,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=386680.0, ans=0.1 2023-12-22 03:50:25,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=386680.0, ans=15.0 2023-12-22 03:50:30,177 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.53 vs. 
limit=15.0 2023-12-22 03:50:37,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=386746.6666666667, ans=0.0 2023-12-22 03:50:39,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=386813.3333333333, ans=0.125 2023-12-22 03:50:42,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=386813.3333333333, ans=0.0 2023-12-22 03:50:45,494 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.686e+01 2.811e+01 2.962e+01 3.601e+01, threshold=5.623e+01, percent-clipped=0.0 2023-12-22 03:51:01,118 INFO [train.py:886] (3/4) Epoch 13, batch 850, loss[loss=0.01576, audio_tagging_loss=0.01576, over 25000.00 frames. ], tot_loss[loss=0.0148, audio_tagging_loss=0.0148, over 4884604.72 frames. ], batch size: 100, lr: 8.52e-03, grad_scale: 32.0 2023-12-22 03:51:01,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=386946.6666666667, ans=0.035 2023-12-22 03:51:05,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=386946.6666666667, ans=0.125 2023-12-22 03:51:14,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=387013.3333333333, ans=0.0 2023-12-22 03:51:25,295 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.17 vs. limit=12.0 2023-12-22 03:51:35,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=387146.6666666667, ans=0.125 2023-12-22 03:51:40,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=387146.6666666667, ans=0.0 2023-12-22 03:51:45,629 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0 2023-12-22 03:51:52,652 INFO [train.py:886] (3/4) Epoch 13, batch 900, loss[loss=0.01519, audio_tagging_loss=0.01519, over 24750.00 frames. ], tot_loss[loss=0.01482, audio_tagging_loss=0.01482, over 4903874.77 frames. ], batch size: 99, lr: 8.52e-03, grad_scale: 32.0 2023-12-22 03:52:02,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=387346.6666666667, ans=0.125 2023-12-22 03:52:04,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=387346.6666666667, ans=0.09899494936611666 2023-12-22 03:52:17,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=387413.3333333333, ans=0.0 2023-12-22 03:52:19,678 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0 2023-12-22 03:52:29,459 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 2.739e+01 2.831e+01 3.018e+01 3.521e+01, threshold=5.662e+01, percent-clipped=0.0 2023-12-22 03:52:44,290 INFO [train.py:886] (3/4) Epoch 13, batch 950, loss[loss=0.01744, audio_tagging_loss=0.01744, over 24750.00 frames. 
], tot_loss[loss=0.01491, audio_tagging_loss=0.01491, over 4907406.81 frames. ], batch size: 99, lr: 8.52e-03, grad_scale: 32.0 2023-12-22 03:52:44,844 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.17 vs. limit=22.5 2023-12-22 03:52:52,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=387613.3333333333, ans=0.0 2023-12-22 03:53:01,826 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.49 vs. limit=15.0 2023-12-22 03:53:07,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=387746.6666666667, ans=0.125 2023-12-22 03:53:22,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=387813.3333333333, ans=0.125 2023-12-22 03:53:25,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=387880.0, ans=0.0 2023-12-22 03:53:36,360 INFO [train.py:886] (3/4) Epoch 13, batch 1000, loss[loss=0.01263, audio_tagging_loss=0.01263, over 24750.00 frames. ], tot_loss[loss=0.01486, audio_tagging_loss=0.01486, over 4910922.58 frames. ], batch size: 99, lr: 8.51e-03, grad_scale: 32.0 2023-12-22 03:53:36,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=387946.6666666667, ans=0.0 2023-12-22 03:53:38,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=387946.6666666667, ans=0.125 2023-12-22 03:53:42,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=387946.6666666667, ans=0.2 2023-12-22 03:54:01,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=388080.0, ans=0.2 2023-12-22 03:54:04,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=388080.0, ans=0.0 2023-12-22 03:54:12,661 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.361e+01 2.670e+01 2.865e+01 3.020e+01 3.813e+01, threshold=5.731e+01, percent-clipped=0.0 2023-12-22 03:54:19,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=388213.3333333333, ans=0.125 2023-12-22 03:54:28,325 INFO [train.py:886] (3/4) Epoch 13, batch 1050, loss[loss=0.01862, audio_tagging_loss=0.01862, over 25000.00 frames. ], tot_loss[loss=0.01482, audio_tagging_loss=0.01482, over 4921478.21 frames. ], batch size: 100, lr: 8.51e-03, grad_scale: 32.0 2023-12-22 03:54:36,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=388280.0, ans=0.09899494936611666 2023-12-22 03:54:46,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.99 vs. limit=15.0 2023-12-22 03:55:20,170 INFO [train.py:886] (3/4) Epoch 13, batch 1100, loss[loss=0.01526, audio_tagging_loss=0.01526, over 24750.00 frames. ], tot_loss[loss=0.01464, audio_tagging_loss=0.01464, over 4927232.29 frames. 
], batch size: 99, lr: 8.51e-03, grad_scale: 32.0 2023-12-22 03:55:20,573 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0 2023-12-22 03:55:51,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=388813.3333333333, ans=0.0 2023-12-22 03:55:56,136 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.370e+01 2.628e+01 2.785e+01 2.893e+01 3.439e+01, threshold=5.569e+01, percent-clipped=0.0 2023-12-22 03:55:57,622 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0 2023-12-22 03:56:11,832 INFO [train.py:886] (3/4) Epoch 13, batch 1150, loss[loss=0.01434, audio_tagging_loss=0.01434, over 25000.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4935190.38 frames. ], batch size: 100, lr: 8.50e-03, grad_scale: 32.0 2023-12-22 03:56:15,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=388946.6666666667, ans=0.07 2023-12-22 03:56:37,008 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.16 vs. limit=15.0 2023-12-22 03:56:38,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=389080.0, ans=10.0 2023-12-22 03:56:44,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=389146.6666666667, ans=0.2 2023-12-22 03:56:47,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=389146.6666666667, ans=0.125 2023-12-22 03:57:02,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=389280.0, ans=0.04949747468305833 2023-12-22 03:57:04,152 INFO [train.py:886] (3/4) Epoch 13, batch 1200, loss[loss=0.01708, audio_tagging_loss=0.01708, over 25000.00 frames. ], tot_loss[loss=0.01468, audio_tagging_loss=0.01468, over 4943363.29 frames. ], batch size: 100, lr: 8.50e-03, grad_scale: 32.0 2023-12-22 03:57:10,179 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.82 vs. limit=10.0 2023-12-22 03:57:12,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.57 vs. 
limit=12.0 2023-12-22 03:57:18,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=389346.6666666667, ans=0.07 2023-12-22 03:57:20,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=389346.6666666667, ans=0.125 2023-12-22 03:57:29,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=389413.3333333333, ans=0.125 2023-12-22 03:57:31,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=389413.3333333333, ans=0.2 2023-12-22 03:57:41,004 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.433e+01 2.655e+01 2.812e+01 2.956e+01 4.274e+01, threshold=5.623e+01, percent-clipped=0.0 2023-12-22 03:57:42,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=389480.0, ans=0.04949747468305833 2023-12-22 03:57:44,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=389480.0, ans=0.05 2023-12-22 03:57:55,943 INFO [train.py:886] (3/4) Epoch 13, batch 1250, loss[loss=0.01316, audio_tagging_loss=0.01316, over 24750.00 frames. ], tot_loss[loss=0.01469, audio_tagging_loss=0.01469, over 4937411.01 frames. ], batch size: 99, lr: 8.50e-03, grad_scale: 32.0 2023-12-22 03:58:01,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.08 vs. limit=15.0 2023-12-22 03:58:19,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=389746.6666666667, ans=0.0 2023-12-22 03:58:28,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=389813.3333333333, ans=0.1 2023-12-22 03:58:33,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=389813.3333333333, ans=0.2 2023-12-22 03:58:34,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=389813.3333333333, ans=0.0 2023-12-22 03:58:48,351 INFO [train.py:886] (3/4) Epoch 13, batch 1300, loss[loss=0.01443, audio_tagging_loss=0.01443, over 24750.00 frames. ], tot_loss[loss=0.01476, audio_tagging_loss=0.01476, over 4937165.53 frames. ], batch size: 99, lr: 8.49e-03, grad_scale: 32.0 2023-12-22 03:59:04,481 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.11 vs. limit=6.0 2023-12-22 03:59:14,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=390080.0, ans=0.0 2023-12-22 03:59:18,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=390146.6666666667, ans=0.2 2023-12-22 03:59:22,885 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.12 vs. 
limit=10.0 2023-12-22 03:59:23,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=390146.6666666667, ans=0.0 2023-12-22 03:59:24,296 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.505e+01 2.761e+01 2.854e+01 2.985e+01 3.406e+01, threshold=5.709e+01, percent-clipped=0.0 2023-12-22 03:59:39,918 INFO [train.py:886] (3/4) Epoch 13, batch 1350, loss[loss=0.01496, audio_tagging_loss=0.01496, over 24750.00 frames. ], tot_loss[loss=0.01465, audio_tagging_loss=0.01465, over 4940099.49 frames. ], batch size: 99, lr: 8.49e-03, grad_scale: 32.0 2023-12-22 03:59:43,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=390280.0, ans=0.125 2023-12-22 03:59:43,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=390280.0, ans=0.125 2023-12-22 03:59:47,193 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 03:59:57,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=390346.6666666667, ans=0.0 2023-12-22 04:00:04,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=390413.3333333333, ans=0.1 2023-12-22 04:00:11,923 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.68 vs. limit=22.5 2023-12-22 04:00:26,659 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.74 vs. limit=15.0 2023-12-22 04:00:27,686 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.84 vs. limit=15.0 2023-12-22 04:00:29,662 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2023-12-22 04:00:32,542 INFO [train.py:886] (3/4) Epoch 13, batch 1400, loss[loss=0.01453, audio_tagging_loss=0.01453, over 25000.00 frames. ], tot_loss[loss=0.01462, audio_tagging_loss=0.01462, over 4945345.32 frames. ], batch size: 100, lr: 8.48e-03, grad_scale: 32.0 2023-12-22 04:00:38,773 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.22 vs. limit=12.0 2023-12-22 04:00:41,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=390680.0, ans=0.0 2023-12-22 04:00:44,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=390680.0, ans=0.125 2023-12-22 04:00:44,742 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.08 vs. 
limit=15.0 2023-12-22 04:01:06,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=390813.3333333333, ans=0.0 2023-12-22 04:01:08,893 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.401e+01 2.605e+01 2.750e+01 2.989e+01 3.413e+01, threshold=5.499e+01, percent-clipped=0.0 2023-12-22 04:01:16,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=390880.0, ans=10.0 2023-12-22 04:01:24,374 INFO [train.py:886] (3/4) Epoch 13, batch 1450, loss[loss=0.0139, audio_tagging_loss=0.0139, over 24896.00 frames. ], tot_loss[loss=0.01459, audio_tagging_loss=0.01459, over 4952900.76 frames. ], batch size: 100, lr: 8.48e-03, grad_scale: 32.0 2023-12-22 04:01:32,164 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.69 vs. limit=10.0 2023-12-22 04:01:34,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=391013.3333333333, ans=0.125 2023-12-22 04:01:39,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=391013.3333333333, ans=0.125 2023-12-22 04:01:39,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0 2023-12-22 04:01:44,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=391080.0, ans=0.1 2023-12-22 04:01:53,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=391080.0, ans=0.125 2023-12-22 04:02:05,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=391213.3333333333, ans=0.125 2023-12-22 04:02:06,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=391213.3333333333, ans=0.1 2023-12-22 04:02:12,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=391213.3333333333, ans=0.125 2023-12-22 04:02:12,523 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.51 vs. limit=15.0 2023-12-22 04:02:13,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=391213.3333333333, ans=0.125 2023-12-22 04:02:15,786 INFO [train.py:886] (3/4) Epoch 13, batch 1500, loss[loss=0.01419, audio_tagging_loss=0.01419, over 25000.00 frames. ], tot_loss[loss=0.01459, audio_tagging_loss=0.01459, over 4955475.57 frames. ], batch size: 100, lr: 8.48e-03, grad_scale: 32.0 2023-12-22 04:02:19,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=391280.0, ans=0.125 2023-12-22 04:02:26,546 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.11 vs. 
limit=15.0 2023-12-22 04:02:32,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=15.0 2023-12-22 04:02:37,646 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.20 vs. limit=15.0 2023-12-22 04:02:44,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=391413.3333333333, ans=0.0 2023-12-22 04:02:51,066 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.423e+01 2.726e+01 2.853e+01 3.010e+01 3.854e+01, threshold=5.706e+01, percent-clipped=0.0 2023-12-22 04:03:06,646 INFO [train.py:886] (3/4) Epoch 13, batch 1550, loss[loss=0.01576, audio_tagging_loss=0.01576, over 24750.00 frames. ], tot_loss[loss=0.01469, audio_tagging_loss=0.01469, over 4950673.38 frames. ], batch size: 99, lr: 8.47e-03, grad_scale: 32.0 2023-12-22 04:03:08,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=391613.3333333333, ans=0.0 2023-12-22 04:03:35,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=391746.6666666667, ans=0.125 2023-12-22 04:03:42,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=391813.3333333333, ans=0.125 2023-12-22 04:03:46,731 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.23 vs. limit=22.5 2023-12-22 04:03:57,485 INFO [train.py:886] (3/4) Epoch 13, batch 1600, loss[loss=0.01843, audio_tagging_loss=0.01843, over 25000.00 frames. ], tot_loss[loss=0.01483, audio_tagging_loss=0.01483, over 4945596.33 frames. ], batch size: 100, lr: 8.47e-03, grad_scale: 32.0 2023-12-22 04:04:00,534 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0 2023-12-22 04:04:24,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=392080.0, ans=0.1 2023-12-22 04:04:34,943 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.459e+01 2.766e+01 2.914e+01 3.092e+01 3.457e+01, threshold=5.827e+01, percent-clipped=0.0 2023-12-22 04:04:36,344 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0 2023-12-22 04:04:50,742 INFO [train.py:886] (3/4) Epoch 13, batch 1650, loss[loss=0.0133, audio_tagging_loss=0.0133, over 25000.00 frames. ], tot_loss[loss=0.01479, audio_tagging_loss=0.01479, over 4940658.27 frames. 
], batch size: 100, lr: 8.47e-03, grad_scale: 32.0 2023-12-22 04:05:04,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=392346.6666666667, ans=0.0 2023-12-22 04:05:16,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=392413.3333333333, ans=0.125 2023-12-22 04:05:32,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=392546.6666666667, ans=0.125 2023-12-22 04:05:41,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=392613.3333333333, ans=0.125 2023-12-22 04:05:42,349 INFO [train.py:886] (3/4) Epoch 13, batch 1700, loss[loss=0.0128, audio_tagging_loss=0.0128, over 24750.00 frames. ], tot_loss[loss=0.01473, audio_tagging_loss=0.01473, over 4943483.51 frames. ], batch size: 99, lr: 8.46e-03, grad_scale: 32.0 2023-12-22 04:05:53,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=392680.0, ans=0.2 2023-12-22 04:06:07,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=392746.6666666667, ans=0.2 2023-12-22 04:06:10,842 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.43 vs. limit=22.5 2023-12-22 04:06:18,232 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.384e+01 2.671e+01 2.847e+01 3.058e+01 3.677e+01, threshold=5.694e+01, percent-clipped=0.0 2023-12-22 04:06:25,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=392880.0, ans=0.125 2023-12-22 04:06:33,339 INFO [train.py:886] (3/4) Epoch 13, batch 1750, loss[loss=0.01341, audio_tagging_loss=0.01341, over 25000.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4941227.54 frames. ], batch size: 100, lr: 8.46e-03, grad_scale: 32.0 2023-12-22 04:06:49,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=393013.3333333333, ans=0.0 2023-12-22 04:07:09,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=393146.6666666667, ans=0.0 2023-12-22 04:07:25,436 INFO [train.py:886] (3/4) Epoch 13, batch 1800, loss[loss=0.01731, audio_tagging_loss=0.01731, over 25000.00 frames. ], tot_loss[loss=0.01464, audio_tagging_loss=0.01464, over 4950079.12 frames. ], batch size: 100, lr: 8.46e-03, grad_scale: 32.0 2023-12-22 04:07:39,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=393346.6666666667, ans=0.125 2023-12-22 04:07:41,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.57 vs. 
limit=15.0 2023-12-22 04:08:02,359 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.405e+01 2.697e+01 2.832e+01 3.014e+01 3.728e+01, threshold=5.665e+01, percent-clipped=0.0 2023-12-22 04:08:03,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=393480.0, ans=0.0 2023-12-22 04:08:03,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=393480.0, ans=0.125 2023-12-22 04:08:17,471 INFO [train.py:886] (3/4) Epoch 13, batch 1850, loss[loss=0.01577, audio_tagging_loss=0.01577, over 25000.00 frames. ], tot_loss[loss=0.01471, audio_tagging_loss=0.01471, over 4953270.84 frames. ], batch size: 100, lr: 8.45e-03, grad_scale: 32.0 2023-12-22 04:08:17,676 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.364e-02 2023-12-22 04:08:18,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=393613.3333333333, ans=0.2 2023-12-22 04:08:40,721 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.62 vs. limit=15.0 2023-12-22 04:08:57,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=393880.0, ans=0.0 2023-12-22 04:08:59,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=393880.0, ans=0.125 2023-12-22 04:08:59,810 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.72 vs. limit=15.0 2023-12-22 04:09:02,566 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.18 vs. limit=15.0 2023-12-22 04:09:09,392 INFO [train.py:886] (3/4) Epoch 13, batch 1900, loss[loss=0.01499, audio_tagging_loss=0.01499, over 25000.00 frames. ], tot_loss[loss=0.01477, audio_tagging_loss=0.01477, over 4945241.93 frames. ], batch size: 100, lr: 8.45e-03, grad_scale: 32.0 2023-12-22 04:09:21,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=394013.3333333333, ans=0.125 2023-12-22 04:09:22,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=394013.3333333333, ans=0.125 2023-12-22 04:09:26,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=394013.3333333333, ans=0.125 2023-12-22 04:09:45,142 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.423e+01 2.721e+01 2.913e+01 3.026e+01 3.443e+01, threshold=5.825e+01, percent-clipped=0.0 2023-12-22 04:09:47,401 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.84 vs. limit=15.0 2023-12-22 04:09:48,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=394146.6666666667, ans=0.0 2023-12-22 04:10:00,972 INFO [train.py:886] (3/4) Epoch 13, batch 1950, loss[loss=0.01649, audio_tagging_loss=0.01649, over 24750.00 frames. ], tot_loss[loss=0.0147, audio_tagging_loss=0.0147, over 4939377.40 frames. 
], batch size: 99, lr: 8.45e-03, grad_scale: 32.0 2023-12-22 04:10:10,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=394280.0, ans=0.125 2023-12-22 04:10:21,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.34 vs. limit=15.0 2023-12-22 04:10:37,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.07 vs. limit=10.0 2023-12-22 04:10:46,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=394546.6666666667, ans=0.2 2023-12-22 04:10:51,751 INFO [train.py:886] (3/4) Epoch 13, batch 2000, loss[loss=0.01566, audio_tagging_loss=0.01566, over 21868.00 frames. ], tot_loss[loss=0.01464, audio_tagging_loss=0.01464, over 4940041.26 frames. ], batch size: 107, lr: 8.44e-03, grad_scale: 64.0 2023-12-22 04:11:16,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=394746.6666666667, ans=0.0 2023-12-22 04:11:22,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=394813.3333333333, ans=0.125 2023-12-22 04:11:28,347 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.365e+01 2.715e+01 2.846e+01 3.050e+01 3.536e+01, threshold=5.692e+01, percent-clipped=0.0 2023-12-22 04:11:38,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=394880.0, ans=0.09899494936611666 2023-12-22 04:11:41,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=394880.0, ans=0.0 2023-12-22 04:11:44,778 INFO [train.py:886] (3/4) Epoch 13, batch 2050, loss[loss=0.01258, audio_tagging_loss=0.01258, over 25000.00 frames. ], tot_loss[loss=0.01449, audio_tagging_loss=0.01449, over 4941424.17 frames. ], batch size: 100, lr: 8.44e-03, grad_scale: 64.0 2023-12-22 04:11:49,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=394946.6666666667, ans=0.0 2023-12-22 04:11:54,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=395013.3333333333, ans=0.125 2023-12-22 04:11:57,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=395013.3333333333, ans=0.125 2023-12-22 04:11:59,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=395013.3333333333, ans=0.125 2023-12-22 04:12:18,078 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 04:12:20,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=395146.6666666667, ans=15.0 2023-12-22 04:12:35,670 INFO [train.py:886] (3/4) Epoch 13, batch 2100, loss[loss=0.01422, audio_tagging_loss=0.01422, over 25000.00 frames. ], tot_loss[loss=0.01453, audio_tagging_loss=0.01453, over 4942590.33 frames. 
], batch size: 100, lr: 8.44e-03, grad_scale: 64.0 2023-12-22 04:12:42,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=395280.0, ans=0.0 2023-12-22 04:12:46,262 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.49 vs. limit=15.0 2023-12-22 04:12:55,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=395346.6666666667, ans=0.0 2023-12-22 04:13:00,254 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.34 vs. limit=15.0 2023-12-22 04:13:02,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=395413.3333333333, ans=0.0 2023-12-22 04:13:12,414 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+01 2.725e+01 2.922e+01 3.047e+01 3.389e+01, threshold=5.843e+01, percent-clipped=0.0 2023-12-22 04:13:17,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=395546.6666666667, ans=0.0 2023-12-22 04:13:27,948 INFO [train.py:886] (3/4) Epoch 13, batch 2150, loss[loss=0.01486, audio_tagging_loss=0.01486, over 24750.00 frames. ], tot_loss[loss=0.01447, audio_tagging_loss=0.01447, over 4949248.65 frames. ], batch size: 99, lr: 8.43e-03, grad_scale: 64.0 2023-12-22 04:13:29,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=395613.3333333333, ans=0.125 2023-12-22 04:13:31,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=395613.3333333333, ans=0.125 2023-12-22 04:13:41,347 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.56 vs. limit=10.0 2023-12-22 04:13:51,358 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.23 vs. limit=6.0 2023-12-22 04:14:19,321 INFO [train.py:886] (3/4) Epoch 13, batch 2200, loss[loss=0.01481, audio_tagging_loss=0.01481, over 25000.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4944995.21 frames. ], batch size: 100, lr: 8.43e-03, grad_scale: 64.0 2023-12-22 04:14:20,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=395946.6666666667, ans=0.0 2023-12-22 04:14:26,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=395946.6666666667, ans=0.125 2023-12-22 04:14:35,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=396013.3333333333, ans=0.05 2023-12-22 04:14:48,520 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.72 vs. 
limit=6.0 2023-12-22 04:14:56,168 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.295e+01 2.696e+01 2.863e+01 2.990e+01 3.495e+01, threshold=5.726e+01, percent-clipped=0.0 2023-12-22 04:15:00,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=396213.3333333333, ans=0.125 2023-12-22 04:15:03,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=396213.3333333333, ans=0.125 2023-12-22 04:15:04,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=396213.3333333333, ans=0.0 2023-12-22 04:15:11,033 INFO [train.py:886] (3/4) Epoch 13, batch 2250, loss[loss=0.01784, audio_tagging_loss=0.01784, over 24750.00 frames. ], tot_loss[loss=0.01468, audio_tagging_loss=0.01468, over 4941150.73 frames. ], batch size: 99, lr: 8.42e-03, grad_scale: 64.0 2023-12-22 04:15:15,871 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.19 vs. limit=15.0 2023-12-22 04:15:20,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=396280.0, ans=0.125 2023-12-22 04:15:21,372 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.12 vs. limit=10.0 2023-12-22 04:15:25,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=396346.6666666667, ans=0.125 2023-12-22 04:15:44,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=396480.0, ans=0.1 2023-12-22 04:16:03,309 INFO [train.py:886] (3/4) Epoch 13, batch 2300, loss[loss=0.01345, audio_tagging_loss=0.01345, over 25000.00 frames. ], tot_loss[loss=0.01469, audio_tagging_loss=0.01469, over 4945890.56 frames. ], batch size: 100, lr: 8.42e-03, grad_scale: 64.0 2023-12-22 04:16:05,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=396613.3333333333, ans=0.1 2023-12-22 04:16:15,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.60 vs. limit=6.0 2023-12-22 04:16:19,746 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.74 vs. limit=22.5 2023-12-22 04:16:28,754 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.23 vs. limit=22.5 2023-12-22 04:16:39,345 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.203e+01 2.666e+01 2.800e+01 2.964e+01 3.386e+01, threshold=5.601e+01, percent-clipped=0.0 2023-12-22 04:16:52,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=396880.0, ans=0.125 2023-12-22 04:16:52,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=396880.0, ans=0.125 2023-12-22 04:16:55,664 INFO [train.py:886] (3/4) Epoch 13, batch 2350, loss[loss=0.01237, audio_tagging_loss=0.01237, over 25000.00 frames. 
], tot_loss[loss=0.01459, audio_tagging_loss=0.01459, over 4949542.04 frames. ], batch size: 100, lr: 8.42e-03, grad_scale: 64.0 2023-12-22 04:17:23,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=397080.0, ans=0.125 2023-12-22 04:17:46,556 INFO [train.py:886] (3/4) Epoch 13, batch 2400, loss[loss=0.01441, audio_tagging_loss=0.01441, over 25000.00 frames. ], tot_loss[loss=0.01449, audio_tagging_loss=0.01449, over 4943506.39 frames. ], batch size: 100, lr: 8.41e-03, grad_scale: 64.0 2023-12-22 04:17:51,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=397280.0, ans=0.125 2023-12-22 04:17:52,573 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.76 vs. limit=6.0 2023-12-22 04:17:54,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=397280.0, ans=0.1 2023-12-22 04:18:07,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=397413.3333333333, ans=0.09899494936611666 2023-12-22 04:18:11,713 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.93 vs. limit=22.5 2023-12-22 04:18:12,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=397413.3333333333, ans=0.0 2023-12-22 04:18:15,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=397413.3333333333, ans=0.1 2023-12-22 04:18:23,443 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+01 2.690e+01 2.809e+01 2.996e+01 3.440e+01, threshold=5.617e+01, percent-clipped=0.0 2023-12-22 04:18:39,164 INFO [train.py:886] (3/4) Epoch 13, batch 2450, loss[loss=0.01534, audio_tagging_loss=0.01534, over 25000.00 frames. ], tot_loss[loss=0.01459, audio_tagging_loss=0.01459, over 4950163.66 frames. ], batch size: 100, lr: 8.41e-03, grad_scale: 64.0 2023-12-22 04:18:46,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=397613.3333333333, ans=0.125 2023-12-22 04:19:08,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=397746.6666666667, ans=0.125 2023-12-22 04:19:24,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=397880.0, ans=0.0 2023-12-22 04:19:30,697 INFO [train.py:886] (3/4) Epoch 13, batch 2500, loss[loss=0.01292, audio_tagging_loss=0.01292, over 25000.00 frames. ], tot_loss[loss=0.01465, audio_tagging_loss=0.01465, over 4951174.15 frames. 
], batch size: 100, lr: 8.41e-03, grad_scale: 64.0 2023-12-22 04:19:31,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=397946.6666666667, ans=0.125 2023-12-22 04:19:37,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=397946.6666666667, ans=0.05 2023-12-22 04:19:46,490 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.06 vs. limit=22.5 2023-12-22 04:20:07,609 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0 2023-12-22 04:20:07,967 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.462e+01 2.733e+01 2.874e+01 2.990e+01 3.625e+01, threshold=5.747e+01, percent-clipped=0.0 2023-12-22 04:20:13,393 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.37 vs. limit=12.0 2023-12-22 04:20:14,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=398213.3333333333, ans=0.125 2023-12-22 04:20:23,080 INFO [train.py:886] (3/4) Epoch 13, batch 2550, loss[loss=0.01336, audio_tagging_loss=0.01336, over 25000.00 frames. ], tot_loss[loss=0.01475, audio_tagging_loss=0.01475, over 4951872.75 frames. ], batch size: 100, lr: 8.40e-03, grad_scale: 64.0 2023-12-22 04:20:26,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=398280.0, ans=0.1 2023-12-22 04:20:26,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=398280.0, ans=0.04949747468305833 2023-12-22 04:20:29,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=398280.0, ans=0.125 2023-12-22 04:21:15,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=398613.3333333333, ans=0.0 2023-12-22 04:21:15,689 INFO [train.py:886] (3/4) Epoch 13, batch 2600, loss[loss=0.01435, audio_tagging_loss=0.01435, over 25000.00 frames. ], tot_loss[loss=0.01466, audio_tagging_loss=0.01466, over 4952773.88 frames. ], batch size: 100, lr: 8.40e-03, grad_scale: 64.0 2023-12-22 04:21:15,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=398613.3333333333, ans=0.125 2023-12-22 04:21:35,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=398746.6666666667, ans=0.125 2023-12-22 04:21:36,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=398746.6666666667, ans=0.0 2023-12-22 04:21:47,051 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.81 vs. 
limit=15.0 2023-12-22 04:21:50,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=398813.3333333333, ans=0.125 2023-12-22 04:21:51,208 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.481e+01 2.732e+01 2.842e+01 3.020e+01 3.869e+01, threshold=5.684e+01, percent-clipped=0.0 2023-12-22 04:22:05,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.21 vs. limit=6.0 2023-12-22 04:22:06,044 INFO [train.py:886] (3/4) Epoch 13, batch 2650, loss[loss=0.01269, audio_tagging_loss=0.01269, over 25000.00 frames. ], tot_loss[loss=0.01454, audio_tagging_loss=0.01454, over 4950622.20 frames. ], batch size: 100, lr: 8.40e-03, grad_scale: 64.0 2023-12-22 04:22:08,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=398946.6666666667, ans=0.0 2023-12-22 04:22:09,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=398946.6666666667, ans=0.0 2023-12-22 04:22:25,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2023-12-22 04:22:28,162 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.35 vs. limit=15.0 2023-12-22 04:22:30,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=399080.0, ans=0.125 2023-12-22 04:22:47,410 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.129e-01 2023-12-22 04:22:58,323 INFO [train.py:886] (3/4) Epoch 13, batch 2700, loss[loss=0.01627, audio_tagging_loss=0.01627, over 25000.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4951509.00 frames. ], batch size: 100, lr: 8.39e-03, grad_scale: 64.0 2023-12-22 04:23:00,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=399280.0, ans=0.1 2023-12-22 04:23:12,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=399346.6666666667, ans=0.0 2023-12-22 04:23:15,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.67 vs. limit=12.0 2023-12-22 04:23:33,776 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.401e+01 2.672e+01 2.778e+01 2.950e+01 3.329e+01, threshold=5.555e+01, percent-clipped=0.0 2023-12-22 04:23:34,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=399480.0, ans=0.2 2023-12-22 04:23:47,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=399613.3333333333, ans=0.125 2023-12-22 04:23:48,674 INFO [train.py:886] (3/4) Epoch 13, batch 2750, loss[loss=0.01257, audio_tagging_loss=0.01257, over 25000.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4956715.71 frames. 
], batch size: 100, lr: 8.39e-03, grad_scale: 64.0 2023-12-22 04:23:57,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=399613.3333333333, ans=0.125 2023-12-22 04:23:58,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=399680.0, ans=0.0 2023-12-22 04:23:59,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=399680.0, ans=0.0 2023-12-22 04:24:04,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=399680.0, ans=0.125 2023-12-22 04:24:10,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=399746.6666666667, ans=0.125 2023-12-22 04:24:19,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=399813.3333333333, ans=0.125 2023-12-22 04:24:26,766 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.03 vs. limit=15.0 2023-12-22 04:24:29,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.81 vs. limit=6.0 2023-12-22 04:24:33,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=399880.0, ans=0.125 2023-12-22 04:24:33,236 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.11 vs. limit=15.0 2023-12-22 04:24:38,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=399880.0, ans=0.125 2023-12-22 04:24:40,132 INFO [train.py:886] (3/4) Epoch 13, batch 2800, loss[loss=0.01566, audio_tagging_loss=0.01566, over 24750.00 frames. ], tot_loss[loss=0.01453, audio_tagging_loss=0.01453, over 4954707.02 frames. ], batch size: 99, lr: 8.39e-03, grad_scale: 64.0 2023-12-22 04:24:52,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=400013.3333333333, ans=0.2 2023-12-22 04:24:53,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=400013.3333333333, ans=0.125 2023-12-22 04:25:08,075 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 04:25:08,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=400080.0, ans=0.0 2023-12-22 04:25:18,178 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.363e+01 2.709e+01 2.865e+01 2.998e+01 3.603e+01, threshold=5.731e+01, percent-clipped=0.0 2023-12-22 04:25:20,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=400146.6666666667, ans=0.125 2023-12-22 04:25:28,545 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.29 vs. 
limit=15.0 2023-12-22 04:25:32,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.99 vs. limit=15.0 2023-12-22 04:25:33,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=400280.0, ans=0.0 2023-12-22 04:25:34,353 INFO [train.py:886] (3/4) Epoch 13, batch 2850, loss[loss=0.01395, audio_tagging_loss=0.01395, over 25000.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4952794.13 frames. ], batch size: 100, lr: 8.38e-03, grad_scale: 64.0 2023-12-22 04:25:42,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.34 vs. limit=15.0 2023-12-22 04:25:51,728 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.70 vs. limit=15.0 2023-12-22 04:26:08,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=400480.0, ans=0.0 2023-12-22 04:26:11,455 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.47 vs. limit=22.5 2023-12-22 04:26:12,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=400480.0, ans=0.125 2023-12-22 04:26:25,186 INFO [train.py:886] (3/4) Epoch 13, batch 2900, loss[loss=0.01413, audio_tagging_loss=0.01413, over 25000.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4948797.32 frames. ], batch size: 100, lr: 8.38e-03, grad_scale: 64.0 2023-12-22 04:26:29,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.61 vs. limit=10.0 2023-12-22 04:26:33,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=400613.3333333333, ans=0.04949747468305833 2023-12-22 04:26:43,399 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.04 vs. limit=15.0 2023-12-22 04:26:46,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=400746.6666666667, ans=0.025 2023-12-22 04:26:48,388 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.84 vs. limit=15.0 2023-12-22 04:27:01,892 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.432e+01 2.689e+01 2.815e+01 2.998e+01 3.858e+01, threshold=5.630e+01, percent-clipped=0.0 2023-12-22 04:27:09,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=400880.0, ans=0.0 2023-12-22 04:27:17,533 INFO [train.py:886] (3/4) Epoch 13, batch 2950, loss[loss=0.01616, audio_tagging_loss=0.01616, over 25000.00 frames. ], tot_loss[loss=0.01445, audio_tagging_loss=0.01445, over 4953994.91 frames. 
], batch size: 100, lr: 8.38e-03, grad_scale: 64.0 2023-12-22 04:27:17,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=400946.6666666667, ans=0.125 2023-12-22 04:27:18,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.10 vs. limit=10.0 2023-12-22 04:27:21,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=400946.6666666667, ans=0.0 2023-12-22 04:27:24,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=400946.6666666667, ans=0.125 2023-12-22 04:27:29,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=401013.3333333333, ans=0.125 2023-12-22 04:27:29,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=401013.3333333333, ans=0.0 2023-12-22 04:27:31,333 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0 2023-12-22 04:27:31,427 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.95 vs. limit=6.0 2023-12-22 04:27:37,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=401080.0, ans=0.125 2023-12-22 04:27:42,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=401080.0, ans=10.0 2023-12-22 04:27:42,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=401080.0, ans=0.04949747468305833 2023-12-22 04:27:50,569 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0 2023-12-22 04:28:07,807 INFO [train.py:886] (3/4) Epoch 13, batch 3000, loss[loss=0.01354, audio_tagging_loss=0.01354, over 25000.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4954040.05 frames. ], batch size: 100, lr: 8.37e-03, grad_scale: 64.0 2023-12-22 04:28:07,807 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 04:28:25,954 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.1754, 2.9163, 4.0072, 3.8094], device='cuda:3') 2023-12-22 04:28:28,500 INFO [train.py:917] (3/4) Epoch 13, validation: loss=0.03396, audio_tagging_loss=0.03396, over 3737520.00 frames. 
2023-12-22 04:28:28,501 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 04:28:32,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=401280.0, ans=0.2 2023-12-22 04:28:47,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=401346.6666666667, ans=0.2 2023-12-22 04:28:48,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=401413.3333333333, ans=0.0 2023-12-22 04:29:04,376 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.650e+01 2.784e+01 2.965e+01 3.758e+01, threshold=5.568e+01, percent-clipped=0.0 2023-12-22 04:29:05,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=401480.0, ans=10.0 2023-12-22 04:29:20,105 INFO [train.py:886] (3/4) Epoch 13, batch 3050, loss[loss=0.01471, audio_tagging_loss=0.01471, over 25000.00 frames. ], tot_loss[loss=0.01434, audio_tagging_loss=0.01434, over 4956393.05 frames. ], batch size: 100, lr: 8.37e-03, grad_scale: 64.0 2023-12-22 04:29:28,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=401680.0, ans=0.2 2023-12-22 04:29:46,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=401746.6666666667, ans=0.125 2023-12-22 04:29:50,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=401813.3333333333, ans=0.125 2023-12-22 04:30:07,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=401880.0, ans=0.125 2023-12-22 04:30:10,215 INFO [train.py:886] (3/4) Epoch 13, batch 3100, loss[loss=0.01372, audio_tagging_loss=0.01372, over 24750.00 frames. ], tot_loss[loss=0.01444, audio_tagging_loss=0.01444, over 4961694.16 frames. ], batch size: 99, lr: 8.37e-03, grad_scale: 64.0 2023-12-22 04:30:37,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=402080.0, ans=0.0 2023-12-22 04:30:46,093 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.337e+01 2.729e+01 2.844e+01 2.954e+01 3.475e+01, threshold=5.688e+01, percent-clipped=0.0 2023-12-22 04:30:56,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=402213.3333333333, ans=0.125 2023-12-22 04:31:00,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=402213.3333333333, ans=0.1 2023-12-22 04:31:01,789 INFO [train.py:886] (3/4) Epoch 13, batch 3150, loss[loss=0.01496, audio_tagging_loss=0.01496, over 22903.00 frames. ], tot_loss[loss=0.01466, audio_tagging_loss=0.01466, over 4954123.07 frames. 
], batch size: 107, lr: 8.36e-03, grad_scale: 64.0 2023-12-22 04:31:04,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=402280.0, ans=0.0 2023-12-22 04:31:04,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=402280.0, ans=0.95 2023-12-22 04:31:11,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=402346.6666666667, ans=0.125 2023-12-22 04:31:16,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=15.0 2023-12-22 04:31:17,188 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0 2023-12-22 04:31:23,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=402413.3333333333, ans=0.0 2023-12-22 04:31:48,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=402546.6666666667, ans=0.0 2023-12-22 04:31:52,763 INFO [train.py:886] (3/4) Epoch 13, batch 3200, loss[loss=0.01365, audio_tagging_loss=0.01365, over 24750.00 frames. ], tot_loss[loss=0.01459, audio_tagging_loss=0.01459, over 4952274.86 frames. ], batch size: 99, lr: 8.36e-03, grad_scale: 64.0 2023-12-22 04:31:54,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=402613.3333333333, ans=0.125 2023-12-22 04:32:12,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=402680.0, ans=0.2 2023-12-22 04:32:14,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=402746.6666666667, ans=0.2 2023-12-22 04:32:21,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=402746.6666666667, ans=0.0 2023-12-22 04:32:26,577 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.74 vs. limit=12.0 2023-12-22 04:32:27,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=402813.3333333333, ans=0.0 2023-12-22 04:32:29,647 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.382e+01 2.680e+01 2.806e+01 3.000e+01 3.606e+01, threshold=5.611e+01, percent-clipped=0.0 2023-12-22 04:32:30,920 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 04:32:36,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=402880.0, ans=0.0 2023-12-22 04:32:45,293 INFO [train.py:886] (3/4) Epoch 13, batch 3250, loss[loss=0.01441, audio_tagging_loss=0.01441, over 24750.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4949973.77 frames. 
], batch size: 99, lr: 8.36e-03, grad_scale: 64.0 2023-12-22 04:32:46,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=402946.6666666667, ans=0.0 2023-12-22 04:32:59,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2023-12-22 04:33:04,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=403013.3333333333, ans=0.0 2023-12-22 04:33:08,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=403080.0, ans=0.04949747468305833 2023-12-22 04:33:12,747 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 04:33:15,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=403080.0, ans=0.1 2023-12-22 04:33:37,338 INFO [train.py:886] (3/4) Epoch 13, batch 3300, loss[loss=0.01517, audio_tagging_loss=0.01517, over 24908.00 frames. ], tot_loss[loss=0.01446, audio_tagging_loss=0.01446, over 4954147.98 frames. ], batch size: 100, lr: 8.35e-03, grad_scale: 64.0 2023-12-22 04:33:55,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=403346.6666666667, ans=0.0 2023-12-22 04:34:05,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.06 vs. limit=15.0 2023-12-22 04:34:13,610 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.315e+01 2.672e+01 2.806e+01 3.015e+01 3.427e+01, threshold=5.612e+01, percent-clipped=0.0 2023-12-22 04:34:29,309 INFO [train.py:886] (3/4) Epoch 13, batch 3350, loss[loss=0.01333, audio_tagging_loss=0.01333, over 25000.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4949596.41 frames. ], batch size: 100, lr: 8.35e-03, grad_scale: 64.0 2023-12-22 04:34:29,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=403613.3333333333, ans=0.0 2023-12-22 04:34:39,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=403680.0, ans=0.125 2023-12-22 04:34:49,290 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0 2023-12-22 04:34:54,243 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.72 vs. limit=22.5 2023-12-22 04:35:12,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=403880.0, ans=0.1 2023-12-22 04:35:17,288 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0 2023-12-22 04:35:20,733 INFO [train.py:886] (3/4) Epoch 13, batch 3400, loss[loss=0.01387, audio_tagging_loss=0.01387, over 25000.00 frames. ], tot_loss[loss=0.01446, audio_tagging_loss=0.01446, over 4947758.79 frames. 
], batch size: 100, lr: 8.35e-03, grad_scale: 64.0 2023-12-22 04:35:23,130 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=15.0 2023-12-22 04:35:41,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=404080.0, ans=0.125 2023-12-22 04:35:50,816 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.86 vs. limit=15.0 2023-12-22 04:35:56,796 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.420e+01 2.740e+01 2.906e+01 3.028e+01 3.470e+01, threshold=5.811e+01, percent-clipped=0.0 2023-12-22 04:36:06,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=404213.3333333333, ans=0.1 2023-12-22 04:36:11,663 INFO [train.py:886] (3/4) Epoch 13, batch 3450, loss[loss=0.01549, audio_tagging_loss=0.01549, over 24750.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4938820.08 frames. ], batch size: 99, lr: 8.34e-03, grad_scale: 64.0 2023-12-22 04:36:12,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=404280.0, ans=0.125 2023-12-22 04:37:03,787 INFO [train.py:886] (3/4) Epoch 13, batch 3500, loss[loss=0.01364, audio_tagging_loss=0.01364, over 24750.00 frames. ], tot_loss[loss=0.01457, audio_tagging_loss=0.01457, over 4938118.59 frames. ], batch size: 99, lr: 8.34e-03, grad_scale: 64.0 2023-12-22 04:37:14,636 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.53 vs. limit=22.5 2023-12-22 04:37:37,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.56 vs. limit=10.0 2023-12-22 04:37:39,806 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 2.687e+01 2.886e+01 3.053e+01 3.392e+01, threshold=5.771e+01, percent-clipped=0.0 2023-12-22 04:37:55,460 INFO [train.py:886] (3/4) Epoch 13, batch 3550, loss[loss=0.0146, audio_tagging_loss=0.0146, over 25000.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4945583.02 frames. ], batch size: 100, lr: 8.34e-03, grad_scale: 64.0 2023-12-22 04:37:58,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=404946.6666666667, ans=0.125 2023-12-22 04:38:10,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=405013.3333333333, ans=0.0 2023-12-22 04:38:37,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=405213.3333333333, ans=0.125 2023-12-22 04:38:41,720 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.29 vs. limit=22.5 2023-12-22 04:38:42,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=405213.3333333333, ans=0.125 2023-12-22 04:38:47,564 INFO [train.py:886] (3/4) Epoch 13, batch 3600, loss[loss=0.01351, audio_tagging_loss=0.01351, over 24931.00 frames. 
], tot_loss[loss=0.01452, audio_tagging_loss=0.01452, over 4947552.51 frames. ], batch size: 100, lr: 8.33e-03, grad_scale: 64.0 2023-12-22 04:38:50,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=405280.0, ans=0.2 2023-12-22 04:39:23,611 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.413e+01 2.719e+01 2.868e+01 3.005e+01 3.537e+01, threshold=5.736e+01, percent-clipped=0.0 2023-12-22 04:39:35,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=405546.6666666667, ans=0.125 2023-12-22 04:39:39,981 INFO [train.py:886] (3/4) Epoch 13, batch 3650, loss[loss=0.01583, audio_tagging_loss=0.01583, over 25000.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4954077.16 frames. ], batch size: 100, lr: 8.33e-03, grad_scale: 64.0 2023-12-22 04:39:50,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=405680.0, ans=0.0 2023-12-22 04:39:53,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=405680.0, ans=0.0 2023-12-22 04:40:01,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=405746.6666666667, ans=0.2 2023-12-22 04:40:12,573 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.48 vs. limit=15.0 2023-12-22 04:40:30,630 INFO [train.py:886] (3/4) Epoch 13, batch 3700, loss[loss=0.0159, audio_tagging_loss=0.0159, over 25000.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4958230.16 frames. ], batch size: 100, lr: 8.33e-03, grad_scale: 64.0 2023-12-22 04:40:47,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.03 vs. limit=15.0 2023-12-22 04:41:06,770 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.499e+01 2.738e+01 2.839e+01 2.971e+01 4.016e+01, threshold=5.677e+01, percent-clipped=0.0 2023-12-22 04:41:07,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=406146.6666666667, ans=0.07 2023-12-22 04:41:14,166 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.84 vs. limit=15.0 2023-12-22 04:41:22,594 INFO [train.py:886] (3/4) Epoch 13, batch 3750, loss[loss=0.01514, audio_tagging_loss=0.01514, over 24750.00 frames. ], tot_loss[loss=0.01459, audio_tagging_loss=0.01459, over 4953927.91 frames. 
], batch size: 99, lr: 8.32e-03, grad_scale: 64.0 2023-12-22 04:41:36,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=406346.6666666667, ans=0.025 2023-12-22 04:41:39,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=406346.6666666667, ans=0.125 2023-12-22 04:41:47,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=406413.3333333333, ans=0.0 2023-12-22 04:41:53,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=406480.0, ans=0.1 2023-12-22 04:42:03,673 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.60 vs. limit=12.0 2023-12-22 04:42:12,666 INFO [train.py:886] (3/4) Epoch 13, batch 3800, loss[loss=0.01248, audio_tagging_loss=0.01248, over 24750.00 frames. ], tot_loss[loss=0.01468, audio_tagging_loss=0.01468, over 4948114.67 frames. ], batch size: 99, lr: 8.32e-03, grad_scale: 64.0 2023-12-22 04:42:20,218 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.65 vs. limit=22.5 2023-12-22 04:42:49,592 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.378e+01 2.729e+01 2.855e+01 2.947e+01 3.530e+01, threshold=5.711e+01, percent-clipped=0.0 2023-12-22 04:43:05,320 INFO [train.py:886] (3/4) Epoch 13, batch 3850, loss[loss=0.0141, audio_tagging_loss=0.0141, over 25000.00 frames. ], tot_loss[loss=0.01466, audio_tagging_loss=0.01466, over 4948696.02 frames. ], batch size: 100, lr: 8.32e-03, grad_scale: 64.0 2023-12-22 04:43:13,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=406946.6666666667, ans=0.0 2023-12-22 04:43:15,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=407013.3333333333, ans=0.125 2023-12-22 04:43:28,532 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 04:43:32,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=407080.0, ans=0.0 2023-12-22 04:43:32,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=407080.0, ans=0.0 2023-12-22 04:43:35,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=407146.6666666667, ans=0.125 2023-12-22 04:43:44,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=407146.6666666667, ans=0.125 2023-12-22 04:43:57,684 INFO [train.py:886] (3/4) Epoch 13, batch 3900, loss[loss=0.01189, audio_tagging_loss=0.01189, over 25000.00 frames. ], tot_loss[loss=0.01452, audio_tagging_loss=0.01452, over 4953879.65 frames. 
], batch size: 100, lr: 8.31e-03, grad_scale: 64.0 2023-12-22 04:44:22,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=407413.3333333333, ans=0.0 2023-12-22 04:44:24,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=407413.3333333333, ans=15.0 2023-12-22 04:44:33,656 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.421e+01 2.742e+01 2.835e+01 2.926e+01 3.497e+01, threshold=5.671e+01, percent-clipped=0.0 2023-12-22 04:44:46,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=407546.6666666667, ans=0.2 2023-12-22 04:44:48,706 INFO [train.py:886] (3/4) Epoch 13, batch 3950, loss[loss=0.01253, audio_tagging_loss=0.01253, over 25000.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4953586.01 frames. ], batch size: 100, lr: 8.31e-03, grad_scale: 64.0 2023-12-22 04:44:52,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=407613.3333333333, ans=0.0 2023-12-22 04:44:54,800 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=15.0 2023-12-22 04:44:59,972 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.09 vs. limit=15.0 2023-12-22 04:45:01,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=407680.0, ans=0.125 2023-12-22 04:45:02,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=407680.0, ans=0.125 2023-12-22 04:45:15,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=407746.6666666667, ans=0.0 2023-12-22 04:45:15,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=407746.6666666667, ans=0.025 2023-12-22 04:45:33,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=407880.0, ans=0.2 2023-12-22 04:45:41,174 INFO [train.py:886] (3/4) Epoch 13, batch 4000, loss[loss=0.01325, audio_tagging_loss=0.01325, over 24750.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4954336.58 frames. 
], batch size: 99, lr: 8.31e-03, grad_scale: 128.0 2023-12-22 04:45:56,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=408013.3333333333, ans=0.125 2023-12-22 04:46:01,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=408080.0, ans=0.125 2023-12-22 04:46:05,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=408080.0, ans=0.125 2023-12-22 04:46:18,848 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.329e+01 2.770e+01 2.894e+01 3.015e+01 3.399e+01, threshold=5.788e+01, percent-clipped=0.0 2023-12-22 04:46:21,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.41 vs. limit=10.0 2023-12-22 04:46:22,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=408213.3333333333, ans=0.125 2023-12-22 04:46:26,890 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=15.0 2023-12-22 04:46:30,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=408213.3333333333, ans=0.125 2023-12-22 04:46:32,153 INFO [train.py:886] (3/4) Epoch 13, batch 4050, loss[loss=0.01348, audio_tagging_loss=0.01348, over 25000.00 frames. ], tot_loss[loss=0.0146, audio_tagging_loss=0.0146, over 4958832.80 frames. ], batch size: 100, lr: 8.30e-03, grad_scale: 64.0 2023-12-22 04:46:54,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=408413.3333333333, ans=0.0 2023-12-22 04:46:59,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=408413.3333333333, ans=0.125 2023-12-22 04:47:05,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=408480.0, ans=0.2 2023-12-22 04:47:16,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=408546.6666666667, ans=0.5 2023-12-22 04:47:24,294 INFO [train.py:886] (3/4) Epoch 13, batch 4100, loss[loss=0.0143, audio_tagging_loss=0.0143, over 24750.00 frames. ], tot_loss[loss=0.01472, audio_tagging_loss=0.01472, over 4952653.38 frames. ], batch size: 99, lr: 8.30e-03, grad_scale: 64.0 2023-12-22 04:47:24,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=408613.3333333333, ans=0.125 2023-12-22 04:47:26,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=408613.3333333333, ans=0.0 2023-12-22 04:47:51,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.55 vs. 
limit=15.0
2023-12-22 04:48:00,114 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.473e+01 2.711e+01 2.874e+01 3.032e+01 3.484e+01, threshold=5.749e+01, percent-clipped=0.0
2023-12-22 04:48:03,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=408880.0, ans=0.0
2023-12-22 04:48:04,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=408880.0, ans=0.0
2023-12-22 04:48:05,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=408880.0, ans=0.125
2023-12-22 04:48:13,963 INFO [train.py:886] (3/4) Epoch 13, batch 4150, loss[loss=0.01682, audio_tagging_loss=0.01682, over 24750.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4946526.37 frames. ], batch size: 99, lr: 8.30e-03, grad_scale: 64.0
2023-12-22 04:48:15,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=408946.6666666667, ans=0.07
2023-12-22 04:48:30,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=409013.3333333333, ans=0.125
2023-12-22 04:48:32,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=409080.0, ans=0.125
2023-12-22 04:48:38,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=409080.0, ans=0.125
2023-12-22 04:48:39,542 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.04 vs. limit=22.5
2023-12-22 04:48:52,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=409146.6666666667, ans=0.025
2023-12-22 04:49:03,484 INFO [train.py:886] (3/4) Epoch 13, batch 4200, loss[loss=0.01272, audio_tagging_loss=0.01272, over 25000.00 frames. ], tot_loss[loss=0.01449, audio_tagging_loss=0.01449, over 4950211.10 frames. ], batch size: 100, lr: 8.29e-03, grad_scale: 64.0
2023-12-22 04:49:11,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=409280.0, ans=0.2
2023-12-22 04:49:30,865 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0
2023-12-22 04:49:38,259 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5
2023-12-22 04:49:39,681 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.224e+01 2.633e+01 2.827e+01 2.966e+01 3.504e+01, threshold=5.654e+01, percent-clipped=0.0
2023-12-22 04:49:55,216 INFO [train.py:886] (3/4) Epoch 13, batch 4250, loss[loss=0.01341, audio_tagging_loss=0.01341, over 25000.00 frames. ], tot_loss[loss=0.01442, audio_tagging_loss=0.01442, over 4949150.61 frames. ], batch size: 100, lr: 8.29e-03, grad_scale: 64.0
2023-12-22 04:50:03,301 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0
2023-12-22 04:50:16,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=409746.6666666667, ans=0.1
2023-12-22 04:50:28,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=409813.3333333333, ans=0.0
2023-12-22 04:50:42,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=409880.0, ans=0.1
2023-12-22 04:50:45,476 INFO [train.py:886] (3/4) Epoch 13, batch 4300, loss[loss=0.01426, audio_tagging_loss=0.01426, over 25000.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4950605.58 frames. ], batch size: 100, lr: 8.29e-03, grad_scale: 64.0
2023-12-22 04:51:04,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=410013.3333333333, ans=0.0
2023-12-22 04:51:21,866 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+01 2.664e+01 2.847e+01 3.002e+01 3.748e+01, threshold=5.694e+01, percent-clipped=0.0
2023-12-22 04:51:22,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=410146.6666666667, ans=0.1
2023-12-22 04:51:25,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=410213.3333333333, ans=0.2
2023-12-22 04:51:27,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=410213.3333333333, ans=0.0
2023-12-22 04:51:27,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=410213.3333333333, ans=0.125
2023-12-22 04:51:28,132 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0
2023-12-22 04:51:36,290 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.21 vs. limit=22.5
2023-12-22 04:51:36,650 INFO [train.py:886] (3/4) Epoch 13, batch 4350, loss[loss=0.01763, audio_tagging_loss=0.01763, over 25000.00 frames. ], tot_loss[loss=0.01445, audio_tagging_loss=0.01445, over 4951481.21 frames. ], batch size: 100, lr: 8.28e-03, grad_scale: 64.0
2023-12-22 04:51:52,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=410346.6666666667, ans=0.1
2023-12-22 04:51:56,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=410413.3333333333, ans=0.2
2023-12-22 04:51:56,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=410413.3333333333, ans=0.2
2023-12-22 04:51:57,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=410413.3333333333, ans=0.0
2023-12-22 04:52:03,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=410413.3333333333, ans=0.125
2023-12-22 04:52:18,155 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 04:52:27,233 INFO [train.py:886] (3/4) Epoch 13, batch 4400, loss[loss=0.0142, audio_tagging_loss=0.0142, over 24750.00 frames. ], tot_loss[loss=0.01456, audio_tagging_loss=0.01456, over 4942128.87 frames. ], batch size: 99, lr: 8.28e-03, grad_scale: 64.0
2023-12-22 04:52:28,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=410613.3333333333, ans=0.1
2023-12-22 04:52:32,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=410613.3333333333, ans=0.125
2023-12-22 04:52:38,269 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.81 vs. limit=6.0
2023-12-22 04:52:40,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=410680.0, ans=0.125
2023-12-22 04:52:48,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=410746.6666666667, ans=0.125
2023-12-22 04:52:54,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=410746.6666666667, ans=0.125
2023-12-22 04:53:05,343 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.440e+01 2.774e+01 2.948e+01 3.077e+01 3.816e+01, threshold=5.896e+01, percent-clipped=0.0
2023-12-22 04:53:12,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=410880.0, ans=0.125
2023-12-22 04:53:19,285 INFO [train.py:886] (3/4) Epoch 13, batch 4450, loss[loss=0.01326, audio_tagging_loss=0.01326, over 25000.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4939483.88 frames. ], batch size: 100, lr: 8.28e-03, grad_scale: 64.0
2023-12-22 04:53:24,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=410946.6666666667, ans=0.0
2023-12-22 04:53:26,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=410946.6666666667, ans=0.125
2023-12-22 04:53:27,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=410946.6666666667, ans=0.0
2023-12-22 04:53:31,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=411013.3333333333, ans=0.05
2023-12-22 04:53:38,186 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.22 vs. limit=15.0
2023-12-22 04:53:45,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=411080.0, ans=0.125
2023-12-22 04:53:49,069 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.996e-02
2023-12-22 04:53:50,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=411146.6666666667, ans=0.0
2023-12-22 04:54:01,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=411213.3333333333, ans=0.125
2023-12-22 04:54:10,064 INFO [train.py:886] (3/4) Epoch 13, batch 4500, loss[loss=0.01374, audio_tagging_loss=0.01374, over 25000.00 frames. ], tot_loss[loss=0.01456, audio_tagging_loss=0.01456, over 4939686.61 frames. ], batch size: 100, lr: 8.27e-03, grad_scale: 64.0
2023-12-22 04:54:18,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=411280.0, ans=0.0
2023-12-22 04:54:47,746 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+01 2.626e+01 2.809e+01 2.913e+01 3.485e+01, threshold=5.618e+01, percent-clipped=0.0
2023-12-22 04:54:50,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=411546.6666666667, ans=0.2
2023-12-22 04:54:51,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=411546.6666666667, ans=0.05
2023-12-22 04:55:02,377 INFO [train.py:886] (3/4) Epoch 13, batch 4550, loss[loss=0.01535, audio_tagging_loss=0.01535, over 25000.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4944282.34 frames. ], batch size: 100, lr: 8.27e-03, grad_scale: 64.0
2023-12-22 04:55:09,222 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0
2023-12-22 04:55:14,675 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=15.0
2023-12-22 04:55:14,708 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.22 vs. limit=6.0
2023-12-22 04:55:33,943 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.02 vs. limit=22.5
2023-12-22 04:55:35,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=411813.3333333333, ans=0.125
2023-12-22 04:55:48,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.33 vs. limit=22.5
2023-12-22 04:55:51,608 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.09 vs. limit=12.0
2023-12-22 04:55:53,109 INFO [train.py:886] (3/4) Epoch 13, batch 4600, loss[loss=0.01853, audio_tagging_loss=0.01853, over 25000.00 frames. ], tot_loss[loss=0.01449, audio_tagging_loss=0.01449, over 4953638.73 frames. ], batch size: 100, lr: 8.27e-03, grad_scale: 64.0
2023-12-22 04:56:04,272 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.68 vs. limit=10.0
2023-12-22 04:56:04,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=412013.3333333333, ans=0.1
2023-12-22 04:56:04,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=412013.3333333333, ans=0.1
2023-12-22 04:56:05,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=412013.3333333333, ans=0.0
2023-12-22 04:56:17,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=412080.0, ans=0.09899494936611666
2023-12-22 04:56:30,631 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.383e+01 2.702e+01 2.812e+01 2.973e+01 3.804e+01, threshold=5.623e+01, percent-clipped=0.0
2023-12-22 04:56:38,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=412213.3333333333, ans=0.2
2023-12-22 04:56:38,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=412213.3333333333, ans=0.1
2023-12-22 04:56:40,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=412213.3333333333, ans=0.125
2023-12-22 04:56:45,136 INFO [train.py:886] (3/4) Epoch 13, batch 4650, loss[loss=0.01613, audio_tagging_loss=0.01613, over 25000.00 frames. ], tot_loss[loss=0.01446, audio_tagging_loss=0.01446, over 4958006.48 frames. ], batch size: 100, lr: 8.26e-03, grad_scale: 64.0
2023-12-22 04:57:04,863 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.08 vs. limit=15.0
2023-12-22 04:57:17,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.12 vs. limit=22.5
2023-12-22 04:57:27,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=412546.6666666667, ans=0.125
2023-12-22 04:57:35,744 INFO [train.py:886] (3/4) Epoch 13, batch 4700, loss[loss=0.01452, audio_tagging_loss=0.01452, over 24750.00 frames. ], tot_loss[loss=0.01454, audio_tagging_loss=0.01454, over 4955811.30 frames. ], batch size: 99, lr: 8.26e-03, grad_scale: 64.0
2023-12-22 04:57:35,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=412613.3333333333, ans=0.125
2023-12-22 04:57:47,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=412680.0, ans=0.0
2023-12-22 04:57:59,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=412746.6666666667, ans=0.2
2023-12-22 04:58:10,162 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.333e+01 2.759e+01 2.926e+01 3.098e+01 3.768e+01, threshold=5.852e+01, percent-clipped=0.0
2023-12-22 04:58:13,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=412880.0, ans=0.125
2023-12-22 04:58:23,358 INFO [train.py:886] (3/4) Epoch 13, batch 4750, loss[loss=0.01555, audio_tagging_loss=0.01555, over 24750.00 frames. ], tot_loss[loss=0.01466, audio_tagging_loss=0.01466, over 4952730.08 frames. ], batch size: 99, lr: 8.26e-03, grad_scale: 64.0
2023-12-22 04:58:29,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=412946.6666666667, ans=0.125
2023-12-22 04:59:00,179 INFO [train.py:886] (3/4) Epoch 14, batch 0, loss[loss=0.03235, audio_tagging_loss=0.03235, over 25000.00 frames. ], tot_loss[loss=0.03235, audio_tagging_loss=0.03235, over 25000.00 frames. ], batch size: 100, lr: 7.95e-03, grad_scale: 64.0
2023-12-22 04:59:00,179 INFO [train.py:909] (3/4) Computing validation loss
2023-12-22 04:59:20,850 INFO [train.py:917] (3/4) Epoch 14, validation: loss=0.0333, audio_tagging_loss=0.0333, over 3737520.00 frames.
2023-12-22 04:59:20,851 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-22 04:59:24,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.48 vs. limit=15.0
2023-12-22 04:59:26,030 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.03 vs. limit=6.0
2023-12-22 04:59:28,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=413053.3333333333, ans=0.125
2023-12-22 04:59:30,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.42 vs. limit=15.0
2023-12-22 04:59:35,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=413120.0, ans=0.0
2023-12-22 04:59:41,039 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.19 vs. limit=15.0
2023-12-22 04:59:51,249 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.83 vs. limit=22.5
2023-12-22 04:59:58,906 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.96 vs. limit=15.0
2023-12-22 05:00:05,847 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0
2023-12-22 05:00:09,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=413320.0, ans=0.0
2023-12-22 05:00:13,994 INFO [train.py:886] (3/4) Epoch 14, batch 50, loss[loss=0.01619, audio_tagging_loss=0.01619, over 24077.00 frames. ], tot_loss[loss=0.02324, audio_tagging_loss=0.02324, over 1123064.02 frames. ], batch size: 100, lr: 7.95e-03, grad_scale: 64.0
2023-12-22 05:00:30,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=413453.3333333333, ans=0.0
2023-12-22 05:00:33,163 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.77 vs. limit=12.0
2023-12-22 05:00:33,569 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.460e+01 2.957e+01 3.426e+01 4.066e+01 1.021e+02, threshold=6.852e+01, percent-clipped=7.0
2023-12-22 05:00:41,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=413520.0, ans=0.125
2023-12-22 05:00:50,911 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.78 vs. limit=15.0
2023-12-22 05:00:58,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=413653.3333333333, ans=0.2
2023-12-22 05:01:04,576 INFO [train.py:886] (3/4) Epoch 14, batch 100, loss[loss=0.01498, audio_tagging_loss=0.01498, over 25000.00 frames. ], tot_loss[loss=0.01987, audio_tagging_loss=0.01987, over 1977269.15 frames. ], batch size: 100, lr: 7.95e-03, grad_scale: 64.0
2023-12-22 05:01:14,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=413786.6666666667, ans=0.125
2023-12-22 05:01:15,903 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=15.0
2023-12-22 05:01:53,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=413986.6666666667, ans=0.07
2023-12-22 05:01:56,686 INFO [train.py:886] (3/4) Epoch 14, batch 150, loss[loss=0.01196, audio_tagging_loss=0.01196, over 25000.00 frames. ], tot_loss[loss=0.01818, audio_tagging_loss=0.01818, over 2628552.71 frames. ], batch size: 100, lr: 7.94e-03, grad_scale: 64.0
2023-12-22 05:02:01,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=414053.3333333333, ans=0.125
2023-12-22 05:02:14,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=414120.0, ans=0.1
2023-12-22 05:02:16,261 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.522e+01 2.861e+01 3.025e+01 3.235e+01 3.410e+01, threshold=6.050e+01, percent-clipped=0.0
2023-12-22 05:02:17,770 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.01 vs. limit=22.5
2023-12-22 05:02:38,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=414320.0, ans=0.015
2023-12-22 05:02:41,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=414320.0, ans=0.0
2023-12-22 05:02:42,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=414320.0, ans=0.025
2023-12-22 05:02:47,412 INFO [train.py:886] (3/4) Epoch 14, batch 200, loss[loss=0.01248, audio_tagging_loss=0.01248, over 25000.00 frames. ], tot_loss[loss=0.01717, audio_tagging_loss=0.01717, over 3147554.59 frames. ], batch size: 100, lr: 7.94e-03, grad_scale: 64.0
2023-12-22 05:02:47,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=414386.6666666667, ans=0.1
2023-12-22 05:02:56,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=414386.6666666667, ans=0.0
2023-12-22 05:03:03,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.93 vs. limit=15.0
2023-12-22 05:03:40,537 INFO [train.py:886] (3/4) Epoch 14, batch 250, loss[loss=0.01516, audio_tagging_loss=0.01516, over 25000.00 frames. ], tot_loss[loss=0.01631, audio_tagging_loss=0.01631, over 3553632.16 frames. ], batch size: 100, lr: 7.94e-03, grad_scale: 64.0
2023-12-22 05:03:42,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=414720.0, ans=0.125
2023-12-22 05:03:45,710 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.53 vs. limit=15.0
2023-12-22 05:03:47,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=414720.0, ans=0.125
2023-12-22 05:04:00,263 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.478e+01 2.722e+01 2.891e+01 3.020e+01 3.428e+01, threshold=5.782e+01, percent-clipped=0.0
2023-12-22 05:04:12,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=15.0
2023-12-22 05:04:17,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=414920.0, ans=0.0
2023-12-22 05:04:31,627 INFO [train.py:886] (3/4) Epoch 14, batch 300, loss[loss=0.01531, audio_tagging_loss=0.01531, over 24750.00 frames. ], tot_loss[loss=0.01585, audio_tagging_loss=0.01585, over 3863985.86 frames. ], batch size: 99, lr: 7.93e-03, grad_scale: 64.0
2023-12-22 05:04:37,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=415053.3333333333, ans=0.125
2023-12-22 05:04:42,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=415120.0, ans=0.09899494936611666
2023-12-22 05:04:46,668 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=15.0
2023-12-22 05:05:04,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=415253.3333333333, ans=0.125
2023-12-22 05:05:23,728 INFO [train.py:886] (3/4) Epoch 14, batch 350, loss[loss=0.0164, audio_tagging_loss=0.0164, over 24750.00 frames. ], tot_loss[loss=0.01557, audio_tagging_loss=0.01557, over 4102297.00 frames. ], batch size: 99, lr: 7.93e-03, grad_scale: 64.0
2023-12-22 05:05:28,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=415386.6666666667, ans=0.0
2023-12-22 05:05:44,921 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 2.760e+01 2.874e+01 3.045e+01 3.726e+01, threshold=5.747e+01, percent-clipped=0.0
2023-12-22 05:05:51,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.03 vs. limit=12.0
2023-12-22 05:05:53,612 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.42 vs. limit=15.0
2023-12-22 05:05:56,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0
2023-12-22 05:06:10,084 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.86 vs. limit=15.0
2023-12-22 05:06:11,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=415653.3333333333, ans=15.0
2023-12-22 05:06:15,483 INFO [train.py:886] (3/4) Epoch 14, batch 400, loss[loss=0.01395, audio_tagging_loss=0.01395, over 25000.00 frames. ], tot_loss[loss=0.01518, audio_tagging_loss=0.01518, over 4290741.42 frames. ], batch size: 100, lr: 7.93e-03, grad_scale: 64.0
2023-12-22 05:06:18,538 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.82 vs. limit=15.0
2023-12-22 05:06:42,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=415853.3333333333, ans=0.04949747468305833
2023-12-22 05:06:54,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=415920.0, ans=0.05
2023-12-22 05:06:54,509 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.80 vs. limit=15.0
2023-12-22 05:06:55,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=415920.0, ans=0.2
2023-12-22 05:07:07,890 INFO [train.py:886] (3/4) Epoch 14, batch 450, loss[loss=0.01317, audio_tagging_loss=0.01317, over 25000.00 frames. ], tot_loss[loss=0.01495, audio_tagging_loss=0.01495, over 4439106.92 frames. ], batch size: 100, lr: 7.93e-03, grad_scale: 64.0
2023-12-22 05:07:10,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=416053.3333333333, ans=0.07
2023-12-22 05:07:23,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=416120.0, ans=0.025
2023-12-22 05:07:28,352 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.342e+01 2.661e+01 2.801e+01 2.949e+01 3.337e+01, threshold=5.602e+01, percent-clipped=0.0
2023-12-22 05:07:36,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=416253.3333333333, ans=0.2
2023-12-22 05:07:59,894 INFO [train.py:886] (3/4) Epoch 14, batch 500, loss[loss=0.01146, audio_tagging_loss=0.01146, over 25000.00 frames. ], tot_loss[loss=0.01473, audio_tagging_loss=0.01473, over 4558422.83 frames. ], batch size: 100, lr: 7.92e-03, grad_scale: 64.0
2023-12-22 05:08:07,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=416386.6666666667, ans=10.0
2023-12-22 05:08:18,240 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. limit=6.0
2023-12-22 05:08:18,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=416453.3333333333, ans=0.125
2023-12-22 05:08:48,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=416653.3333333333, ans=0.0
2023-12-22 05:08:51,489 INFO [train.py:886] (3/4) Epoch 14, batch 550, loss[loss=0.01669, audio_tagging_loss=0.01669, over 25000.00 frames. ], tot_loss[loss=0.01465, audio_tagging_loss=0.01465, over 4644260.44 frames. ], batch size: 100, lr: 7.92e-03, grad_scale: 64.0
2023-12-22 05:08:57,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=416720.0, ans=0.02
2023-12-22 05:09:11,909 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.240e+01 2.686e+01 2.777e+01 2.975e+01 3.433e+01, threshold=5.554e+01, percent-clipped=0.0
2023-12-22 05:09:22,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=416920.0, ans=0.125
2023-12-22 05:09:23,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=416920.0, ans=0.025
2023-12-22 05:09:23,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=416920.0, ans=0.02
2023-12-22 05:09:31,891 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.96 vs. limit=15.0
2023-12-22 05:09:43,194 INFO [train.py:886] (3/4) Epoch 14, batch 600, loss[loss=0.01441, audio_tagging_loss=0.01441, over 24750.00 frames. ], tot_loss[loss=0.01467, audio_tagging_loss=0.01467, over 4714430.99 frames. ], batch size: 99, lr: 7.92e-03, grad_scale: 64.0
2023-12-22 05:10:10,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=417186.6666666667, ans=0.125
2023-12-22 05:10:18,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=417253.3333333333, ans=0.125
2023-12-22 05:10:27,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=417320.0, ans=0.125
2023-12-22 05:10:34,865 INFO [train.py:886] (3/4) Epoch 14, batch 650, loss[loss=0.01388, audio_tagging_loss=0.01388, over 24750.00 frames. ], tot_loss[loss=0.0147, audio_tagging_loss=0.0147, over 4760794.84 frames. ], batch size: 99, lr: 7.91e-03, grad_scale: 64.0
2023-12-22 05:10:47,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=417453.3333333333, ans=0.125
2023-12-22 05:10:56,101 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.469e+01 2.731e+01 2.888e+01 3.018e+01 3.671e+01, threshold=5.777e+01, percent-clipped=0.0
2023-12-22 05:11:01,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=417520.0, ans=0.0
2023-12-22 05:11:12,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=417586.6666666667, ans=0.125
2023-12-22 05:11:27,245 INFO [train.py:886] (3/4) Epoch 14, batch 700, loss[loss=0.01676, audio_tagging_loss=0.01676, over 25000.00 frames. ], tot_loss[loss=0.01473, audio_tagging_loss=0.01473, over 4798576.97 frames. ], batch size: 100, lr: 7.91e-03, grad_scale: 64.0
2023-12-22 05:11:30,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=417720.0, ans=0.0
2023-12-22 05:11:38,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=417786.6666666667, ans=0.125
2023-12-22 05:11:41,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=417786.6666666667, ans=0.125
2023-12-22 05:11:43,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=417786.6666666667, ans=0.2
2023-12-22 05:11:43,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=417786.6666666667, ans=0.2
2023-12-22 05:11:49,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=417853.3333333333, ans=10.0
2023-12-22 05:12:08,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=417986.6666666667, ans=0.1
2023-12-22 05:12:15,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=417986.6666666667, ans=0.0
2023-12-22 05:12:16,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=417986.6666666667, ans=0.1
2023-12-22 05:12:18,770 INFO [train.py:886] (3/4) Epoch 14, batch 750, loss[loss=0.01299, audio_tagging_loss=0.01299, over 24060.00 frames. ], tot_loss[loss=0.01456, audio_tagging_loss=0.01456, over 4833607.69 frames. ], batch size: 100, lr: 7.91e-03, grad_scale: 64.0
2023-12-22 05:12:19,383 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0
2023-12-22 05:12:24,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=418053.3333333333, ans=0.2
2023-12-22 05:12:29,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=418120.0, ans=0.125
2023-12-22 05:12:39,814 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.423e+01 2.688e+01 2.814e+01 2.981e+01 3.514e+01, threshold=5.628e+01, percent-clipped=0.0
2023-12-22 05:12:58,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=418253.3333333333, ans=0.125
2023-12-22 05:13:02,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=418320.0, ans=0.125
2023-12-22 05:13:11,009 INFO [train.py:886] (3/4) Epoch 14, batch 800, loss[loss=0.01284, audio_tagging_loss=0.01284, over 25000.00 frames. ], tot_loss[loss=0.01458, audio_tagging_loss=0.01458, over 4865461.32 frames. ], batch size: 100, lr: 7.90e-03, grad_scale: 64.0
2023-12-22 05:13:19,600 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=12.0
2023-12-22 05:13:31,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=418520.0, ans=0.125
2023-12-22 05:13:51,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0
2023-12-22 05:14:02,730 INFO [train.py:886] (3/4) Epoch 14, batch 850, loss[loss=0.01348, audio_tagging_loss=0.01348, over 25000.00 frames. ], tot_loss[loss=0.01457, audio_tagging_loss=0.01457, over 4890051.37 frames. ], batch size: 100, lr: 7.90e-03, grad_scale: 64.0
2023-12-22 05:14:19,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.53 vs. limit=6.0
2023-12-22 05:14:23,682 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.411e+01 2.718e+01 2.822e+01 2.961e+01 3.534e+01, threshold=5.644e+01, percent-clipped=0.0
2023-12-22 05:14:24,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=418853.3333333333, ans=0.125
2023-12-22 05:14:28,109 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.73 vs. limit=10.0
2023-12-22 05:14:54,311 INFO [train.py:886] (3/4) Epoch 14, batch 900, loss[loss=0.01595, audio_tagging_loss=0.01595, over 24750.00 frames. ], tot_loss[loss=0.01465, audio_tagging_loss=0.01465, over 4904756.53 frames. ], batch size: 99, lr: 7.90e-03, grad_scale: 64.0
2023-12-22 05:14:54,854 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.67 vs. limit=22.5
2023-12-22 05:14:56,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=419053.3333333333, ans=0.2
2023-12-22 05:15:12,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=419120.0, ans=0.0
2023-12-22 05:15:29,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=419253.3333333333, ans=0.0
2023-12-22 05:15:31,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=419253.3333333333, ans=0.0
2023-12-22 05:15:40,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=419320.0, ans=0.0
2023-12-22 05:15:46,876 INFO [train.py:886] (3/4) Epoch 14, batch 950, loss[loss=0.01555, audio_tagging_loss=0.01555, over 24750.00 frames. ], tot_loss[loss=0.01471, audio_tagging_loss=0.01471, over 4908610.90 frames. ], batch size: 99, lr: 7.89e-03, grad_scale: 64.0
2023-12-22 05:16:07,288 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.477e+01 2.744e+01 2.862e+01 3.032e+01 3.515e+01, threshold=5.724e+01, percent-clipped=0.0
2023-12-22 05:16:08,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=419520.0, ans=0.125
2023-12-22 05:16:15,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=419520.0, ans=0.0
2023-12-22 05:16:16,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=419520.0, ans=0.04949747468305833
2023-12-22 05:16:16,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=419520.0, ans=0.125
2023-12-22 05:16:17,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=419586.6666666667, ans=0.0
2023-12-22 05:16:23,974 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0
2023-12-22 05:16:31,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=419653.3333333333, ans=0.2
2023-12-22 05:16:34,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=419653.3333333333, ans=0.125
2023-12-22 05:16:37,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=419720.0, ans=0.05
2023-12-22 05:16:38,377 INFO [train.py:886] (3/4) Epoch 14, batch 1000, loss[loss=0.0117, audio_tagging_loss=0.0117, over 24750.00 frames. ], tot_loss[loss=0.01461, audio_tagging_loss=0.01461, over 4917458.36 frames. ], batch size: 99, lr: 7.89e-03, grad_scale: 64.0
2023-12-22 05:16:40,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=419720.0, ans=0.0
2023-12-22 05:16:40,647 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.10 vs. limit=15.0
2023-12-22 05:16:48,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=419786.6666666667, ans=0.1
2023-12-22 05:16:51,502 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.84 vs. limit=15.0
2023-12-22 05:16:53,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=419786.6666666667, ans=0.0
2023-12-22 05:16:55,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=419786.6666666667, ans=0.125
2023-12-22 05:16:57,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=419786.6666666667, ans=0.2
2023-12-22 05:16:59,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=419853.3333333333, ans=0.035
2023-12-22 05:17:16,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=419920.0, ans=0.1
2023-12-22 05:17:26,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=419986.6666666667, ans=0.05
2023-12-22 05:17:28,424 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 05:17:30,177 INFO [train.py:886] (3/4) Epoch 14, batch 1050, loss[loss=0.01344, audio_tagging_loss=0.01344, over 25000.00 frames. ], tot_loss[loss=0.01452, audio_tagging_loss=0.01452, over 4921904.44 frames. ], batch size: 100, lr: 7.89e-03, grad_scale: 64.0
2023-12-22 05:17:32,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=420053.3333333333, ans=0.2
2023-12-22 05:17:46,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=420120.0, ans=0.0
2023-12-22 05:17:47,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=420120.0, ans=0.125
2023-12-22 05:17:51,092 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.700e+01 2.859e+01 3.033e+01 3.573e+01, threshold=5.717e+01, percent-clipped=0.0
2023-12-22 05:18:00,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=420253.3333333333, ans=0.05
2023-12-22 05:18:10,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=30.27 vs. limit=22.5
2023-12-22 05:18:21,449 INFO [train.py:886] (3/4) Epoch 14, batch 1100, loss[loss=0.01322, audio_tagging_loss=0.01322, over 25000.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4929990.78 frames. ], batch size: 100, lr: 7.89e-03, grad_scale: 64.0
2023-12-22 05:18:32,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=420453.3333333333, ans=0.1
2023-12-22 05:18:41,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=420520.0, ans=0.125
2023-12-22 05:18:50,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=420520.0, ans=0.0
2023-12-22 05:18:56,028 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=12.0
2023-12-22 05:19:06,903 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.84 vs. limit=15.0
2023-12-22 05:19:13,596 INFO [train.py:886] (3/4) Epoch 14, batch 1150, loss[loss=0.01173, audio_tagging_loss=0.01173, over 24103.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4934639.90 frames. ], batch size: 100, lr: 7.88e-03, grad_scale: 64.0
2023-12-22 05:19:18,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=420720.0, ans=0.125
2023-12-22 05:19:28,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=420786.6666666667, ans=0.125
2023-12-22 05:19:34,585 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.331e+01 2.729e+01 2.852e+01 2.990e+01 3.450e+01, threshold=5.704e+01, percent-clipped=0.0
2023-12-22 05:20:04,873 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.12 vs. limit=10.0
2023-12-22 05:20:05,393 INFO [train.py:886] (3/4) Epoch 14, batch 1200, loss[loss=0.01309, audio_tagging_loss=0.01309, over 21841.00 frames. ], tot_loss[loss=0.01444, audio_tagging_loss=0.01444, over 4936907.47 frames. ], batch size: 107, lr: 7.88e-03, grad_scale: 64.0
2023-12-22 05:20:09,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=421053.3333333333, ans=0.125
2023-12-22 05:20:12,961 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.611e-03
2023-12-22 05:20:13,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=421053.3333333333, ans=0.05
2023-12-22 05:20:17,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=421120.0, ans=0.0
2023-12-22 05:20:20,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=421120.0, ans=0.125
2023-12-22 05:20:33,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=421186.6666666667, ans=0.2
2023-12-22 05:20:39,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.78 vs. limit=22.5
2023-12-22 05:20:45,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=421253.3333333333, ans=0.05
2023-12-22 05:20:51,153 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.14 vs. limit=15.0
2023-12-22 05:20:57,215 INFO [train.py:886] (3/4) Epoch 14, batch 1250, loss[loss=0.01461, audio_tagging_loss=0.01461, over 24750.00 frames. ], tot_loss[loss=0.01454, audio_tagging_loss=0.01454, over 4938169.38 frames. ], batch size: 99, lr: 7.88e-03, grad_scale: 128.0
2023-12-22 05:21:16,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=421453.3333333333, ans=0.125
2023-12-22 05:21:20,040 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+01 2.730e+01 2.921e+01 3.084e+01 3.740e+01, threshold=5.843e+01, percent-clipped=0.0
2023-12-22 05:21:23,093 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 05:21:38,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=421653.3333333333, ans=0.0
2023-12-22 05:21:50,377 INFO [train.py:886] (3/4) Epoch 14, batch 1300, loss[loss=0.01145, audio_tagging_loss=0.01145, over 24750.00 frames. ], tot_loss[loss=0.01453, audio_tagging_loss=0.01453, over 4943326.50 frames. ], batch size: 99, lr: 7.87e-03, grad_scale: 64.0
2023-12-22 05:22:10,680 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0
2023-12-22 05:22:13,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=421853.3333333333, ans=0.2
2023-12-22 05:22:32,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=421986.6666666667, ans=0.0
2023-12-22 05:22:36,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=421986.6666666667, ans=0.125
2023-12-22 05:22:40,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=422053.3333333333, ans=0.0
2023-12-22 05:22:41,356 INFO [train.py:886] (3/4) Epoch 14, batch 1350, loss[loss=0.01227, audio_tagging_loss=0.01227, over 25000.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4941042.51 frames. ], batch size: 100, lr: 7.87e-03, grad_scale: 64.0
2023-12-22 05:22:44,436 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0
2023-12-22 05:22:50,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=422053.3333333333, ans=0.2
2023-12-22 05:22:58,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=422120.0, ans=0.125
2023-12-22 05:22:58,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=422120.0, ans=0.125
2023-12-22 05:23:03,257 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.431e+01 2.742e+01 2.848e+01 3.000e+01 3.738e+01, threshold=5.695e+01, percent-clipped=0.0
2023-12-22 05:23:09,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=422186.6666666667, ans=0.125
2023-12-22 05:23:10,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=422186.6666666667, ans=0.125
2023-12-22 05:23:14,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=422253.3333333333, ans=0.125
2023-12-22 05:23:33,457 INFO [train.py:886] (3/4) Epoch 14, batch 1400, loss[loss=0.01345, audio_tagging_loss=0.01345, over 25000.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4949104.08 frames. ], batch size: 100, lr: 7.87e-03, grad_scale: 64.0
2023-12-22 05:23:35,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=422386.6666666667, ans=0.0
2023-12-22 05:23:46,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=422453.3333333333, ans=0.0
2023-12-22 05:23:55,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=422520.0, ans=0.125
2023-12-22 05:23:57,966 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.25 vs. limit=15.0
2023-12-22 05:24:03,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=422520.0, ans=0.125
2023-12-22 05:24:09,272 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=15.0
2023-12-22 05:24:13,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=422586.6666666667, ans=0.025
2023-12-22 05:24:25,646 INFO [train.py:886] (3/4) Epoch 14, batch 1450, loss[loss=0.0152, audio_tagging_loss=0.0152, over 25000.00 frames. ], tot_loss[loss=0.01432, audio_tagging_loss=0.01432, over 4952804.20 frames. ], batch size: 100, lr: 7.86e-03, grad_scale: 64.0
2023-12-22 05:24:39,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=422786.6666666667, ans=0.0
2023-12-22 05:24:43,537 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.82 vs. limit=15.0
2023-12-22 05:24:47,335 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.450e+01 2.660e+01 2.819e+01 2.961e+01 3.390e+01, threshold=5.637e+01, percent-clipped=0.0
2023-12-22 05:24:56,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=422920.0, ans=0.125
2023-12-22 05:24:59,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=422920.0, ans=0.0
2023-12-22 05:25:13,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=422986.6666666667, ans=0.1
2023-12-22 05:25:16,640 INFO [train.py:886] (3/4) Epoch 14, batch 1500, loss[loss=0.01585, audio_tagging_loss=0.01585, over 25000.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4944612.29 frames. ], batch size: 100, lr: 7.86e-03, grad_scale: 64.0
2023-12-22 05:25:21,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=423053.3333333333, ans=0.2
2023-12-22 05:25:22,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=423053.3333333333, ans=0.125
2023-12-22 05:25:25,325 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.44 vs. limit=22.5
2023-12-22 05:25:52,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=423253.3333333333, ans=0.125
2023-12-22 05:25:54,634 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.80 vs. limit=22.5
2023-12-22 05:26:09,763 INFO [train.py:886] (3/4) Epoch 14, batch 1550, loss[loss=0.01394, audio_tagging_loss=0.01394, over 25000.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4946226.03 frames. ], batch size: 100, lr: 7.86e-03, grad_scale: 64.0
2023-12-22 05:26:10,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=423386.6666666667, ans=0.0
2023-12-22 05:26:21,633 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.21 vs. limit=22.5
2023-12-22 05:26:23,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=423453.3333333333, ans=0.125
2023-12-22 05:26:24,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=423453.3333333333, ans=0.0
2023-12-22 05:26:29,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=423520.0, ans=0.125
2023-12-22 05:26:30,348 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.459e+01 2.731e+01 2.887e+01 3.049e+01 4.641e+01, threshold=5.774e+01, percent-clipped=0.0
2023-12-22 05:26:31,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=423520.0, ans=0.0
2023-12-22 05:26:42,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=423586.6666666667, ans=0.1
2023-12-22 05:26:44,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=423586.6666666667, ans=0.125
2023-12-22 05:26:48,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=423586.6666666667, ans=0.0
2023-12-22 05:26:57,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=423653.3333333333, ans=0.125
2023-12-22 05:27:00,505 INFO [train.py:886] (3/4) Epoch 14, batch 1600, loss[loss=0.01778, audio_tagging_loss=0.01778, over 24750.00 frames. ], tot_loss[loss=0.01454, audio_tagging_loss=0.01454, over 4940922.37 frames. ], batch size: 99, lr: 7.85e-03, grad_scale: 64.0
2023-12-22 05:27:08,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=423720.0, ans=0.125
2023-12-22 05:27:19,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0
2023-12-22 05:27:52,877 INFO [train.py:886] (3/4) Epoch 14, batch 1650, loss[loss=0.01488, audio_tagging_loss=0.01488, over 24750.00 frames. ], tot_loss[loss=0.01451, audio_tagging_loss=0.01451, over 4944912.17 frames. ], batch size: 99, lr: 7.85e-03, grad_scale: 64.0
2023-12-22 05:28:06,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=424120.0, ans=0.125
2023-12-22 05:28:06,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=424120.0, ans=0.1
2023-12-22 05:28:14,931 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.440e+01 2.709e+01 2.898e+01 3.040e+01 3.602e+01, threshold=5.796e+01, percent-clipped=0.0
2023-12-22 05:28:27,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=424253.3333333333, ans=0.125
2023-12-22 05:28:33,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=424320.0, ans=0.2
2023-12-22 05:28:44,644 INFO [train.py:886] (3/4) Epoch 14, batch 1700, loss[loss=0.01252, audio_tagging_loss=0.01252, over 25000.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4950829.48 frames. ], batch size: 100, lr: 7.85e-03, grad_scale: 64.0
2023-12-22 05:29:03,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=424453.3333333333, ans=0.125
2023-12-22 05:29:21,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=424586.6666666667, ans=0.0
2023-12-22 05:29:21,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=424586.6666666667, ans=0.125
2023-12-22 05:29:36,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=424720.0, ans=0.2
2023-12-22 05:29:36,996 INFO [train.py:886] (3/4) Epoch 14, batch 1750, loss[loss=0.01611, audio_tagging_loss=0.01611, over 25000.00 frames. ], tot_loss[loss=0.01445, audio_tagging_loss=0.01445, over 4954169.22 frames. ], batch size: 100, lr: 7.85e-03, grad_scale: 64.0
2023-12-22 05:29:59,038 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.432e+01 2.657e+01 2.813e+01 2.952e+01 3.749e+01, threshold=5.627e+01, percent-clipped=0.0
2023-12-22 05:30:00,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=424853.3333333333, ans=0.0
2023-12-22 05:30:15,335 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0
2023-12-22 05:30:28,731 INFO [train.py:886] (3/4) Epoch 14, batch 1800, loss[loss=0.01563, audio_tagging_loss=0.01563, over 25000.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4960597.24 frames. ], batch size: 100, lr: 7.84e-03, grad_scale: 64.0
2023-12-22 05:30:35,647 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.92 vs. limit=6.0
2023-12-22 05:30:37,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=425053.3333333333, ans=0.5
2023-12-22 05:30:45,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.67 vs. limit=15.0
2023-12-22 05:30:51,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=425186.6666666667, ans=0.0
2023-12-22 05:30:52,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=425186.6666666667, ans=0.1
2023-12-22 05:31:20,774 INFO [train.py:886] (3/4) Epoch 14, batch 1850, loss[loss=0.01466, audio_tagging_loss=0.01466, over 24750.00 frames. ], tot_loss[loss=0.01457, audio_tagging_loss=0.01457, over 4964884.18 frames. ], batch size: 99, lr: 7.84e-03, grad_scale: 64.0
2023-12-22 05:31:34,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=425453.3333333333, ans=0.1
2023-12-22 05:31:37,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=425453.3333333333, ans=0.125
2023-12-22 05:31:37,663 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=12.0
2023-12-22 05:31:38,447 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.16 vs. limit=12.0
2023-12-22 05:31:42,477 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.475e+01 2.703e+01 2.874e+01 3.050e+01 3.710e+01, threshold=5.749e+01, percent-clipped=0.0
2023-12-22 05:32:12,708 INFO [train.py:886] (3/4) Epoch 14, batch 1900, loss[loss=0.01414, audio_tagging_loss=0.01414, over 24750.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4961505.90 frames. ], batch size: 99, lr: 7.84e-03, grad_scale: 64.0
2023-12-22 05:32:16,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=425720.0, ans=0.0
2023-12-22 05:32:27,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=425786.6666666667, ans=0.125
2023-12-22 05:32:35,133 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.76 vs. limit=10.0
2023-12-22 05:32:38,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=425853.3333333333, ans=0.0
2023-12-22 05:33:04,802 INFO [train.py:886] (3/4) Epoch 14, batch 1950, loss[loss=0.01313, audio_tagging_loss=0.01313, over 24750.00 frames. ], tot_loss[loss=0.01453, audio_tagging_loss=0.01453, over 4951676.86 frames. ], batch size: 99, lr: 7.83e-03, grad_scale: 64.0
2023-12-22 05:33:06,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=426053.3333333333, ans=0.125
2023-12-22 05:33:26,063 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 2.762e+01 2.921e+01 3.064e+01 3.650e+01, threshold=5.841e+01, percent-clipped=0.0
2023-12-22 05:33:49,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=426320.0, ans=0.0
2023-12-22 05:33:56,128 INFO [train.py:886] (3/4) Epoch 14, batch 2000, loss[loss=0.01357, audio_tagging_loss=0.01357, over 25000.00 frames. ], tot_loss[loss=0.01444, audio_tagging_loss=0.01444, over 4949530.46 frames. ], batch size: 100, lr: 7.83e-03, grad_scale: 64.0
2023-12-22 05:34:03,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.92 vs. limit=10.0
2023-12-22 05:34:14,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=426453.3333333333, ans=0.125
2023-12-22 05:34:18,748 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=15.0
2023-12-22 05:34:19,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=426520.0, ans=0.0
2023-12-22 05:34:28,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=426586.6666666667, ans=0.2
2023-12-22 05:34:37,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=426653.3333333333, ans=0.1
2023-12-22 05:34:44,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=426653.3333333333, ans=0.2
2023-12-22 05:34:49,232 INFO [train.py:886] (3/4) Epoch 14, batch 2050, loss[loss=0.0128, audio_tagging_loss=0.0128, over 25000.00 frames. ], tot_loss[loss=0.01442, audio_tagging_loss=0.01442, over 4950316.11 frames. ], batch size: 100, lr: 7.83e-03, grad_scale: 64.0
2023-12-22 05:34:57,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=426720.0, ans=0.0
2023-12-22 05:35:04,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=426786.6666666667, ans=0.2
2023-12-22 05:35:08,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=426786.6666666667, ans=0.2
2023-12-22 05:35:11,160 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.288e+01 2.714e+01 2.866e+01 3.005e+01 3.462e+01, threshold=5.732e+01, percent-clipped=0.0
2023-12-22 05:35:14,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.11 vs. limit=15.0
2023-12-22 05:35:18,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=426853.3333333333, ans=0.0
2023-12-22 05:35:41,256 INFO [train.py:886] (3/4) Epoch 14, batch 2100, loss[loss=0.0165, audio_tagging_loss=0.0165, over 25000.00 frames. ], tot_loss[loss=0.01451, audio_tagging_loss=0.01451, over 4953274.24 frames. ], batch size: 100, lr: 7.82e-03, grad_scale: 64.0
2023-12-22 05:35:57,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=427120.0, ans=0.0
2023-12-22 05:35:58,619 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.27 vs. limit=15.0
2023-12-22 05:36:22,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=427320.0, ans=0.1
2023-12-22 05:36:32,198 INFO [train.py:886] (3/4) Epoch 14, batch 2150, loss[loss=0.0156, audio_tagging_loss=0.0156, over 24750.00 frames. ], tot_loss[loss=0.01451, audio_tagging_loss=0.01451, over 4957855.87 frames. ], batch size: 99, lr: 7.82e-03, grad_scale: 64.0
2023-12-22 05:36:36,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.65 vs.
limit=22.5 2023-12-22 05:36:45,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=427453.3333333333, ans=0.0 2023-12-22 05:36:54,874 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.418e+01 2.697e+01 2.884e+01 3.038e+01 3.535e+01, threshold=5.769e+01, percent-clipped=0.0 2023-12-22 05:37:25,200 INFO [train.py:886] (3/4) Epoch 14, batch 2200, loss[loss=0.01655, audio_tagging_loss=0.01655, over 24750.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4949306.98 frames. ], batch size: 99, lr: 7.82e-03, grad_scale: 64.0 2023-12-22 05:37:32,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=427720.0, ans=0.1 2023-12-22 05:37:56,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=427920.0, ans=0.1 2023-12-22 05:38:17,235 INFO [train.py:886] (3/4) Epoch 14, batch 2250, loss[loss=0.01593, audio_tagging_loss=0.01593, over 25000.00 frames. ], tot_loss[loss=0.01453, audio_tagging_loss=0.01453, over 4942302.41 frames. ], batch size: 100, lr: 7.82e-03, grad_scale: 64.0 2023-12-22 05:38:37,734 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.498e+01 2.726e+01 2.835e+01 3.020e+01 3.346e+01, threshold=5.670e+01, percent-clipped=0.0 2023-12-22 05:38:46,999 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:38:50,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=428253.3333333333, ans=0.0 2023-12-22 05:38:51,761 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.60 vs. limit=12.0 2023-12-22 05:38:54,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=428253.3333333333, ans=0.1 2023-12-22 05:39:07,331 INFO [train.py:886] (3/4) Epoch 14, batch 2300, loss[loss=0.01626, audio_tagging_loss=0.01626, over 25000.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4942772.59 frames. ], batch size: 100, lr: 7.81e-03, grad_scale: 64.0 2023-12-22 05:39:07,850 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.74 vs. 
limit=15.0 2023-12-22 05:39:08,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=428386.6666666667, ans=0.125 2023-12-22 05:39:25,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=428453.3333333333, ans=0.125 2023-12-22 05:39:34,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=428520.0, ans=0.0 2023-12-22 05:39:35,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=428520.0, ans=0.125 2023-12-22 05:39:55,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=428653.3333333333, ans=0.125 2023-12-22 05:39:58,875 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.99 vs. limit=12.0 2023-12-22 05:40:00,304 INFO [train.py:886] (3/4) Epoch 14, batch 2350, loss[loss=0.01126, audio_tagging_loss=0.01126, over 25000.00 frames. ], tot_loss[loss=0.01444, audio_tagging_loss=0.01444, over 4944919.26 frames. ], batch size: 100, lr: 7.81e-03, grad_scale: 64.0 2023-12-22 05:40:06,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=428720.0, ans=0.125 2023-12-22 05:40:07,223 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0 2023-12-22 05:40:10,704 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:40:21,521 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.328e+01 2.672e+01 2.834e+01 2.975e+01 3.666e+01, threshold=5.667e+01, percent-clipped=0.0 2023-12-22 05:40:22,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=428853.3333333333, ans=0.0 2023-12-22 05:40:36,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=428920.0, ans=0.125 2023-12-22 05:40:43,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=428986.6666666667, ans=0.125 2023-12-22 05:40:45,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=428986.6666666667, ans=0.0 2023-12-22 05:40:51,461 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.24 vs. limit=15.0 2023-12-22 05:40:51,851 INFO [train.py:886] (3/4) Epoch 14, batch 2400, loss[loss=0.01694, audio_tagging_loss=0.01694, over 25000.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 4952378.48 frames. 
], batch size: 100, lr: 7.81e-03, grad_scale: 64.0 2023-12-22 05:41:03,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=429120.0, ans=0.0 2023-12-22 05:41:15,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=429186.6666666667, ans=0.2 2023-12-22 05:41:44,335 INFO [train.py:886] (3/4) Epoch 14, batch 2450, loss[loss=0.01284, audio_tagging_loss=0.01284, over 24050.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4954515.65 frames. ], batch size: 100, lr: 7.80e-03, grad_scale: 64.0 2023-12-22 05:41:48,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=429386.6666666667, ans=0.0 2023-12-22 05:42:05,684 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.732e+01 2.868e+01 2.998e+01 3.609e+01, threshold=5.736e+01, percent-clipped=0.0 2023-12-22 05:42:08,719 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2023-12-22 05:42:08,728 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.64 vs. limit=22.5 2023-12-22 05:42:10,693 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.47 vs. limit=15.0 2023-12-22 05:42:28,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=429653.3333333333, ans=0.125 2023-12-22 05:42:35,891 INFO [train.py:886] (3/4) Epoch 14, batch 2500, loss[loss=0.01251, audio_tagging_loss=0.01251, over 24750.00 frames. ], tot_loss[loss=0.01458, audio_tagging_loss=0.01458, over 4953231.02 frames. ], batch size: 99, lr: 7.80e-03, grad_scale: 64.0 2023-12-22 05:42:50,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=429786.6666666667, ans=0.015 2023-12-22 05:42:54,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=429786.6666666667, ans=0.0 2023-12-22 05:42:56,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=429853.3333333333, ans=22.5 2023-12-22 05:43:03,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=429853.3333333333, ans=10.0 2023-12-22 05:43:17,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=429986.6666666667, ans=0.05 2023-12-22 05:43:27,552 INFO [train.py:886] (3/4) Epoch 14, batch 2550, loss[loss=0.01355, audio_tagging_loss=0.01355, over 24750.00 frames. ], tot_loss[loss=0.01457, audio_tagging_loss=0.01457, over 4944503.82 frames. ], batch size: 99, lr: 7.80e-03, grad_scale: 64.0 2023-12-22 05:43:48,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=430186.6666666667, ans=0.125 2023-12-22 05:43:49,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.03 vs. 
limit=22.5 2023-12-22 05:43:50,210 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.337e+01 2.742e+01 2.892e+01 3.076e+01 3.372e+01, threshold=5.784e+01, percent-clipped=0.0 2023-12-22 05:43:50,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=430186.6666666667, ans=0.1 2023-12-22 05:43:58,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=430253.3333333333, ans=0.125 2023-12-22 05:44:10,537 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.43 vs. limit=6.0 2023-12-22 05:44:11,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=430320.0, ans=0.125 2023-12-22 05:44:12,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=430320.0, ans=0.125 2023-12-22 05:44:20,777 INFO [train.py:886] (3/4) Epoch 14, batch 2600, loss[loss=0.01714, audio_tagging_loss=0.01714, over 24107.00 frames. ], tot_loss[loss=0.01453, audio_tagging_loss=0.01453, over 4940546.28 frames. ], batch size: 100, lr: 7.79e-03, grad_scale: 64.0 2023-12-22 05:44:24,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0 2023-12-22 05:45:02,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=430653.3333333333, ans=0.125 2023-12-22 05:45:05,874 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.92 vs. limit=15.0 2023-12-22 05:45:11,163 INFO [train.py:886] (3/4) Epoch 14, batch 2650, loss[loss=0.01659, audio_tagging_loss=0.01659, over 25000.00 frames. ], tot_loss[loss=0.01452, audio_tagging_loss=0.01452, over 4944384.22 frames. ], batch size: 100, lr: 7.79e-03, grad_scale: 64.0 2023-12-22 05:45:26,912 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.28 vs. limit=10.0 2023-12-22 05:45:33,100 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.439e+01 2.737e+01 2.885e+01 3.025e+01 3.317e+01, threshold=5.770e+01, percent-clipped=0.0 2023-12-22 05:45:33,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=430853.3333333333, ans=0.1 2023-12-22 05:45:34,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=430853.3333333333, ans=0.125 2023-12-22 05:45:35,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=430853.3333333333, ans=0.02 2023-12-22 05:46:00,356 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=15.0 2023-12-22 05:46:03,604 INFO [train.py:886] (3/4) Epoch 14, batch 2700, loss[loss=0.0175, audio_tagging_loss=0.0175, over 25000.00 frames. ], tot_loss[loss=0.01449, audio_tagging_loss=0.01449, over 4945062.96 frames. 
], batch size: 100, lr: 7.79e-03, grad_scale: 64.0 2023-12-22 05:46:07,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=431053.3333333333, ans=0.125 2023-12-22 05:46:12,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=431053.3333333333, ans=0.125 2023-12-22 05:46:23,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=431186.6666666667, ans=0.1 2023-12-22 05:46:25,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=431186.6666666667, ans=0.95 2023-12-22 05:46:34,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=431253.3333333333, ans=0.2 2023-12-22 05:46:38,231 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.62 vs. limit=6.0 2023-12-22 05:46:41,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=431253.3333333333, ans=0.1 2023-12-22 05:46:55,321 INFO [train.py:886] (3/4) Epoch 14, batch 2750, loss[loss=0.01638, audio_tagging_loss=0.01638, over 25000.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4952892.60 frames. ], batch size: 100, lr: 7.79e-03, grad_scale: 64.0 2023-12-22 05:46:55,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=431386.6666666667, ans=0.0 2023-12-22 05:46:56,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=431386.6666666667, ans=0.0 2023-12-22 05:47:17,160 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+01 2.705e+01 2.817e+01 2.978e+01 3.595e+01, threshold=5.635e+01, percent-clipped=0.0 2023-12-22 05:47:26,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=431586.6666666667, ans=0.2 2023-12-22 05:47:27,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=431586.6666666667, ans=0.125 2023-12-22 05:47:29,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=431586.6666666667, ans=0.125 2023-12-22 05:47:30,551 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.47 vs. limit=22.5 2023-12-22 05:47:35,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=431653.3333333333, ans=0.125 2023-12-22 05:47:46,631 INFO [train.py:886] (3/4) Epoch 14, batch 2800, loss[loss=0.01353, audio_tagging_loss=0.01353, over 24750.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4955594.65 frames. 
], batch size: 99, lr: 7.78e-03, grad_scale: 64.0 2023-12-22 05:47:50,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=431720.0, ans=0.125 2023-12-22 05:47:51,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=431720.0, ans=0.125 2023-12-22 05:48:13,685 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.14 vs. limit=12.0 2023-12-22 05:48:15,679 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.90 vs. limit=15.0 2023-12-22 05:48:29,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=431986.6666666667, ans=0.125 2023-12-22 05:48:29,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=431986.6666666667, ans=0.0 2023-12-22 05:48:39,044 INFO [train.py:886] (3/4) Epoch 14, batch 2850, loss[loss=0.0146, audio_tagging_loss=0.0146, over 24750.00 frames. ], tot_loss[loss=0.01457, audio_tagging_loss=0.01457, over 4948994.70 frames. ], batch size: 99, lr: 7.78e-03, grad_scale: 64.0 2023-12-22 05:48:41,510 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2023-12-22 05:48:43,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=432053.3333333333, ans=0.2 2023-12-22 05:49:00,968 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+01 2.725e+01 2.911e+01 3.000e+01 3.725e+01, threshold=5.822e+01, percent-clipped=0.0 2023-12-22 05:49:05,391 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.29 vs. limit=6.0 2023-12-22 05:49:10,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=432253.3333333333, ans=0.2 2023-12-22 05:49:17,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2023-12-22 05:49:18,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=432253.3333333333, ans=15.0 2023-12-22 05:49:31,044 INFO [train.py:886] (3/4) Epoch 14, batch 2900, loss[loss=0.01238, audio_tagging_loss=0.01238, over 25000.00 frames. ], tot_loss[loss=0.01442, audio_tagging_loss=0.01442, over 4948159.55 frames. ], batch size: 100, lr: 7.78e-03, grad_scale: 64.0 2023-12-22 05:49:47,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=432453.3333333333, ans=0.125 2023-12-22 05:49:48,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=432453.3333333333, ans=0.125 2023-12-22 05:50:07,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.09 vs. 
limit=22.5 2023-12-22 05:50:08,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=432586.6666666667, ans=0.0 2023-12-22 05:50:15,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=432653.3333333333, ans=0.2 2023-12-22 05:50:15,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=432653.3333333333, ans=0.05 2023-12-22 05:50:22,851 INFO [train.py:886] (3/4) Epoch 14, batch 2950, loss[loss=0.01711, audio_tagging_loss=0.01711, over 24935.00 frames. ], tot_loss[loss=0.01434, audio_tagging_loss=0.01434, over 4953519.73 frames. ], batch size: 100, lr: 7.77e-03, grad_scale: 64.0 2023-12-22 05:50:28,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=432720.0, ans=0.0 2023-12-22 05:50:44,826 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 2.760e+01 2.879e+01 3.048e+01 3.785e+01, threshold=5.759e+01, percent-clipped=0.0 2023-12-22 05:50:56,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=432920.0, ans=0.95 2023-12-22 05:50:56,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=432920.0, ans=0.0 2023-12-22 05:51:03,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=432986.6666666667, ans=0.125 2023-12-22 05:51:06,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.54 vs. limit=15.0 2023-12-22 05:51:14,397 INFO [train.py:886] (3/4) Epoch 14, batch 3000, loss[loss=0.0127, audio_tagging_loss=0.0127, over 23994.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4953129.47 frames. ], batch size: 100, lr: 7.77e-03, grad_scale: 64.0 2023-12-22 05:51:14,398 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 05:51:35,570 INFO [train.py:917] (3/4) Epoch 14, validation: loss=0.03344, audio_tagging_loss=0.03344, over 3737520.00 frames. 2023-12-22 05:51:35,570 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 05:51:51,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=433120.0, ans=0.1 2023-12-22 05:51:56,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=433186.6666666667, ans=0.2 2023-12-22 05:51:59,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=433186.6666666667, ans=0.0 2023-12-22 05:52:19,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=433320.0, ans=0.1 2023-12-22 05:52:23,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=433320.0, ans=0.125 2023-12-22 05:52:26,664 INFO [train.py:886] (3/4) Epoch 14, batch 3050, loss[loss=0.01609, audio_tagging_loss=0.01609, over 24750.00 frames. ], tot_loss[loss=0.0143, audio_tagging_loss=0.0143, over 4950978.63 frames. 
], batch size: 99, lr: 7.77e-03, grad_scale: 64.0 2023-12-22 05:52:37,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=433453.3333333333, ans=0.05 2023-12-22 05:52:37,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=433453.3333333333, ans=0.125 2023-12-22 05:52:43,162 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=22.5 2023-12-22 05:52:49,191 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.383e+01 2.748e+01 2.853e+01 3.040e+01 4.569e+01, threshold=5.706e+01, percent-clipped=0.0 2023-12-22 05:52:56,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=433520.0, ans=0.05 2023-12-22 05:53:19,727 INFO [train.py:886] (3/4) Epoch 14, batch 3100, loss[loss=0.01537, audio_tagging_loss=0.01537, over 24750.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 4953137.44 frames. ], batch size: 99, lr: 7.76e-03, grad_scale: 64.0 2023-12-22 05:53:32,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=433786.6666666667, ans=0.125 2023-12-22 05:53:48,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=433920.0, ans=0.2 2023-12-22 05:53:57,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=433920.0, ans=0.125 2023-12-22 05:54:01,631 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0 2023-12-22 05:54:02,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=433986.6666666667, ans=0.125 2023-12-22 05:54:09,522 INFO [train.py:886] (3/4) Epoch 14, batch 3150, loss[loss=0.01561, audio_tagging_loss=0.01561, over 24750.00 frames. ], tot_loss[loss=0.01458, audio_tagging_loss=0.01458, over 4951059.90 frames. ], batch size: 99, lr: 7.76e-03, grad_scale: 64.0 2023-12-22 05:54:12,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=434053.3333333333, ans=0.2 2023-12-22 05:54:19,058 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.46 vs. limit=15.0 2023-12-22 05:54:21,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=434120.0, ans=0.1 2023-12-22 05:54:21,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=434120.0, ans=0.2 2023-12-22 05:54:26,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=434120.0, ans=0.2 2023-12-22 05:54:29,243 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.74 vs. 
limit=15.0 2023-12-22 05:54:30,684 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.378e+01 2.796e+01 2.957e+01 3.108e+01 3.516e+01, threshold=5.914e+01, percent-clipped=0.0 2023-12-22 05:54:30,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=434186.6666666667, ans=0.0 2023-12-22 05:54:41,266 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0 2023-12-22 05:54:42,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=434253.3333333333, ans=0.0 2023-12-22 05:54:48,193 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:55:01,147 INFO [train.py:886] (3/4) Epoch 14, batch 3200, loss[loss=0.01502, audio_tagging_loss=0.01502, over 24750.00 frames. ], tot_loss[loss=0.01454, audio_tagging_loss=0.01454, over 4945805.04 frames. ], batch size: 99, lr: 7.76e-03, grad_scale: 64.0 2023-12-22 05:55:05,553 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.98 vs. limit=15.0 2023-12-22 05:55:07,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=434386.6666666667, ans=0.1 2023-12-22 05:55:30,948 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.03 vs. limit=15.0 2023-12-22 05:55:35,351 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.249e-01 2023-12-22 05:55:51,433 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.91 vs. limit=10.0 2023-12-22 05:55:53,582 INFO [train.py:886] (3/4) Epoch 14, batch 3250, loss[loss=0.01545, audio_tagging_loss=0.01545, over 24750.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4951929.10 frames. ], batch size: 99, lr: 7.76e-03, grad_scale: 64.0 2023-12-22 05:56:02,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=434786.6666666667, ans=0.1 2023-12-22 05:56:03,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=434786.6666666667, ans=0.125 2023-12-22 05:56:11,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=434786.6666666667, ans=0.125 2023-12-22 05:56:14,170 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 2.695e+01 2.854e+01 3.040e+01 5.272e+01, threshold=5.707e+01, percent-clipped=0.0 2023-12-22 05:56:33,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=434920.0, ans=0.125 2023-12-22 05:56:39,125 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.36 vs. limit=15.0 2023-12-22 05:56:40,061 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.78 vs. 
limit=22.5 2023-12-22 05:56:44,358 INFO [train.py:886] (3/4) Epoch 14, batch 3300, loss[loss=0.01437, audio_tagging_loss=0.01437, over 25000.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4954040.08 frames. ], batch size: 100, lr: 7.75e-03, grad_scale: 64.0 2023-12-22 05:57:03,391 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:57:03,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=435120.0, ans=0.125 2023-12-22 05:57:04,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=435120.0, ans=0.0 2023-12-22 05:57:09,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=435186.6666666667, ans=0.1 2023-12-22 05:57:10,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=435186.6666666667, ans=0.2 2023-12-22 05:57:15,100 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.64 vs. limit=10.0 2023-12-22 05:57:17,708 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=8.604e-02 2023-12-22 05:57:18,845 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.24 vs. limit=15.0 2023-12-22 05:57:23,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=435253.3333333333, ans=0.09899494936611666 2023-12-22 05:57:26,430 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:57:37,389 INFO [train.py:886] (3/4) Epoch 14, batch 3350, loss[loss=0.01245, audio_tagging_loss=0.01245, over 25000.00 frames. ], tot_loss[loss=0.01445, audio_tagging_loss=0.01445, over 4954365.41 frames. ], batch size: 100, lr: 7.75e-03, grad_scale: 64.0 2023-12-22 05:57:38,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=435386.6666666667, ans=0.0 2023-12-22 05:57:49,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=435453.3333333333, ans=0.125 2023-12-22 05:57:59,779 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+01 2.712e+01 2.830e+01 3.003e+01 3.619e+01, threshold=5.660e+01, percent-clipped=0.0 2023-12-22 05:58:02,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=435520.0, ans=0.0 2023-12-22 05:58:03,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=435520.0, ans=0.0 2023-12-22 05:58:10,963 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 05:58:11,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.56 vs. 
limit=15.0 2023-12-22 05:58:26,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=435653.3333333333, ans=0.125 2023-12-22 05:58:27,780 INFO [train.py:886] (3/4) Epoch 14, batch 3400, loss[loss=0.01651, audio_tagging_loss=0.01651, over 25000.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4947267.49 frames. ], batch size: 100, lr: 7.75e-03, grad_scale: 64.0 2023-12-22 05:58:32,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=435720.0, ans=0.0 2023-12-22 05:58:45,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=435786.6666666667, ans=0.125 2023-12-22 05:58:49,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=435853.3333333333, ans=0.1 2023-12-22 05:58:51,805 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.49 vs. limit=15.0 2023-12-22 05:59:05,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=435920.0, ans=0.1 2023-12-22 05:59:05,545 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0 2023-12-22 05:59:12,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=435986.6666666667, ans=0.0 2023-12-22 05:59:13,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=435986.6666666667, ans=0.125 2023-12-22 05:59:16,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=435986.6666666667, ans=0.1 2023-12-22 05:59:20,165 INFO [train.py:886] (3/4) Epoch 14, batch 3450, loss[loss=0.01402, audio_tagging_loss=0.01402, over 21701.00 frames. ], tot_loss[loss=0.01461, audio_tagging_loss=0.01461, over 4940648.54 frames. ], batch size: 107, lr: 7.74e-03, grad_scale: 64.0 2023-12-22 05:59:30,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=436120.0, ans=0.1 2023-12-22 05:59:43,319 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2023-12-22 05:59:43,842 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.529e+01 2.742e+01 2.883e+01 3.018e+01 3.834e+01, threshold=5.765e+01, percent-clipped=0.0 2023-12-22 05:59:48,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=436186.6666666667, ans=0.125 2023-12-22 05:59:50,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=436253.3333333333, ans=0.0 2023-12-22 05:59:59,485 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.94 vs. 
limit=15.0 2023-12-22 06:00:09,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=436320.0, ans=0.125 2023-12-22 06:00:13,199 INFO [train.py:886] (3/4) Epoch 14, batch 3500, loss[loss=0.01483, audio_tagging_loss=0.01483, over 25000.00 frames. ], tot_loss[loss=0.01462, audio_tagging_loss=0.01462, over 4934401.83 frames. ], batch size: 100, lr: 7.74e-03, grad_scale: 32.0 2023-12-22 06:00:21,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=436386.6666666667, ans=0.05 2023-12-22 06:00:23,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=436453.3333333333, ans=0.0 2023-12-22 06:00:24,164 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.47 vs. limit=15.0 2023-12-22 06:00:26,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=436453.3333333333, ans=0.0 2023-12-22 06:00:27,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=436453.3333333333, ans=0.125 2023-12-22 06:00:31,642 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0 2023-12-22 06:00:37,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=436520.0, ans=0.125 2023-12-22 06:01:02,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=436720.0, ans=0.0 2023-12-22 06:01:02,846 INFO [train.py:886] (3/4) Epoch 14, batch 3550, loss[loss=0.01599, audio_tagging_loss=0.01599, over 24929.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4941882.47 frames. ], batch size: 100, lr: 7.74e-03, grad_scale: 32.0 2023-12-22 06:01:26,843 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.413e+01 2.689e+01 2.829e+01 3.028e+01 3.560e+01, threshold=5.658e+01, percent-clipped=0.0 2023-12-22 06:01:42,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=436920.0, ans=0.125 2023-12-22 06:01:42,933 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0 2023-12-22 06:01:53,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=437053.3333333333, ans=0.0 2023-12-22 06:01:54,625 INFO [train.py:886] (3/4) Epoch 14, batch 3600, loss[loss=0.01331, audio_tagging_loss=0.01331, over 25000.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4943413.41 frames. 
], batch size: 100, lr: 7.74e-03, grad_scale: 32.0 2023-12-22 06:01:59,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=437053.3333333333, ans=0.125 2023-12-22 06:02:00,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=437053.3333333333, ans=0.125 2023-12-22 06:02:02,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=437053.3333333333, ans=0.125 2023-12-22 06:02:14,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=437186.6666666667, ans=0.0 2023-12-22 06:02:27,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=437253.3333333333, ans=0.125 2023-12-22 06:02:45,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=437386.6666666667, ans=0.125 2023-12-22 06:02:46,115 INFO [train.py:886] (3/4) Epoch 14, batch 3650, loss[loss=0.01209, audio_tagging_loss=0.01209, over 24750.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 4949269.27 frames. ], batch size: 99, lr: 7.73e-03, grad_scale: 32.0 2023-12-22 06:02:48,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=437386.6666666667, ans=0.1 2023-12-22 06:02:55,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=437386.6666666667, ans=0.1 2023-12-22 06:03:04,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=437453.3333333333, ans=0.05 2023-12-22 06:03:08,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=437520.0, ans=0.0 2023-12-22 06:03:09,462 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.394e+01 2.669e+01 2.809e+01 2.947e+01 3.516e+01, threshold=5.618e+01, percent-clipped=0.0 2023-12-22 06:03:09,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=437520.0, ans=0.0 2023-12-22 06:03:17,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.51 vs. limit=12.0 2023-12-22 06:03:17,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=437586.6666666667, ans=0.125 2023-12-22 06:03:19,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=437586.6666666667, ans=0.125 2023-12-22 06:03:29,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=437653.3333333333, ans=0.2 2023-12-22 06:03:29,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.60 vs. limit=22.5 2023-12-22 06:03:35,563 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.98 vs. 
limit=15.0 2023-12-22 06:03:38,032 INFO [train.py:886] (3/4) Epoch 14, batch 3700, loss[loss=0.01476, audio_tagging_loss=0.01476, over 25000.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4958278.94 frames. ], batch size: 100, lr: 7.73e-03, grad_scale: 32.0 2023-12-22 06:03:59,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=437853.3333333333, ans=0.1 2023-12-22 06:04:05,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=437853.3333333333, ans=0.125 2023-12-22 06:04:06,736 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 06:04:14,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=437920.0, ans=0.0 2023-12-22 06:04:15,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=437920.0, ans=0.05 2023-12-22 06:04:17,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=437920.0, ans=0.0 2023-12-22 06:04:30,271 INFO [train.py:886] (3/4) Epoch 14, batch 3750, loss[loss=0.01496, audio_tagging_loss=0.01496, over 24750.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4955600.09 frames. ], batch size: 99, lr: 7.73e-03, grad_scale: 32.0 2023-12-22 06:04:54,120 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.413e+01 2.790e+01 2.895e+01 3.050e+01 3.553e+01, threshold=5.791e+01, percent-clipped=0.0 2023-12-22 06:04:55,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=438186.6666666667, ans=0.125 2023-12-22 06:04:56,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=438186.6666666667, ans=0.1 2023-12-22 06:05:01,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=438253.3333333333, ans=0.0 2023-12-22 06:05:14,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=438320.0, ans=0.125 2023-12-22 06:05:22,213 INFO [train.py:886] (3/4) Epoch 14, batch 3800, loss[loss=0.02103, audio_tagging_loss=0.02103, over 25000.00 frames. ], tot_loss[loss=0.01462, audio_tagging_loss=0.01462, over 4951765.78 frames. ], batch size: 100, lr: 7.72e-03, grad_scale: 32.0 2023-12-22 06:05:25,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=438386.6666666667, ans=0.0 2023-12-22 06:05:27,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=438386.6666666667, ans=0.125 2023-12-22 06:05:43,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=438520.0, ans=0.1 2023-12-22 06:06:14,376 INFO [train.py:886] (3/4) Epoch 14, batch 3850, loss[loss=0.01529, audio_tagging_loss=0.01529, over 25000.00 frames. ], tot_loss[loss=0.01449, audio_tagging_loss=0.01449, over 4945142.32 frames. 
], batch size: 100, lr: 7.72e-03, grad_scale: 32.0 2023-12-22 06:06:29,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=438786.6666666667, ans=0.125 2023-12-22 06:06:38,159 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.442e+01 2.739e+01 2.908e+01 3.099e+01 3.536e+01, threshold=5.817e+01, percent-clipped=0.0 2023-12-22 06:06:41,428 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.95 vs. limit=6.0 2023-12-22 06:07:00,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.23 vs. limit=15.0 2023-12-22 06:07:04,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=438986.6666666667, ans=0.95 2023-12-22 06:07:06,019 INFO [train.py:886] (3/4) Epoch 14, batch 3900, loss[loss=0.01573, audio_tagging_loss=0.01573, over 24750.00 frames. ], tot_loss[loss=0.01442, audio_tagging_loss=0.01442, over 4948316.57 frames. ], batch size: 99, lr: 7.72e-03, grad_scale: 32.0 2023-12-22 06:07:14,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=439053.3333333333, ans=0.0 2023-12-22 06:07:33,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=439186.6666666667, ans=0.1 2023-12-22 06:07:36,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=439253.3333333333, ans=0.1 2023-12-22 06:07:36,788 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.17 vs. limit=15.0 2023-12-22 06:07:37,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=439253.3333333333, ans=0.125 2023-12-22 06:07:39,759 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.72 vs. limit=10.0 2023-12-22 06:07:42,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=439253.3333333333, ans=0.125 2023-12-22 06:07:57,863 INFO [train.py:886] (3/4) Epoch 14, batch 3950, loss[loss=0.01619, audio_tagging_loss=0.01619, over 25000.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4949960.75 frames. ], batch size: 100, lr: 7.72e-03, grad_scale: 32.0 2023-12-22 06:08:14,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=439453.3333333333, ans=0.125 2023-12-22 06:08:22,149 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.407e+01 2.684e+01 2.825e+01 2.982e+01 3.429e+01, threshold=5.651e+01, percent-clipped=0.0 2023-12-22 06:08:22,645 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.13 vs. 
limit=22.5 2023-12-22 06:08:30,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=439586.6666666667, ans=0.125 2023-12-22 06:08:34,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=439586.6666666667, ans=0.125 2023-12-22 06:08:42,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=439653.3333333333, ans=0.125 2023-12-22 06:08:48,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=439653.3333333333, ans=0.125 2023-12-22 06:08:50,457 INFO [train.py:886] (3/4) Epoch 14, batch 4000, loss[loss=0.01437, audio_tagging_loss=0.01437, over 25000.00 frames. ], tot_loss[loss=0.01447, audio_tagging_loss=0.01447, over 4953859.21 frames. ], batch size: 100, lr: 7.71e-03, grad_scale: 32.0 2023-12-22 06:08:54,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=439720.0, ans=0.125 2023-12-22 06:09:01,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=439786.6666666667, ans=0.0 2023-12-22 06:09:12,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=439853.3333333333, ans=0.125 2023-12-22 06:09:14,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=439853.3333333333, ans=0.0 2023-12-22 06:09:20,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=439853.3333333333, ans=0.125 2023-12-22 06:09:29,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.75 vs. limit=22.5 2023-12-22 06:09:31,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=439986.6666666667, ans=0.125 2023-12-22 06:09:35,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=439986.6666666667, ans=0.0 2023-12-22 06:09:43,009 INFO [train.py:886] (3/4) Epoch 14, batch 4050, loss[loss=0.01593, audio_tagging_loss=0.01593, over 24750.00 frames. ], tot_loss[loss=0.01452, audio_tagging_loss=0.01452, over 4954124.39 frames. ], batch size: 99, lr: 7.71e-03, grad_scale: 32.0 2023-12-22 06:09:43,527 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=12.0 2023-12-22 06:09:47,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440053.3333333333, ans=0.1 2023-12-22 06:09:58,621 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. 
limit=12.0
2023-12-22 06:09:59,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=440120.0, ans=0.125
2023-12-22 06:10:06,321 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.404e+01 2.790e+01 2.955e+01 3.071e+01 3.568e+01, threshold=5.910e+01, percent-clipped=0.0
2023-12-22 06:10:09,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=440186.6666666667, ans=0.125
2023-12-22 06:10:14,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=440253.3333333333, ans=0.125
2023-12-22 06:10:33,743 INFO [train.py:886] (3/4) Epoch 14, batch 4100, loss[loss=0.01273, audio_tagging_loss=0.01273, over 25000.00 frames. ], tot_loss[loss=0.01455, audio_tagging_loss=0.01455, over 4949549.61 frames. ], batch size: 100, lr: 7.71e-03, grad_scale: 32.0
2023-12-22 06:11:02,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=440520.0, ans=0.1
2023-12-22 06:11:06,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=440586.6666666667, ans=0.2
2023-12-22 06:11:08,455 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.85 vs. limit=6.0
2023-12-22 06:11:08,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.31 vs. limit=22.5
2023-12-22 06:11:10,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=440586.6666666667, ans=0.0
2023-12-22 06:11:17,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=440653.3333333333, ans=0.125
2023-12-22 06:11:26,733 INFO [train.py:886] (3/4) Epoch 14, batch 4150, loss[loss=0.01454, audio_tagging_loss=0.01454, over 24750.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4942742.83 frames.
], batch size: 99, lr: 7.70e-03, grad_scale: 32.0
2023-12-22 06:11:26,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=440720.0, ans=0.125
2023-12-22 06:11:31,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=440720.0, ans=0.0
2023-12-22 06:11:50,565 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.396e+01 2.748e+01 2.880e+01 2.984e+01 4.734e+01, threshold=5.760e+01, percent-clipped=0.0
2023-12-22 06:11:52,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=440853.3333333333, ans=0.125
2023-12-22 06:12:03,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=440920.0, ans=0.125
2023-12-22 06:12:06,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=440986.6666666667, ans=0.1
2023-12-22 06:12:06,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=440986.6666666667, ans=0.95
2023-12-22 06:12:12,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=440986.6666666667, ans=0.05
2023-12-22 06:12:17,793 INFO [train.py:886] (3/4) Epoch 14, batch 4200, loss[loss=0.01093, audio_tagging_loss=0.01093, over 25000.00 frames. ], tot_loss[loss=0.01445, audio_tagging_loss=0.01445, over 4944240.54 frames. ], batch size: 100, lr: 7.70e-03, grad_scale: 32.0
2023-12-22 06:12:27,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=441120.0, ans=0.125
2023-12-22 06:12:45,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=441186.6666666667, ans=0.1
2023-12-22 06:13:04,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=441320.0, ans=0.025
2023-12-22 06:13:07,453 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 06:13:09,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=441386.6666666667, ans=0.0
2023-12-22 06:13:09,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=441386.6666666667, ans=0.2
2023-12-22 06:13:10,160 INFO [train.py:886] (3/4) Epoch 14, batch 4250, loss[loss=0.01257, audio_tagging_loss=0.01257, over 25000.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4940284.05 frames.
], batch size: 100, lr: 7.70e-03, grad_scale: 32.0
2023-12-22 06:13:34,375 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.389e+01 2.735e+01 2.849e+01 2.986e+01 3.517e+01, threshold=5.698e+01, percent-clipped=0.0
2023-12-22 06:13:45,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=441586.6666666667, ans=0.0
2023-12-22 06:13:57,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=441653.3333333333, ans=0.09899494936611666
2023-12-22 06:13:57,876 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0
2023-12-22 06:14:02,683 INFO [train.py:886] (3/4) Epoch 14, batch 4300, loss[loss=0.01535, audio_tagging_loss=0.01535, over 25000.00 frames. ], tot_loss[loss=0.01441, audio_tagging_loss=0.01441, over 4944923.67 frames. ], batch size: 100, lr: 7.69e-03, grad_scale: 32.0
2023-12-22 06:14:10,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=441720.0, ans=0.1
2023-12-22 06:14:14,422 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.44 vs. limit=10.0
2023-12-22 06:14:14,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=441786.6666666667, ans=0.0
2023-12-22 06:14:22,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=441853.3333333333, ans=0.125
2023-12-22 06:14:25,735 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.37 vs. limit=15.0
2023-12-22 06:14:26,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=441853.3333333333, ans=0.125
2023-12-22 06:14:33,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=441920.0, ans=0.125
2023-12-22 06:14:48,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=441986.6666666667, ans=0.2
2023-12-22 06:14:53,345 INFO [train.py:886] (3/4) Epoch 14, batch 4350, loss[loss=0.01363, audio_tagging_loss=0.01363, over 25000.00 frames. ], tot_loss[loss=0.01446, audio_tagging_loss=0.01446, over 4949436.29 frames. ], batch size: 100, lr: 7.69e-03, grad_scale: 32.0
2023-12-22 06:15:03,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=442120.0, ans=0.1
2023-12-22 06:15:15,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=442186.6666666667, ans=0.0
2023-12-22 06:15:17,196 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.541e+01 2.829e+01 2.968e+01 3.125e+01 3.795e+01, threshold=5.936e+01, percent-clipped=0.0
2023-12-22 06:15:20,507 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.94 vs.
limit=22.5
2023-12-22 06:15:27,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=15.0
2023-12-22 06:15:39,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=442320.0, ans=0.1
2023-12-22 06:15:44,755 INFO [train.py:886] (3/4) Epoch 14, batch 4400, loss[loss=0.0142, audio_tagging_loss=0.0142, over 24750.00 frames. ], tot_loss[loss=0.01452, audio_tagging_loss=0.01452, over 4943281.75 frames. ], batch size: 99, lr: 7.69e-03, grad_scale: 32.0
2023-12-22 06:15:45,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=442386.6666666667, ans=0.2
2023-12-22 06:15:50,065 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.60 vs. limit=10.0
2023-12-22 06:16:09,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=442520.0, ans=0.125
2023-12-22 06:16:12,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=442520.0, ans=0.2
2023-12-22 06:16:29,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.93 vs. limit=12.0
2023-12-22 06:16:35,472 INFO [train.py:886] (3/4) Epoch 14, batch 4450, loss[loss=0.01296, audio_tagging_loss=0.01296, over 24750.00 frames. ], tot_loss[loss=0.01446, audio_tagging_loss=0.01446, over 4944220.82 frames. ], batch size: 99, lr: 7.69e-03, grad_scale: 32.0
2023-12-22 06:16:41,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=442720.0, ans=0.0
2023-12-22 06:16:46,742 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0
2023-12-22 06:16:55,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=442853.3333333333, ans=0.125
2023-12-22 06:16:59,495 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+01 2.711e+01 2.908e+01 3.101e+01 3.699e+01, threshold=5.817e+01, percent-clipped=0.0
2023-12-22 06:17:00,851 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.87 vs. limit=15.0
2023-12-22 06:17:01,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=442853.3333333333, ans=0.125
2023-12-22 06:17:07,156 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0
2023-12-22 06:17:19,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=442986.6666666667, ans=0.1
2023-12-22 06:17:27,841 INFO [train.py:886] (3/4) Epoch 14, batch 4500, loss[loss=0.01411, audio_tagging_loss=0.01411, over 25000.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4944095.01 frames.
], batch size: 100, lr: 7.68e-03, grad_scale: 32.0
2023-12-22 06:17:42,924 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.48 vs. limit=22.5
2023-12-22 06:18:03,314 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.66 vs. limit=15.0
2023-12-22 06:18:17,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=443320.0, ans=0.1
2023-12-22 06:18:20,090 INFO [train.py:886] (3/4) Epoch 14, batch 4550, loss[loss=0.01232, audio_tagging_loss=0.01232, over 25000.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4947736.38 frames. ], batch size: 100, lr: 7.68e-03, grad_scale: 32.0
2023-12-22 06:18:23,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=443386.6666666667, ans=0.125
2023-12-22 06:18:32,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=443453.3333333333, ans=0.0
2023-12-22 06:18:43,103 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.341e+01 2.709e+01 2.871e+01 3.035e+01 3.525e+01, threshold=5.741e+01, percent-clipped=0.0
2023-12-22 06:18:44,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=443520.0, ans=0.0
2023-12-22 06:18:48,038 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.77 vs. limit=22.5
2023-12-22 06:19:11,003 INFO [train.py:886] (3/4) Epoch 14, batch 4600, loss[loss=0.01233, audio_tagging_loss=0.01233, over 25000.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4952304.07 frames. ], batch size: 100, lr: 7.68e-03, grad_scale: 32.0
2023-12-22 06:19:11,480 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.21 vs. limit=15.0
2023-12-22 06:19:21,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=443786.6666666667, ans=0.125
2023-12-22 06:19:21,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=443786.6666666667, ans=0.125
2023-12-22 06:19:23,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=443786.6666666667, ans=0.125
2023-12-22 06:19:35,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=443853.3333333333, ans=0.0
2023-12-22 06:20:03,993 INFO [train.py:886] (3/4) Epoch 14, batch 4650, loss[loss=0.01404, audio_tagging_loss=0.01404, over 25000.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4954589.58 frames.
], batch size: 100, lr: 7.67e-03, grad_scale: 32.0
2023-12-22 06:20:12,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=444120.0, ans=0.1
2023-12-22 06:20:17,561 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 06:20:18,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=444120.0, ans=0.125
2023-12-22 06:20:22,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=444120.0, ans=0.0
2023-12-22 06:20:27,276 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.504e+01 2.730e+01 2.882e+01 3.030e+01 3.512e+01, threshold=5.764e+01, percent-clipped=0.0
2023-12-22 06:20:36,760 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.56 vs. limit=15.0
2023-12-22 06:20:53,820 INFO [train.py:886] (3/4) Epoch 14, batch 4700, loss[loss=0.0148, audio_tagging_loss=0.0148, over 24750.00 frames. ], tot_loss[loss=0.01447, audio_tagging_loss=0.01447, over 4956767.76 frames. ], batch size: 99, lr: 7.67e-03, grad_scale: 32.0
2023-12-22 06:20:59,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=444386.6666666667, ans=0.0
2023-12-22 06:21:01,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=444386.6666666667, ans=0.1
2023-12-22 06:21:03,629 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.625e-03
2023-12-22 06:21:13,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=444520.0, ans=0.125
2023-12-22 06:21:16,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=444520.0, ans=0.2
2023-12-22 06:21:41,571 INFO [train.py:886] (3/4) Epoch 14, batch 4750, loss[loss=0.01518, audio_tagging_loss=0.01518, over 24750.00 frames. ], tot_loss[loss=0.01463, audio_tagging_loss=0.01463, over 4951849.60 frames. ], batch size: 99, lr: 7.67e-03, grad_scale: 32.0
2023-12-22 06:21:41,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=444720.0, ans=0.1
2023-12-22 06:21:42,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=444720.0, ans=0.125
2023-12-22 06:21:48,346 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.93 vs. limit=15.0
2023-12-22 06:22:18,551 INFO [train.py:886] (3/4) Epoch 15, batch 0, loss[loss=0.03464, audio_tagging_loss=0.03464, over 25000.00 frames. ], tot_loss[loss=0.03464, audio_tagging_loss=0.03464, over 25000.00 frames. ], batch size: 100, lr: 7.41e-03, grad_scale: 32.0
2023-12-22 06:22:18,552 INFO [train.py:909] (3/4) Computing validation loss
2023-12-22 06:22:39,992 INFO [train.py:917] (3/4) Epoch 15, validation: loss=0.03275, audio_tagging_loss=0.03275, over 3737520.00 frames.
2023-12-22 06:22:39,993 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-22 06:22:40,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=444826.6666666667, ans=0.125
2023-12-22 06:22:47,358 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.394e+01 2.805e+01 2.969e+01 3.103e+01 9.102e+01, threshold=5.939e+01, percent-clipped=6.0
2023-12-22 06:22:47,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=444826.6666666667, ans=0.125
2023-12-22 06:22:48,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=444826.6666666667, ans=0.125
2023-12-22 06:22:59,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=444960.0, ans=0.125
2023-12-22 06:23:03,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=444960.0, ans=0.1
2023-12-22 06:23:12,036 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0
2023-12-22 06:23:16,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=445026.6666666667, ans=0.1
2023-12-22 06:23:31,620 INFO [train.py:886] (3/4) Epoch 15, batch 50, loss[loss=0.01628, audio_tagging_loss=0.01628, over 25000.00 frames. ], tot_loss[loss=0.02259, audio_tagging_loss=0.02259, over 1123171.16 frames. ], batch size: 100, lr: 7.40e-03, grad_scale: 32.0
2023-12-22 06:23:31,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=445160.0, ans=0.125
2023-12-22 06:23:35,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=445160.0, ans=0.0
2023-12-22 06:23:36,809 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.90 vs. limit=22.5
2023-12-22 06:23:39,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=445160.0, ans=0.2
2023-12-22 06:24:11,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=445426.6666666667, ans=0.125
2023-12-22 06:24:23,179 INFO [train.py:886] (3/4) Epoch 15, batch 100, loss[loss=0.01443, audio_tagging_loss=0.01443, over 22482.00 frames. ], tot_loss[loss=0.01943, audio_tagging_loss=0.01943, over 1970581.95 frames. ], batch size: 107, lr: 7.40e-03, grad_scale: 32.0
2023-12-22 06:24:23,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.00 vs.
limit=15.0
2023-12-22 06:24:30,485 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.756e+01 3.116e+01 3.356e+01 3.817e+01 5.461e+01, threshold=6.711e+01, percent-clipped=0.0
2023-12-22 06:24:31,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=445493.3333333333, ans=0.125
2023-12-22 06:24:39,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=445560.0, ans=0.125
2023-12-22 06:25:14,545 INFO [train.py:886] (3/4) Epoch 15, batch 150, loss[loss=0.01624, audio_tagging_loss=0.01624, over 25000.00 frames. ], tot_loss[loss=0.01774, audio_tagging_loss=0.01774, over 2632319.05 frames. ], batch size: 100, lr: 7.40e-03, grad_scale: 32.0
2023-12-22 06:25:22,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=445826.6666666667, ans=0.125
2023-12-22 06:25:29,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=445893.3333333333, ans=0.1
2023-12-22 06:25:46,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.64 vs. limit=10.0
2023-12-22 06:25:51,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=446026.6666666667, ans=0.0
2023-12-22 06:26:06,064 INFO [train.py:886] (3/4) Epoch 15, batch 200, loss[loss=0.0164, audio_tagging_loss=0.0164, over 25000.00 frames. ], tot_loss[loss=0.01674, audio_tagging_loss=0.01674, over 3147055.93 frames. ], batch size: 100, lr: 7.40e-03, grad_scale: 32.0
2023-12-22 06:26:13,374 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.586e+01 2.814e+01 2.983e+01 3.104e+01 3.592e+01, threshold=5.965e+01, percent-clipped=0.0
2023-12-22 06:26:13,740 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.33 vs. limit=15.0
2023-12-22 06:26:17,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=446226.6666666667, ans=0.125
2023-12-22 06:26:32,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=446293.3333333333, ans=0.2
2023-12-22 06:26:32,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=446293.3333333333, ans=0.95
2023-12-22 06:26:35,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.35 vs. limit=22.5
2023-12-22 06:26:56,925 INFO [train.py:886] (3/4) Epoch 15, batch 250, loss[loss=0.01482, audio_tagging_loss=0.01482, over 25000.00 frames. ], tot_loss[loss=0.0161, audio_tagging_loss=0.0161, over 3553821.80 frames.
], batch size: 100, lr: 7.39e-03, grad_scale: 32.0
2023-12-22 06:27:04,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=446493.3333333333, ans=0.125
2023-12-22 06:27:33,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=446693.3333333333, ans=0.125
2023-12-22 06:27:50,302 INFO [train.py:886] (3/4) Epoch 15, batch 300, loss[loss=0.01451, audio_tagging_loss=0.01451, over 24750.00 frames. ], tot_loss[loss=0.01574, audio_tagging_loss=0.01574, over 3865127.65 frames. ], batch size: 99, lr: 7.39e-03, grad_scale: 32.0
2023-12-22 06:27:57,054 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.460e+01 2.728e+01 2.846e+01 3.000e+01 3.484e+01, threshold=5.691e+01, percent-clipped=0.0
2023-12-22 06:27:58,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=446826.6666666667, ans=0.125
2023-12-22 06:28:04,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=446893.3333333333, ans=10.0
2023-12-22 06:28:05,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=446893.3333333333, ans=0.125
2023-12-22 06:28:12,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=446960.0, ans=0.125
2023-12-22 06:28:42,182 INFO [train.py:886] (3/4) Epoch 15, batch 350, loss[loss=0.01312, audio_tagging_loss=0.01312, over 24750.00 frames. ], tot_loss[loss=0.01533, audio_tagging_loss=0.01533, over 4102176.08 frames. ], batch size: 99, lr: 7.39e-03, grad_scale: 32.0
2023-12-22 06:28:58,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=447226.6666666667, ans=0.125
2023-12-22 06:29:16,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=447360.0, ans=0.125
2023-12-22 06:29:20,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=447360.0, ans=0.1
2023-12-22 06:29:22,193 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.42 vs. limit=15.0
2023-12-22 06:29:23,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=447426.6666666667, ans=0.0
2023-12-22 06:29:23,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=447426.6666666667, ans=0.125
2023-12-22 06:29:26,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=447426.6666666667, ans=0.0
2023-12-22 06:29:34,122 INFO [train.py:886] (3/4) Epoch 15, batch 400, loss[loss=0.01619, audio_tagging_loss=0.01619, over 24750.00 frames. ], tot_loss[loss=0.01515, audio_tagging_loss=0.01515, over 4293724.94 frames.
], batch size: 99, lr: 7.38e-03, grad_scale: 32.0
2023-12-22 06:29:41,447 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 2.769e+01 2.874e+01 3.024e+01 3.342e+01, threshold=5.748e+01, percent-clipped=0.0
2023-12-22 06:29:42,979 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.49 vs. limit=15.0
2023-12-22 06:29:44,753 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.52 vs. limit=15.0
2023-12-22 06:29:47,613 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.64 vs. limit=15.0
2023-12-22 06:29:52,168 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.78 vs. limit=22.5
2023-12-22 06:29:52,377 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=15.0
2023-12-22 06:30:13,176 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.41 vs. limit=15.0
2023-12-22 06:30:15,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=447760.0, ans=0.0
2023-12-22 06:30:18,580 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0
2023-12-22 06:30:24,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=447760.0, ans=0.125
2023-12-22 06:30:26,474 INFO [train.py:886] (3/4) Epoch 15, batch 450, loss[loss=0.01383, audio_tagging_loss=0.01383, over 23983.00 frames. ], tot_loss[loss=0.01484, audio_tagging_loss=0.01484, over 4436340.38 frames. ], batch size: 100, lr: 7.38e-03, grad_scale: 32.0
2023-12-22 06:30:45,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=447960.0, ans=0.0
2023-12-22 06:30:51,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=447960.0, ans=0.0
2023-12-22 06:31:09,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=448093.3333333333, ans=0.2
2023-12-22 06:31:18,202 INFO [train.py:886] (3/4) Epoch 15, batch 500, loss[loss=0.01489, audio_tagging_loss=0.01489, over 25000.00 frames. ], tot_loss[loss=0.01462, audio_tagging_loss=0.01462, over 4553533.82 frames.
], batch size: 100, lr: 7.38e-03, grad_scale: 32.0
2023-12-22 06:31:25,365 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.378e+01 2.715e+01 2.862e+01 2.998e+01 3.574e+01, threshold=5.724e+01, percent-clipped=0.0
2023-12-22 06:31:27,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=448226.6666666667, ans=0.125
2023-12-22 06:31:56,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=448360.0, ans=0.1
2023-12-22 06:32:03,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=448426.6666666667, ans=0.1
2023-12-22 06:32:10,773 INFO [train.py:886] (3/4) Epoch 15, batch 550, loss[loss=0.0161, audio_tagging_loss=0.0161, over 25000.00 frames. ], tot_loss[loss=0.01467, audio_tagging_loss=0.01467, over 4648038.70 frames. ], batch size: 100, lr: 7.38e-03, grad_scale: 32.0
2023-12-22 06:32:11,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=448493.3333333333, ans=0.125
2023-12-22 06:32:16,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=448493.3333333333, ans=0.04949747468305833
2023-12-22 06:32:17,761 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0
2023-12-22 06:32:23,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=448560.0, ans=0.125
2023-12-22 06:32:30,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=448626.6666666667, ans=0.0
2023-12-22 06:32:30,582 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.08 vs. limit=6.0
2023-12-22 06:32:31,624 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.42 vs. limit=15.0
2023-12-22 06:32:59,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=448760.0, ans=0.125
2023-12-22 06:33:02,503 INFO [train.py:886] (3/4) Epoch 15, batch 600, loss[loss=0.01355, audio_tagging_loss=0.01355, over 24750.00 frames. ], tot_loss[loss=0.01473, audio_tagging_loss=0.01473, over 4715084.74 frames.
], batch size: 99, lr: 7.37e-03, grad_scale: 32.0
2023-12-22 06:33:09,765 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+01 2.782e+01 2.890e+01 3.094e+01 3.722e+01, threshold=5.781e+01, percent-clipped=0.0
2023-12-22 06:33:25,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=448960.0, ans=0.1
2023-12-22 06:33:36,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=449026.6666666667, ans=0.0
2023-12-22 06:33:41,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=449026.6666666667, ans=0.125
2023-12-22 06:33:43,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=449093.3333333333, ans=15.0
2023-12-22 06:33:54,130 INFO [train.py:886] (3/4) Epoch 15, batch 650, loss[loss=0.0157, audio_tagging_loss=0.0157, over 24750.00 frames. ], tot_loss[loss=0.01469, audio_tagging_loss=0.01469, over 4762989.33 frames. ], batch size: 99, lr: 7.37e-03, grad_scale: 32.0
2023-12-22 06:33:58,110 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.51 vs. limit=6.0
2023-12-22 06:34:05,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.58 vs. limit=22.5
2023-12-22 06:34:06,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=449226.6666666667, ans=0.125
2023-12-22 06:34:09,332 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0
2023-12-22 06:34:12,588 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 06:34:15,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=449293.3333333333, ans=0.1
2023-12-22 06:34:20,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=449293.3333333333, ans=0.125
2023-12-22 06:34:25,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=449360.0, ans=0.125
2023-12-22 06:34:29,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=449360.0, ans=0.0
2023-12-22 06:34:31,424 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.35 vs. limit=15.0
2023-12-22 06:34:35,141 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=15.0
2023-12-22 06:34:38,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=449426.6666666667, ans=0.1
2023-12-22 06:34:46,662 INFO [train.py:886] (3/4) Epoch 15, batch 700, loss[loss=0.01609, audio_tagging_loss=0.01609, over 22372.00 frames. ], tot_loss[loss=0.01467, audio_tagging_loss=0.01467, over 4795490.07 frames.
], batch size: 107, lr: 7.37e-03, grad_scale: 32.0
2023-12-22 06:34:53,987 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+01 2.765e+01 2.933e+01 3.097e+01 3.905e+01, threshold=5.867e+01, percent-clipped=0.0
2023-12-22 06:34:55,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=449493.3333333333, ans=0.0
2023-12-22 06:35:02,626 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.21 vs. limit=15.0
2023-12-22 06:35:06,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.17 vs. limit=22.5
2023-12-22 06:35:18,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=449693.3333333333, ans=0.1
2023-12-22 06:35:25,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=449693.3333333333, ans=0.125
2023-12-22 06:35:28,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=449760.0, ans=10.0
2023-12-22 06:35:30,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=449760.0, ans=0.2
2023-12-22 06:35:38,025 INFO [train.py:886] (3/4) Epoch 15, batch 750, loss[loss=0.01489, audio_tagging_loss=0.01489, over 25000.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4830620.82 frames. ], batch size: 100, lr: 7.37e-03, grad_scale: 64.0
2023-12-22 06:35:39,640 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.28 vs. limit=15.0
2023-12-22 06:35:59,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=449960.0, ans=0.0
2023-12-22 06:36:08,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=450026.6666666667, ans=0.1
2023-12-22 06:36:15,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=450026.6666666667, ans=0.125
2023-12-22 06:36:19,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=450093.3333333333, ans=0.0
2023-12-22 06:36:29,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=450160.0, ans=15.0
2023-12-22 06:36:29,744 INFO [train.py:886] (3/4) Epoch 15, batch 800, loss[loss=0.01583, audio_tagging_loss=0.01583, over 24750.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 4858655.75 frames.
], batch size: 99, lr: 7.36e-03, grad_scale: 64.0
2023-12-22 06:36:37,134 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.442e+01 2.693e+01 2.864e+01 3.009e+01 3.417e+01, threshold=5.729e+01, percent-clipped=0.0
2023-12-22 06:36:48,176 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 06:36:55,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=450293.3333333333, ans=0.0
2023-12-22 06:37:00,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=450360.0, ans=0.125
2023-12-22 06:37:17,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=450426.6666666667, ans=0.125
2023-12-22 06:37:22,178 INFO [train.py:886] (3/4) Epoch 15, batch 850, loss[loss=0.01226, audio_tagging_loss=0.01226, over 25000.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4882386.68 frames. ], batch size: 100, lr: 7.36e-03, grad_scale: 64.0
2023-12-22 06:37:25,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=450493.3333333333, ans=0.0
2023-12-22 06:37:58,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=450693.3333333333, ans=0.0
2023-12-22 06:38:03,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=450760.0, ans=0.125
2023-12-22 06:38:14,333 INFO [train.py:886] (3/4) Epoch 15, batch 900, loss[loss=0.01418, audio_tagging_loss=0.01418, over 24750.00 frames. ], tot_loss[loss=0.01443, audio_tagging_loss=0.01443, over 4898831.41 frames. ], batch size: 99, lr: 7.36e-03, grad_scale: 64.0
2023-12-22 06:38:21,711 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.497e+01 2.789e+01 2.921e+01 3.060e+01 3.433e+01, threshold=5.842e+01, percent-clipped=0.0
2023-12-22 06:38:59,403 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.49 vs. limit=10.0
2023-12-22 06:39:06,217 INFO [train.py:886] (3/4) Epoch 15, batch 950, loss[loss=0.01447, audio_tagging_loss=0.01447, over 24750.00 frames. ], tot_loss[loss=0.01453, audio_tagging_loss=0.01453, over 4906797.61 frames. ], batch size: 99, lr: 7.36e-03, grad_scale: 64.0
2023-12-22 06:39:28,049 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.27 vs. limit=15.0
2023-12-22 06:39:34,329 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.52 vs.
limit=10.0
2023-12-22 06:39:39,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=451360.0, ans=0.1
2023-12-22 06:39:48,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=451426.6666666667, ans=0.0
2023-12-22 06:39:51,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=451426.6666666667, ans=0.2
2023-12-22 06:39:55,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=451426.6666666667, ans=0.125
2023-12-22 06:39:58,842 INFO [train.py:886] (3/4) Epoch 15, batch 1000, loss[loss=0.01313, audio_tagging_loss=0.01313, over 25000.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4910300.97 frames. ], batch size: 100, lr: 7.35e-03, grad_scale: 64.0
2023-12-22 06:40:06,053 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+01 2.747e+01 2.873e+01 3.000e+01 3.786e+01, threshold=5.747e+01, percent-clipped=0.0
2023-12-22 06:40:08,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=451560.0, ans=0.125
2023-12-22 06:40:13,195 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.35 vs. limit=15.0
2023-12-22 06:40:49,167 INFO [train.py:886] (3/4) Epoch 15, batch 1050, loss[loss=0.01449, audio_tagging_loss=0.01449, over 25000.00 frames. ], tot_loss[loss=0.01444, audio_tagging_loss=0.01444, over 4918614.61 frames. ], batch size: 100, lr: 7.35e-03, grad_scale: 64.0
2023-12-22 06:40:58,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=451826.6666666667, ans=0.1
2023-12-22 06:41:01,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=451893.3333333333, ans=0.05
2023-12-22 06:41:06,807 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=15.0
2023-12-22 06:41:06,956 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.43 vs. limit=15.0
2023-12-22 06:41:13,258 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.16 vs. limit=6.0
2023-12-22 06:41:42,793 INFO [train.py:886] (3/4) Epoch 15, batch 1100, loss[loss=0.01356, audio_tagging_loss=0.01356, over 25000.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4931240.33 frames.
], batch size: 100, lr: 7.35e-03, grad_scale: 64.0
2023-12-22 06:41:45,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=452160.0, ans=0.1
2023-12-22 06:41:48,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=452160.0, ans=22.5
2023-12-22 06:41:49,461 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.467e+01 2.730e+01 2.835e+01 3.021e+01 3.607e+01, threshold=5.671e+01, percent-clipped=0.0
2023-12-22 06:41:49,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=452160.0, ans=0.0
2023-12-22 06:42:09,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=452293.3333333333, ans=0.0
2023-12-22 06:42:11,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=452293.3333333333, ans=0.125
2023-12-22 06:42:23,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=452426.6666666667, ans=0.0
2023-12-22 06:42:27,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=452426.6666666667, ans=0.1
2023-12-22 06:42:28,100 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.01 vs. limit=15.0
2023-12-22 06:42:34,427 INFO [train.py:886] (3/4) Epoch 15, batch 1150, loss[loss=0.01481, audio_tagging_loss=0.01481, over 25000.00 frames. ], tot_loss[loss=0.01431, audio_tagging_loss=0.01431, over 4936293.10 frames. ], batch size: 100, lr: 7.34e-03, grad_scale: 64.0
2023-12-22 06:43:04,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=452693.3333333333, ans=0.125
2023-12-22 06:43:10,906 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.83 vs. limit=22.5
2023-12-22 06:43:15,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=452760.0, ans=0.125
2023-12-22 06:43:17,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=452760.0, ans=0.1
2023-12-22 06:43:22,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=452760.0, ans=0.0
2023-12-22 06:43:23,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=452760.0, ans=0.2
2023-12-22 06:43:26,231 INFO [train.py:886] (3/4) Epoch 15, batch 1200, loss[loss=0.01337, audio_tagging_loss=0.01337, over 24750.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4940825.03 frames. ], batch size: 99, lr: 7.34e-03, grad_scale: 64.0
2023-12-22 06:43:29,582 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.07 vs.
limit=15.0
2023-12-22 06:43:32,797 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.424e+01 2.691e+01 2.862e+01 3.008e+01 3.701e+01, threshold=5.724e+01, percent-clipped=0.0
2023-12-22 06:43:36,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0
2023-12-22 06:43:57,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=453026.6666666667, ans=0.1
2023-12-22 06:44:09,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=453093.3333333333, ans=0.125
2023-12-22 06:44:17,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=453160.0, ans=0.125
2023-12-22 06:44:18,053 INFO [train.py:886] (3/4) Epoch 15, batch 1250, loss[loss=0.01443, audio_tagging_loss=0.01443, over 24750.00 frames. ], tot_loss[loss=0.01444, audio_tagging_loss=0.01444, over 4933779.63 frames. ], batch size: 99, lr: 7.34e-03, grad_scale: 64.0
2023-12-22 06:44:21,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=453160.0, ans=0.1
2023-12-22 06:44:33,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=453226.6666666667, ans=0.2
2023-12-22 06:44:42,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=453293.3333333333, ans=0.0
2023-12-22 06:44:51,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=453360.0, ans=0.0
2023-12-22 06:44:59,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=453360.0, ans=0.0
2023-12-22 06:45:00,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=453426.6666666667, ans=0.125
2023-12-22 06:45:01,317 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=15.0
2023-12-22 06:45:09,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.41 vs. limit=15.0
2023-12-22 06:45:11,720 INFO [train.py:886] (3/4) Epoch 15, batch 1300, loss[loss=0.01248, audio_tagging_loss=0.01248, over 24750.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4930976.40 frames. ], batch size: 99, lr: 7.34e-03, grad_scale: 64.0
2023-12-22 06:45:19,293 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 2.803e+01 2.974e+01 3.116e+01 3.587e+01, threshold=5.949e+01, percent-clipped=0.0
2023-12-22 06:45:21,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=453493.3333333333, ans=0.0
2023-12-22 06:45:22,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=453560.0, ans=0.125
2023-12-22 06:45:31,576 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.31 vs.
limit=22.5
2023-12-22 06:45:50,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=453693.3333333333, ans=0.0
2023-12-22 06:45:55,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=453760.0, ans=0.125
2023-12-22 06:46:04,239 INFO [train.py:886] (3/4) Epoch 15, batch 1350, loss[loss=0.01387, audio_tagging_loss=0.01387, over 24750.00 frames. ], tot_loss[loss=0.01447, audio_tagging_loss=0.01447, over 4935691.08 frames. ], batch size: 99, lr: 7.33e-03, grad_scale: 64.0
2023-12-22 06:46:16,927 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.05 vs. limit=15.0
2023-12-22 06:46:40,140 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0
2023-12-22 06:46:50,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=454093.3333333333, ans=0.125
2023-12-22 06:46:56,195 INFO [train.py:886] (3/4) Epoch 15, batch 1400, loss[loss=0.01264, audio_tagging_loss=0.01264, over 24750.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 4943147.46 frames. ], batch size: 99, lr: 7.33e-03, grad_scale: 64.0
2023-12-22 06:46:56,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=454160.0, ans=0.0
2023-12-22 06:46:57,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=454160.0, ans=0.125
2023-12-22 06:47:01,125 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.01 vs. limit=22.5
2023-12-22 06:47:03,509 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+01 2.761e+01 2.896e+01 3.038e+01 4.038e+01, threshold=5.792e+01, percent-clipped=0.0
2023-12-22 06:47:18,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=454293.3333333333, ans=0.1
2023-12-22 06:47:35,007 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.51 vs. limit=15.0
2023-12-22 06:47:46,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.72 vs. limit=15.0
2023-12-22 06:47:47,644 INFO [train.py:886] (3/4) Epoch 15, batch 1450, loss[loss=0.01613, audio_tagging_loss=0.01613, over 25000.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4952054.99 frames. ], batch size: 100, lr: 7.33e-03, grad_scale: 64.0
2023-12-22 06:48:09,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=454626.6666666667, ans=0.0
2023-12-22 06:48:10,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=454626.6666666667, ans=0.0
2023-12-22 06:48:31,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=454760.0, ans=0.0
2023-12-22 06:48:40,093 INFO [train.py:886] (3/4) Epoch 15, batch 1500, loss[loss=0.01548, audio_tagging_loss=0.01548, over 25000.00 frames.
], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4960252.42 frames. ], batch size: 100, lr: 7.33e-03, grad_scale: 64.0
2023-12-22 06:48:47,430 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+01 2.703e+01 2.865e+01 3.021e+01 3.763e+01, threshold=5.730e+01, percent-clipped=0.0
2023-12-22 06:48:47,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=454826.6666666667, ans=0.125
2023-12-22 06:48:47,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=454826.6666666667, ans=0.125
2023-12-22 06:48:53,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=454893.3333333333, ans=0.0
2023-12-22 06:49:24,106 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.26 vs. limit=10.0
2023-12-22 06:49:27,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=455093.3333333333, ans=0.125
2023-12-22 06:49:28,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=455093.3333333333, ans=0.2
2023-12-22 06:49:31,991 INFO [train.py:886] (3/4) Epoch 15, batch 1550, loss[loss=0.01838, audio_tagging_loss=0.01838, over 24750.00 frames. ], tot_loss[loss=0.01443, audio_tagging_loss=0.01443, over 4958884.11 frames. ], batch size: 99, lr: 7.32e-03, grad_scale: 64.0
2023-12-22 06:49:39,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=455160.0, ans=0.2
2023-12-22 06:49:41,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=455226.6666666667, ans=0.125
2023-12-22 06:49:47,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=455226.6666666667, ans=0.125
2023-12-22 06:49:48,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=455226.6666666667, ans=0.125
2023-12-22 06:49:54,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.75 vs. limit=15.0
2023-12-22 06:50:09,161 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0
2023-12-22 06:50:18,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=455426.6666666667, ans=0.07
2023-12-22 06:50:19,825 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.544e-03
2023-12-22 06:50:23,332 INFO [train.py:886] (3/4) Epoch 15, batch 1600, loss[loss=0.01481, audio_tagging_loss=0.01481, over 21730.00 frames. ], tot_loss[loss=0.01454, audio_tagging_loss=0.01454, over 4953838.47 frames. ], batch size: 107, lr: 7.32e-03, grad_scale: 64.0
2023-12-22 06:50:25,605 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.47 vs.
limit=15.0 2023-12-22 06:50:25,655 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.93 vs. limit=15.0 2023-12-22 06:50:29,906 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.369e+01 2.762e+01 2.937e+01 3.082e+01 3.715e+01, threshold=5.874e+01, percent-clipped=0.0 2023-12-22 06:50:36,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=455560.0, ans=0.07 2023-12-22 06:50:40,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=455560.0, ans=0.1 2023-12-22 06:50:43,246 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.93 vs. limit=22.5 2023-12-22 06:50:54,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=455693.3333333333, ans=0.125 2023-12-22 06:51:14,889 INFO [train.py:886] (3/4) Epoch 15, batch 1650, loss[loss=0.01412, audio_tagging_loss=0.01412, over 24750.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4951560.92 frames. ], batch size: 99, lr: 7.32e-03, grad_scale: 64.0 2023-12-22 06:51:31,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=455893.3333333333, ans=0.2 2023-12-22 06:51:33,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=455893.3333333333, ans=0.1 2023-12-22 06:52:06,834 INFO [train.py:886] (3/4) Epoch 15, batch 1700, loss[loss=0.01417, audio_tagging_loss=0.01417, over 25000.00 frames. ], tot_loss[loss=0.01434, audio_tagging_loss=0.01434, over 4951522.63 frames. ], batch size: 100, lr: 7.32e-03, grad_scale: 64.0 2023-12-22 06:52:14,101 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.418e+01 2.733e+01 2.857e+01 3.006e+01 3.981e+01, threshold=5.714e+01, percent-clipped=0.0 2023-12-22 06:52:16,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0 2023-12-22 06:52:20,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=456226.6666666667, ans=0.0 2023-12-22 06:52:26,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=456293.3333333333, ans=0.0 2023-12-22 06:52:27,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=456293.3333333333, ans=0.125 2023-12-22 06:52:32,721 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.36 vs. limit=15.0 2023-12-22 06:52:38,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=456360.0, ans=0.125 2023-12-22 06:52:58,974 INFO [train.py:886] (3/4) Epoch 15, batch 1750, loss[loss=0.01466, audio_tagging_loss=0.01466, over 24750.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 4952750.89 frames. 
], batch size: 99, lr: 7.31e-03, grad_scale: 64.0 2023-12-22 06:53:05,124 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.62 vs. limit=22.5 2023-12-22 06:53:19,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=456626.6666666667, ans=0.0 2023-12-22 06:53:21,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=456626.6666666667, ans=0.05 2023-12-22 06:53:26,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=456626.6666666667, ans=0.0 2023-12-22 06:53:40,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=456760.0, ans=0.125 2023-12-22 06:53:49,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=456760.0, ans=0.0 2023-12-22 06:53:50,622 INFO [train.py:886] (3/4) Epoch 15, batch 1800, loss[loss=0.01644, audio_tagging_loss=0.01644, over 25000.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 4955315.40 frames. ], batch size: 100, lr: 7.31e-03, grad_scale: 64.0 2023-12-22 06:53:52,078 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.88 vs. limit=15.0 2023-12-22 06:53:53,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=456826.6666666667, ans=0.1 2023-12-22 06:53:57,943 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 2.752e+01 2.915e+01 3.059e+01 3.559e+01, threshold=5.830e+01, percent-clipped=0.0 2023-12-22 06:53:59,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=456826.6666666667, ans=0.125 2023-12-22 06:54:00,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=456893.3333333333, ans=0.125 2023-12-22 06:54:09,594 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.10 vs. 
limit=10.0 2023-12-22 06:54:16,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=456960.0, ans=0.125 2023-12-22 06:54:30,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=457026.6666666667, ans=0.125 2023-12-22 06:54:30,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=457026.6666666667, ans=0.05 2023-12-22 06:54:34,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=457093.3333333333, ans=0.125 2023-12-22 06:54:34,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=457093.3333333333, ans=0.125 2023-12-22 06:54:37,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=457093.3333333333, ans=0.04949747468305833 2023-12-22 06:54:42,181 INFO [train.py:886] (3/4) Epoch 15, batch 1850, loss[loss=0.01465, audio_tagging_loss=0.01465, over 24750.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4959788.40 frames. ], batch size: 99, lr: 7.31e-03, grad_scale: 64.0 2023-12-22 06:54:44,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=457160.0, ans=0.0 2023-12-22 06:54:45,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=457160.0, ans=0.1 2023-12-22 06:54:53,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=457226.6666666667, ans=0.125 2023-12-22 06:55:15,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. limit=6.0 2023-12-22 06:55:16,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=457360.0, ans=0.125 2023-12-22 06:55:31,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=457426.6666666667, ans=0.0 2023-12-22 06:55:32,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=457426.6666666667, ans=0.125 2023-12-22 06:55:34,783 INFO [train.py:886] (3/4) Epoch 15, batch 1900, loss[loss=0.01653, audio_tagging_loss=0.01653, over 24750.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4951402.80 frames. 
], batch size: 99, lr: 7.30e-03, grad_scale: 64.0 2023-12-22 06:55:41,994 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.516e+01 2.819e+01 2.950e+01 3.088e+01 3.539e+01, threshold=5.899e+01, percent-clipped=0.0 2023-12-22 06:55:42,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=457493.3333333333, ans=0.125 2023-12-22 06:56:02,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=457626.6666666667, ans=0.125 2023-12-22 06:56:13,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=457693.3333333333, ans=0.0 2023-12-22 06:56:20,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=457760.0, ans=0.125 2023-12-22 06:56:26,091 INFO [train.py:886] (3/4) Epoch 15, batch 1950, loss[loss=0.01081, audio_tagging_loss=0.01081, over 24750.00 frames. ], tot_loss[loss=0.01433, audio_tagging_loss=0.01433, over 4947025.13 frames. ], batch size: 99, lr: 7.30e-03, grad_scale: 64.0 2023-12-22 06:56:42,000 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.49 vs. limit=15.0 2023-12-22 06:56:44,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=457893.3333333333, ans=0.125 2023-12-22 06:56:49,604 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.12 vs. limit=22.5 2023-12-22 06:56:53,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=457960.0, ans=0.125 2023-12-22 06:56:55,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=457960.0, ans=0.025 2023-12-22 06:57:11,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=12.0 2023-12-22 06:57:16,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.53 vs. limit=15.0 2023-12-22 06:57:18,511 INFO [train.py:886] (3/4) Epoch 15, batch 2000, loss[loss=0.01231, audio_tagging_loss=0.01231, over 25000.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4952862.38 frames. ], batch size: 100, lr: 7.30e-03, grad_scale: 64.0 2023-12-22 06:57:25,039 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+01 2.729e+01 2.880e+01 3.053e+01 3.863e+01, threshold=5.759e+01, percent-clipped=0.0 2023-12-22 06:57:36,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=458226.6666666667, ans=0.125
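The optim.py:484 warnings above print five grad-norm quartiles (min, 25%, 50%, 75%, max) together with a clipping threshold, and throughout this log the threshold equals Clipping_scale times the median (e.g. 2.0 x ~2.950e+01 gives the ~5.9e+01 values). Below is a minimal sketch of that kind of quartile-based clipping; the class name, buffer size, and update rule are assumptions for illustration only, not the actual ScaledAdam code in optim.py.

```python
import torch
from collections import deque

class GradNormClipper:
    """Illustrative quartile-based gradient clipping (assumed logic)."""

    def __init__(self, clipping_scale: float = 2.0, history: int = 400):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)  # recent total gradient norms

    def __call__(self, params) -> float:
        grads = [p.grad for p in params if p.grad is not None]
        total_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        self.norms.append(total_norm.item())
        ranked = sorted(self.norms)
        # the five numbers printed in the log: min, 25%, 50%, 75%, max
        quartiles = [ranked[int(f * (len(ranked) - 1))]
                     for f in (0.0, 0.25, 0.5, 0.75, 1.0)]
        # threshold = clipping_scale * median, matching the printed values
        threshold = self.clipping_scale * quartiles[2]
        if total_norm.item() > threshold:
            # rescale all gradients so the total norm equals the threshold
            for g in grads:
                g.mul_(threshold / total_norm.item())
        return threshold
```

With percent-clipped=0.0 on every warning in this stretch, the threshold appears never to be hit here; the warnings read as periodic reports rather than actual clipping events.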
2023-12-22 06:57:43,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.34 vs. limit=10.0 2023-12-22 06:57:50,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=458360.0, ans=0.125 2023-12-22 06:58:10,739 INFO [train.py:886] (3/4) Epoch 15, batch 2050, loss[loss=0.01408, audio_tagging_loss=0.01408, over 24750.00 frames. ], tot_loss[loss=0.01415, audio_tagging_loss=0.01415, over 4952720.06 frames. ], batch size: 99, lr: 7.30e-03, grad_scale: 64.0 2023-12-22 06:58:10,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=458493.3333333333, ans=0.2 2023-12-22 06:58:30,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=458626.6666666667, ans=0.125 2023-12-22 06:58:31,730 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.56 vs. limit=22.5 2023-12-22 06:58:39,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=458626.6666666667, ans=0.125 2023-12-22 06:59:01,467 INFO [train.py:886] (3/4) Epoch 15, batch 2100, loss[loss=0.01529, audio_tagging_loss=0.01529, over 25000.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4955392.99 frames. ], batch size: 100, lr: 7.29e-03, grad_scale: 64.0 2023-12-22 06:59:09,473 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.454e+01 2.721e+01 2.805e+01 2.996e+01 3.469e+01, threshold=5.611e+01, percent-clipped=0.0 2023-12-22 06:59:09,938 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.21 vs. limit=15.0 2023-12-22 06:59:11,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=458893.3333333333, ans=0.125 2023-12-22 06:59:18,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=458893.3333333333, ans=0.1 2023-12-22 06:59:25,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=458960.0, ans=0.125 2023-12-22 06:59:32,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=459026.6666666667, ans=0.0 2023-12-22 06:59:45,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=459093.3333333333, ans=0.015 2023-12-22 06:59:53,686 INFO [train.py:886] (3/4) Epoch 15, batch 2150, loss[loss=0.01296, audio_tagging_loss=0.01296, over 24750.00 frames. ], tot_loss[loss=0.01415, audio_tagging_loss=0.01415, over 4960491.27 frames. ], batch size: 99, lr: 7.29e-03, grad_scale: 64.0 2023-12-22 06:59:54,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=459160.0, ans=0.2 2023-12-22 06:59:54,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=459160.0, ans=0.1
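Each scaling.py:213 entry above records the current value (ans=...) of a named ScheduledFloat hyperparameter at a given batch_count, e.g. a dropout_p at 0.1 or a balancer prob at 0.125. A minimal sketch of what such a batch-count-indexed schedule could look like follows, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints themselves are invented for illustration and are not taken from scaling.py.

```python
def scheduled_float(batch_count: float,
                    schedule=((0.0, 0.3), (20000.0, 0.125))) -> float:
    """Piecewise-linear value of a scheduled hyperparameter (assumed form)."""
    x0, y0 = schedule[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in schedule[1:]:
        if batch_count <= x1:
            # linear interpolation on the segment [x0, x1]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0  # constant after the last breakpoint

# A schedule like this has long since flattened out by batch_count ~459k,
# which would explain why so many entries report a fixed ans=0.125:
print(scheduled_float(459160.0))  # -> 0.125
```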
2023-12-22 06:59:55,799 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.39 vs. limit=15.0 2023-12-22 06:59:56,696 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=25.74 vs. limit=22.5 2023-12-22 07:00:11,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=459226.6666666667, ans=0.125 2023-12-22 07:00:25,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=459360.0, ans=0.0 2023-12-22 07:00:26,198 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.14 vs. limit=10.0 2023-12-22 07:00:29,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=459360.0, ans=0.125 2023-12-22 07:00:44,578 INFO [train.py:886] (3/4) Epoch 15, batch 2200, loss[loss=0.01452, audio_tagging_loss=0.01452, over 23973.00 frames. ], tot_loss[loss=0.01434, audio_tagging_loss=0.01434, over 4950184.10 frames. ], batch size: 100, lr: 7.29e-03, grad_scale: 64.0 2023-12-22 07:00:49,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=459493.3333333333, ans=0.0 2023-12-22 07:00:52,734 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.447e+01 2.795e+01 2.951e+01 3.077e+01 3.607e+01, threshold=5.903e+01, percent-clipped=0.0 2023-12-22 07:01:07,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=459626.6666666667, ans=0.2 2023-12-22 07:01:10,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=459626.6666666667, ans=0.125 2023-12-22 07:01:14,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=459693.3333333333, ans=0.0 2023-12-22 07:01:23,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=459693.3333333333, ans=0.125 2023-12-22 07:01:37,219 INFO [train.py:886] (3/4) Epoch 15, batch 2250, loss[loss=0.0135, audio_tagging_loss=0.0135, over 24750.00 frames. ], tot_loss[loss=0.01444, audio_tagging_loss=0.01444, over 4946349.28 frames. ], batch size: 99, lr: 7.29e-03, grad_scale: 64.0 2023-12-22 07:01:41,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=459826.6666666667, ans=0.125 2023-12-22 07:01:43,485 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.43 vs. limit=12.0 2023-12-22 07:01:43,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=459826.6666666667, ans=0.025 2023-12-22 07:02:13,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=460026.6666666667, ans=0.02 2023-12-22 07:02:22,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.70 vs. limit=15.0
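In the train.py:886 lines, each per-batch loss is reported over roughly 25000 frames, while tot_loss is reported over a frame count hovering near 4.95e6, which is about 25000 times the reset_interval of 200 in the config. That is consistent with tot_loss being a frame-weighted running average whose weights decay by (1 - 1/reset_interval) every batch; the sketch below is written under that assumption and is not the actual MetricsTracker code.

```python
def tot_loss_stream(batches, reset_interval: int = 200):
    """batches: iterable of (mean_loss, num_frames), one pair per batch."""
    decay = 1.0 - 1.0 / reset_interval
    weighted_loss = 0.0    # decayed sum of loss * frames
    weighted_frames = 0.0  # decayed sum of frames
    for loss, frames in batches:
        weighted_loss = weighted_loss * decay + loss * frames
        weighted_frames = weighted_frames * decay + frames
        # printed as: tot_loss[loss=..., over <weighted_frames> frames.]
        yield weighted_loss / weighted_frames, weighted_frames

# With ~25000 frames per batch, weighted_frames converges toward
# 25000 / (1 / 200) = 5.0e6, close to the ~4.95e6 counts printed above.
```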
2023-12-22 07:02:29,285 INFO [train.py:886] (3/4) Epoch 15, batch 2300, loss[loss=0.0127, audio_tagging_loss=0.0127, over 24750.00 frames. ], tot_loss[loss=0.01443, audio_tagging_loss=0.01443, over 4945698.58 frames. ], batch size: 99, lr: 7.28e-03, grad_scale: 64.0 2023-12-22 07:02:32,421 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.73 vs. limit=22.5 2023-12-22 07:02:32,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.72 vs. limit=15.0 2023-12-22 07:02:35,997 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.61 vs. limit=15.0 2023-12-22 07:02:36,498 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.419e+01 2.737e+01 2.896e+01 3.072e+01 5.073e+01, threshold=5.791e+01, percent-clipped=0.0 2023-12-22 07:02:56,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=460293.3333333333, ans=0.0 2023-12-22 07:02:57,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=460293.3333333333, ans=0.05 2023-12-22 07:02:59,031 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.69 vs. limit=15.0 2023-12-22 07:03:20,222 INFO [train.py:886] (3/4) Epoch 15, batch 2350, loss[loss=0.01939, audio_tagging_loss=0.01939, over 24750.00 frames. ], tot_loss[loss=0.01431, audio_tagging_loss=0.01431, over 4946195.57 frames. ], batch size: 99, lr: 7.28e-03, grad_scale: 64.0 2023-12-22 07:03:32,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=460560.0, ans=0.0 2023-12-22 07:03:42,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=460626.6666666667, ans=0.2 2023-12-22 07:03:46,197 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.17 vs. limit=22.5 2023-12-22 07:03:51,892 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.80 vs. limit=22.5 2023-12-22 07:03:57,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=460693.3333333333, ans=0.2 2023-12-22 07:04:05,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=460760.0, ans=0.125 2023-12-22 07:04:13,013 INFO [train.py:886] (3/4) Epoch 15, batch 2400, loss[loss=0.01244, audio_tagging_loss=0.01244, over 25000.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4950998.10 frames. ], batch size: 100, lr: 7.28e-03, grad_scale: 64.0 2023-12-22 07:04:19,753 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.499e+01 2.726e+01 2.841e+01 2.994e+01 3.395e+01, threshold=5.683e+01, percent-clipped=0.0 2023-12-22 07:04:19,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=460826.6666666667, ans=0.0
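The scaling.py:1022 entries compare a per-module whitening metric against a limit (metric=8.64 vs. limit=15.0 just below, for instance). One plausible form for such a metric, offered only as a guess at what scaling.py computes, measures how far the channel covariance of the activations is from a multiple of the identity: it is 1.0 for decorrelated, equal-variance ("white") channels and grows as channels become correlated or unequal in scale. The num_groups field in the log suggests the real metric is computed per channel group; this sketch handles only num_groups=1.

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels), assumed zero-mean. Returns >= 1.0."""
    num_frames, num_channels = x.shape
    cov = (x.t() @ x) / num_frames
    # mean squared eigenvalue over squared mean eigenvalue, via traces;
    # this ratio equals 1.0 iff cov is a multiple of the identity
    return ((cov ** 2).sum() * num_channels / cov.diag().sum() ** 2).item()

x = torch.randn(10000, 384)                   # roughly white channels
print(whitening_metric(x))                    # ~1.0
print(whitening_metric(x * torch.rand(384)))  # >1 once variances diverge
```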
2023-12-22 07:04:30,518 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.64 vs. limit=15.0 2023-12-22 07:05:04,075 INFO [train.py:886] (3/4) Epoch 15, batch 2450, loss[loss=0.01456, audio_tagging_loss=0.01456, over 25000.00 frames. ], tot_loss[loss=0.01421, audio_tagging_loss=0.01421, over 4957698.08 frames. ], batch size: 100, lr: 7.28e-03, grad_scale: 64.0 2023-12-22 07:05:05,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=461160.0, ans=0.0 2023-12-22 07:05:13,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=461160.0, ans=0.0 2023-12-22 07:05:14,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=461226.6666666667, ans=0.125 2023-12-22 07:05:31,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=461293.3333333333, ans=0.04949747468305833 2023-12-22 07:05:33,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=461293.3333333333, ans=0.125 2023-12-22 07:05:42,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=461360.0, ans=0.125 2023-12-22 07:05:42,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=461360.0, ans=0.125 2023-12-22 07:05:56,310 INFO [train.py:886] (3/4) Epoch 15, batch 2500, loss[loss=0.01685, audio_tagging_loss=0.01685, over 24750.00 frames. ], tot_loss[loss=0.01433, audio_tagging_loss=0.01433, over 4955769.63 frames. ], batch size: 99, lr: 7.27e-03, grad_scale: 64.0 2023-12-22 07:06:00,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=461493.3333333333, ans=0.125 2023-12-22 07:06:02,975 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.485e+01 2.837e+01 2.934e+01 3.109e+01 3.960e+01, threshold=5.868e+01, percent-clipped=0.0 2023-12-22 07:06:24,236 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.19 vs. limit=15.0 2023-12-22 07:06:42,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.97 vs. limit=15.0 2023-12-22 07:06:47,948 INFO [train.py:886] (3/4) Epoch 15, batch 2550, loss[loss=0.01864, audio_tagging_loss=0.01864, over 22222.00 frames. ], tot_loss[loss=0.01443, audio_tagging_loss=0.01443, over 4950612.13 frames. ], batch size: 107, lr: 7.27e-03, grad_scale: 64.0 2023-12-22 07:06:50,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=461826.6666666667, ans=0.1 2023-12-22 07:06:58,742 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.52 vs.
limit=15.0 2023-12-22 07:07:01,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=461893.3333333333, ans=0.125 2023-12-22 07:07:02,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=461893.3333333333, ans=0.125 2023-12-22 07:07:02,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=461893.3333333333, ans=0.0 2023-12-22 07:07:23,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=462026.6666666667, ans=0.125 2023-12-22 07:07:24,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=462026.6666666667, ans=0.125 2023-12-22 07:07:36,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=462093.3333333333, ans=0.125 2023-12-22 07:07:38,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=462160.0, ans=0.125 2023-12-22 07:07:39,799 INFO [train.py:886] (3/4) Epoch 15, batch 2600, loss[loss=0.01338, audio_tagging_loss=0.01338, over 24750.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4952133.26 frames. ], batch size: 99, lr: 7.27e-03, grad_scale: 64.0 2023-12-22 07:07:40,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=462160.0, ans=0.2 2023-12-22 07:07:45,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=462160.0, ans=0.04949747468305833 2023-12-22 07:07:47,090 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+01 2.786e+01 2.944e+01 3.057e+01 3.830e+01, threshold=5.888e+01, percent-clipped=0.0 2023-12-22 07:07:49,408 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.48 vs. limit=22.5 2023-12-22 07:07:51,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=462226.6666666667, ans=0.125 2023-12-22 07:07:53,855 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0 2023-12-22 07:07:53,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.05 vs. 
limit=15.0 2023-12-22 07:08:12,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=462360.0, ans=0.2 2023-12-22 07:08:13,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=462360.0, ans=0.125 2023-12-22 07:08:14,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=462360.0, ans=0.125 2023-12-22 07:08:20,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=462426.6666666667, ans=0.125 2023-12-22 07:08:20,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=462426.6666666667, ans=0.2 2023-12-22 07:08:28,422 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.13 vs. limit=22.5 2023-12-22 07:08:32,453 INFO [train.py:886] (3/4) Epoch 15, batch 2650, loss[loss=0.01254, audio_tagging_loss=0.01254, over 24750.00 frames. ], tot_loss[loss=0.01434, audio_tagging_loss=0.01434, over 4951557.05 frames. ], batch size: 99, lr: 7.27e-03, grad_scale: 64.0 2023-12-22 07:08:55,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=462626.6666666667, ans=0.125 2023-12-22 07:08:55,195 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.32 vs. limit=22.5 2023-12-22 07:09:12,646 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.07 vs. limit=22.5 2023-12-22 07:09:24,937 INFO [train.py:886] (3/4) Epoch 15, batch 2700, loss[loss=0.01637, audio_tagging_loss=0.01637, over 25000.00 frames. ], tot_loss[loss=0.01432, audio_tagging_loss=0.01432, over 4952248.69 frames. ], batch size: 100, lr: 7.26e-03, grad_scale: 64.0 2023-12-22 07:09:26,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=462826.6666666667, ans=0.0 2023-12-22 07:09:32,224 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.416e+01 2.720e+01 2.866e+01 2.980e+01 3.396e+01, threshold=5.733e+01, percent-clipped=0.0 2023-12-22 07:09:37,234 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.96 vs. limit=15.0 2023-12-22 07:09:53,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=462960.0, ans=0.1 2023-12-22 07:09:57,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=15.0 2023-12-22 07:09:58,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=463026.6666666667, ans=0.2 2023-12-22 07:10:00,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=463026.6666666667, ans=0.125 2023-12-22 07:10:02,119 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.15 vs. 
limit=15.0 2023-12-22 07:10:05,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=463093.3333333333, ans=0.125 2023-12-22 07:10:16,183 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.38 vs. limit=15.0 2023-12-22 07:10:16,482 INFO [train.py:886] (3/4) Epoch 15, batch 2750, loss[loss=0.01322, audio_tagging_loss=0.01322, over 25000.00 frames. ], tot_loss[loss=0.01433, audio_tagging_loss=0.01433, over 4956313.63 frames. ], batch size: 100, lr: 7.26e-03, grad_scale: 64.0 2023-12-22 07:10:23,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=463160.0, ans=0.2 2023-12-22 07:10:25,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=463160.0, ans=0.0 2023-12-22 07:10:31,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=463226.6666666667, ans=0.1 2023-12-22 07:10:35,974 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 07:11:09,253 INFO [train.py:886] (3/4) Epoch 15, batch 2800, loss[loss=0.01413, audio_tagging_loss=0.01413, over 24750.00 frames. ], tot_loss[loss=0.01432, audio_tagging_loss=0.01432, over 4951550.54 frames. ], batch size: 99, lr: 7.26e-03, grad_scale: 64.0 2023-12-22 07:11:11,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=463493.3333333333, ans=0.125 2023-12-22 07:11:11,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=463493.3333333333, ans=0.0 2023-12-22 07:11:17,364 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.486e+01 2.810e+01 2.949e+01 3.108e+01 3.472e+01, threshold=5.898e+01, percent-clipped=0.0 2023-12-22 07:11:19,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=463560.0, ans=0.04949747468305833 2023-12-22 07:11:41,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.23 vs. limit=15.0 2023-12-22 07:11:55,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=463760.0, ans=0.125 2023-12-22 07:12:00,603 INFO [train.py:886] (3/4) Epoch 15, batch 2850, loss[loss=0.01367, audio_tagging_loss=0.01367, over 24750.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4949439.68 frames. ], batch size: 99, lr: 7.26e-03, grad_scale: 64.0 2023-12-22 07:12:07,576 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.57 vs. limit=10.0 2023-12-22 07:12:15,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.49 vs. limit=22.5 2023-12-22 07:12:26,071 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.19 vs. 
limit=15.0 2023-12-22 07:12:52,614 INFO [train.py:886] (3/4) Epoch 15, batch 2900, loss[loss=0.01396, audio_tagging_loss=0.01396, over 24750.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4945384.00 frames. ], batch size: 99, lr: 7.25e-03, grad_scale: 64.0 2023-12-22 07:12:55,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=464160.0, ans=0.09899494936611666 2023-12-22 07:13:00,566 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.01 vs. limit=15.0 2023-12-22 07:13:00,912 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+01 2.777e+01 2.915e+01 3.046e+01 3.469e+01, threshold=5.830e+01, percent-clipped=0.0 2023-12-22 07:13:19,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=464293.3333333333, ans=0.0 2023-12-22 07:13:44,941 INFO [train.py:886] (3/4) Epoch 15, batch 2950, loss[loss=0.01283, audio_tagging_loss=0.01283, over 24014.00 frames. ], tot_loss[loss=0.01422, audio_tagging_loss=0.01422, over 4951212.59 frames. ], batch size: 100, lr: 7.25e-03, grad_scale: 64.0 2023-12-22 07:13:49,886 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.73 vs. limit=22.5 2023-12-22 07:14:36,687 INFO [train.py:886] (3/4) Epoch 15, batch 3000, loss[loss=0.0129, audio_tagging_loss=0.0129, over 24750.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4955650.87 frames. ], batch size: 99, lr: 7.25e-03, grad_scale: 64.0 2023-12-22 07:14:36,688 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 07:14:57,485 INFO [train.py:917] (3/4) Epoch 15, validation: loss=0.03387, audio_tagging_loss=0.03387, over 3737520.00 frames. 2023-12-22 07:14:57,485 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 07:14:57,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=464826.6666666667, ans=0.125 2023-12-22 07:15:05,581 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.509e+01 2.703e+01 2.839e+01 2.986e+01 3.331e+01, threshold=5.678e+01, percent-clipped=0.0 2023-12-22 07:15:08,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=464893.3333333333, ans=0.125 2023-12-22 07:15:09,310 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 07:15:18,889 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.63 vs. limit=15.0 2023-12-22 07:15:28,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.11 vs. limit=15.0 2023-12-22 07:15:30,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=465026.6666666667, ans=0.125 2023-12-22 07:15:49,504 INFO [train.py:886] (3/4) Epoch 15, batch 3050, loss[loss=0.01376, audio_tagging_loss=0.01376, over 24750.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 4951492.13 frames. 
], batch size: 99, lr: 7.24e-03, grad_scale: 64.0 2023-12-22 07:15:56,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=465160.0, ans=0.07 2023-12-22 07:15:59,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=465226.6666666667, ans=0.1 2023-12-22 07:16:09,386 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.69 vs. limit=15.0 2023-12-22 07:16:36,446 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.07 vs. limit=15.0 2023-12-22 07:16:40,680 INFO [train.py:886] (3/4) Epoch 15, batch 3100, loss[loss=0.01396, audio_tagging_loss=0.01396, over 24750.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4954244.73 frames. ], batch size: 99, lr: 7.24e-03, grad_scale: 64.0 2023-12-22 07:16:41,079 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.34 vs. limit=10.0 2023-12-22 07:16:49,701 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.496e+01 2.762e+01 2.896e+01 3.048e+01 3.549e+01, threshold=5.793e+01, percent-clipped=0.0 2023-12-22 07:16:55,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=465560.0, ans=0.125 2023-12-22 07:16:56,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=465560.0, ans=0.125 2023-12-22 07:16:58,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=465560.0, ans=0.2 2023-12-22 07:17:13,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=465693.3333333333, ans=0.025 2023-12-22 07:17:24,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=465760.0, ans=0.2 2023-12-22 07:17:33,962 INFO [train.py:886] (3/4) Epoch 15, batch 3150, loss[loss=0.01355, audio_tagging_loss=0.01355, over 23992.00 frames. ], tot_loss[loss=0.01425, audio_tagging_loss=0.01425, over 4940882.16 frames. ], batch size: 100, lr: 7.24e-03, grad_scale: 64.0 2023-12-22 07:17:49,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=465893.3333333333, ans=0.1 2023-12-22 07:17:53,031 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=15.0 2023-12-22 07:18:02,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=465960.0, ans=0.0 2023-12-22 07:18:02,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=465960.0, ans=0.125 2023-12-22 07:18:22,189 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.63 vs. limit=15.0 2023-12-22 07:18:25,587 INFO [train.py:886] (3/4) Epoch 15, batch 3200, loss[loss=0.01321, audio_tagging_loss=0.01321, over 25000.00 frames. 
], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4941651.11 frames. ], batch size: 100, lr: 7.24e-03, grad_scale: 64.0 2023-12-22 07:18:34,468 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.409e+01 2.747e+01 2.900e+01 3.032e+01 3.537e+01, threshold=5.801e+01, percent-clipped=0.0 2023-12-22 07:18:37,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=466226.6666666667, ans=0.125 2023-12-22 07:18:45,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=466293.3333333333, ans=0.125 2023-12-22 07:19:16,943 INFO [train.py:886] (3/4) Epoch 15, batch 3250, loss[loss=0.01508, audio_tagging_loss=0.01508, over 25000.00 frames. ], tot_loss[loss=0.01417, audio_tagging_loss=0.01417, over 4943040.01 frames. ], batch size: 100, lr: 7.23e-03, grad_scale: 64.0 2023-12-22 07:19:20,393 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.94 vs. limit=22.5 2023-12-22 07:19:22,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=466493.3333333333, ans=0.0 2023-12-22 07:19:31,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=466560.0, ans=0.125 2023-12-22 07:19:33,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=466560.0, ans=0.0 2023-12-22 07:19:37,404 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.84 vs. limit=15.0 2023-12-22 07:19:38,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=466626.6666666667, ans=0.125 2023-12-22 07:19:38,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=466626.6666666667, ans=0.2 2023-12-22 07:19:38,463 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=15.0 2023-12-22 07:19:43,291 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.15 vs. limit=22.5 2023-12-22 07:19:54,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=466693.3333333333, ans=0.1 2023-12-22 07:20:06,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=466760.0, ans=0.95 2023-12-22 07:20:08,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=466826.6666666667, ans=0.035 2023-12-22 07:20:09,537 INFO [train.py:886] (3/4) Epoch 15, batch 3300, loss[loss=0.01427, audio_tagging_loss=0.01427, over 23979.00 frames. ], tot_loss[loss=0.01416, audio_tagging_loss=0.01416, over 4947581.41 frames. 
], batch size: 100, lr: 7.23e-03, grad_scale: 64.0 2023-12-22 07:20:13,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=466826.6666666667, ans=0.0 2023-12-22 07:20:16,404 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.79 vs. limit=12.0 2023-12-22 07:20:17,791 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.385e+01 2.692e+01 2.863e+01 2.982e+01 3.634e+01, threshold=5.726e+01, percent-clipped=0.0 2023-12-22 07:20:39,930 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-22 07:20:40,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=467026.6666666667, ans=0.0 2023-12-22 07:20:42,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=467026.6666666667, ans=0.09899494936611666 2023-12-22 07:21:00,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=467160.0, ans=0.0 2023-12-22 07:21:00,883 INFO [train.py:886] (3/4) Epoch 15, batch 3350, loss[loss=0.01548, audio_tagging_loss=0.01548, over 25000.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4948426.85 frames. ], batch size: 100, lr: 7.23e-03, grad_scale: 64.0 2023-12-22 07:21:13,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=12.0 2023-12-22 07:21:45,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=467426.6666666667, ans=0.0 2023-12-22 07:21:53,187 INFO [train.py:886] (3/4) Epoch 15, batch 3400, loss[loss=0.01213, audio_tagging_loss=0.01213, over 24034.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4956780.88 frames. ], batch size: 100, lr: 7.23e-03, grad_scale: 64.0 2023-12-22 07:22:00,730 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.501e+01 2.741e+01 2.914e+01 3.061e+01 3.505e+01, threshold=5.827e+01, percent-clipped=0.0 2023-12-22 07:22:15,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=467626.6666666667, ans=0.125 2023-12-22 07:22:30,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=467693.3333333333, ans=0.1 2023-12-22 07:22:44,537 INFO [train.py:886] (3/4) Epoch 15, batch 3450, loss[loss=0.01648, audio_tagging_loss=0.01648, over 24750.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4955198.92 frames. ], batch size: 99, lr: 7.22e-03, grad_scale: 64.0 2023-12-22 07:22:52,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=467826.6666666667, ans=0.125 2023-12-22 07:22:56,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=467893.3333333333, ans=0.0 2023-12-22 07:23:08,060 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=17.03 vs. 
limit=15.0 2023-12-22 07:23:15,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=468026.6666666667, ans=0.1 2023-12-22 07:23:15,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=468026.6666666667, ans=0.125 2023-12-22 07:23:26,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=468093.3333333333, ans=0.2 2023-12-22 07:23:28,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=468093.3333333333, ans=0.125 2023-12-22 07:23:36,031 INFO [train.py:886] (3/4) Epoch 15, batch 3500, loss[loss=0.01582, audio_tagging_loss=0.01582, over 24750.00 frames. ], tot_loss[loss=0.01434, audio_tagging_loss=0.01434, over 4951273.09 frames. ], batch size: 99, lr: 7.22e-03, grad_scale: 64.0 2023-12-22 07:23:37,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=468160.0, ans=0.125 2023-12-22 07:23:43,630 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.476e+01 2.815e+01 2.965e+01 3.096e+01 3.766e+01, threshold=5.930e+01, percent-clipped=0.0 2023-12-22 07:23:48,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=468226.6666666667, ans=0.125 2023-12-22 07:23:53,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=468226.6666666667, ans=0.0 2023-12-22 07:23:53,549 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 07:23:58,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=468293.3333333333, ans=0.125 2023-12-22 07:23:59,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=468293.3333333333, ans=0.0 2023-12-22 07:24:06,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=468360.0, ans=0.125 2023-12-22 07:24:14,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=468360.0, ans=0.0 2023-12-22 07:24:22,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=468426.6666666667, ans=0.05 2023-12-22 07:24:28,364 INFO [train.py:886] (3/4) Epoch 15, batch 3550, loss[loss=0.01302, audio_tagging_loss=0.01302, over 25000.00 frames. ], tot_loss[loss=0.01431, audio_tagging_loss=0.01431, over 4953950.43 frames. 
], batch size: 100, lr: 7.22e-03, grad_scale: 64.0 2023-12-22 07:24:44,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=468560.0, ans=0.125 2023-12-22 07:25:02,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=468693.3333333333, ans=0.125 2023-12-22 07:25:06,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=468693.3333333333, ans=0.125 2023-12-22 07:25:11,808 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.13 vs. limit=22.5 2023-12-22 07:25:19,408 INFO [train.py:886] (3/4) Epoch 15, batch 3600, loss[loss=0.01229, audio_tagging_loss=0.01229, over 25000.00 frames. ], tot_loss[loss=0.01424, audio_tagging_loss=0.01424, over 4954165.54 frames. ], batch size: 100, lr: 7.22e-03, grad_scale: 64.0 2023-12-22 07:25:28,302 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 2.699e+01 2.856e+01 3.019e+01 3.433e+01, threshold=5.713e+01, percent-clipped=0.0 2023-12-22 07:25:35,264 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.77 vs. limit=15.0 2023-12-22 07:25:55,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=469026.6666666667, ans=0.0 2023-12-22 07:26:11,393 INFO [train.py:886] (3/4) Epoch 15, batch 3650, loss[loss=0.01359, audio_tagging_loss=0.01359, over 25000.00 frames. ], tot_loss[loss=0.01416, audio_tagging_loss=0.01416, over 4955744.96 frames. ], batch size: 100, lr: 7.21e-03, grad_scale: 64.0 2023-12-22 07:26:21,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=469226.6666666667, ans=0.125 2023-12-22 07:26:31,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=469293.3333333333, ans=0.0 2023-12-22 07:26:31,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=469293.3333333333, ans=0.125 2023-12-22 07:26:51,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.59 vs. limit=6.0 2023-12-22 07:26:56,664 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.47 vs. limit=15.0 2023-12-22 07:27:02,573 INFO [train.py:886] (3/4) Epoch 15, batch 3700, loss[loss=0.0148, audio_tagging_loss=0.0148, over 25000.00 frames. ], tot_loss[loss=0.01422, audio_tagging_loss=0.01422, over 4954350.05 frames. 
], batch size: 100, lr: 7.21e-03, grad_scale: 64.0 2023-12-22 07:27:11,638 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.491e+01 2.741e+01 2.872e+01 3.028e+01 3.371e+01, threshold=5.744e+01, percent-clipped=0.0 2023-12-22 07:27:36,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=469693.3333333333, ans=0.125 2023-12-22 07:27:44,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=469760.0, ans=0.0 2023-12-22 07:27:49,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=469760.0, ans=0.125 2023-12-22 07:27:53,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=469826.6666666667, ans=0.5 2023-12-22 07:27:55,259 INFO [train.py:886] (3/4) Epoch 15, batch 3750, loss[loss=0.01275, audio_tagging_loss=0.01275, over 24750.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4945349.57 frames. ], batch size: 99, lr: 7.21e-03, grad_scale: 64.0 2023-12-22 07:28:07,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.31 vs. limit=22.5 2023-12-22 07:28:08,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=469893.3333333333, ans=0.0 2023-12-22 07:28:12,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=12.0 2023-12-22 07:28:28,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=470026.6666666667, ans=0.125 2023-12-22 07:28:38,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=470093.3333333333, ans=0.2 2023-12-22 07:28:39,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=470093.3333333333, ans=0.125 2023-12-22 07:28:44,132 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.90 vs. limit=6.0 2023-12-22 07:28:47,224 INFO [train.py:886] (3/4) Epoch 15, batch 3800, loss[loss=0.01324, audio_tagging_loss=0.01324, over 24750.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 4939649.10 frames. ], batch size: 99, lr: 7.21e-03, grad_scale: 64.0 2023-12-22 07:28:47,645 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.37 vs. 
2023-12-22 07:28:55,414 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.476e+01 2.844e+01 2.987e+01 3.173e+01 3.633e+01, threshold=5.973e+01, percent-clipped=0.0 2023-12-22 07:29:10,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=470293.3333333333, ans=0.125 2023-12-22 07:29:17,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=470360.0, ans=0.125 2023-12-22 07:29:22,425 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 07:29:32,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=470426.6666666667, ans=0.125 2023-12-22 07:29:37,986 INFO [train.py:886] (3/4) Epoch 15, batch 3850, loss[loss=0.01239, audio_tagging_loss=0.01239, over 24116.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4942950.64 frames. ], batch size: 100, lr: 7.20e-03, grad_scale: 64.0 2023-12-22 07:29:42,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=470493.3333333333, ans=0.0 2023-12-22 07:29:42,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=470493.3333333333, ans=0.125 2023-12-22 07:29:48,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=470560.0, ans=0.0 2023-12-22 07:29:50,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=470560.0, ans=0.05 2023-12-22 07:30:24,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=470760.0, ans=0.0 2023-12-22 07:30:30,447 INFO [train.py:886] (3/4) Epoch 15, batch 3900, loss[loss=0.01521, audio_tagging_loss=0.01521, over 25000.00 frames. ], tot_loss[loss=0.01416, audio_tagging_loss=0.01416, over 4946941.27 frames. ], batch size: 100, lr: 7.20e-03, grad_scale: 64.0 2023-12-22 07:30:37,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=470826.6666666667, ans=0.0 2023-12-22 07:30:37,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=470826.6666666667, ans=0.0 2023-12-22 07:30:38,599 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.440e+01 2.742e+01 2.857e+01 3.020e+01 3.655e+01, threshold=5.714e+01, percent-clipped=0.0 2023-12-22 07:30:42,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=470893.3333333333, ans=0.125 2023-12-22 07:31:05,230 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=28.13 vs. limit=22.5 2023-12-22 07:31:07,967 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 07:31:20,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=471093.3333333333, ans=0.0 2023-12-22 07:31:22,512 INFO [train.py:886] (3/4) Epoch 15, batch 3950, loss[loss=0.01427, audio_tagging_loss=0.01427, over 25000.00 frames.
], tot_loss[loss=0.01417, audio_tagging_loss=0.01417, over 4950683.23 frames. ], batch size: 100, lr: 7.20e-03, grad_scale: 64.0 2023-12-22 07:31:28,633 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.12 vs. limit=22.5 2023-12-22 07:31:29,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=471160.0, ans=0.125 2023-12-22 07:31:52,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.54 vs. limit=10.0 2023-12-22 07:32:02,596 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0 2023-12-22 07:32:13,921 INFO [train.py:886] (3/4) Epoch 15, batch 4000, loss[loss=0.01659, audio_tagging_loss=0.01659, over 25000.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4947954.87 frames. ], batch size: 100, lr: 7.20e-03, grad_scale: 64.0 2023-12-22 07:32:14,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=471493.3333333333, ans=0.04949747468305833 2023-12-22 07:32:16,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=471493.3333333333, ans=0.0 2023-12-22 07:32:21,503 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.461e+01 2.772e+01 2.863e+01 2.977e+01 3.465e+01, threshold=5.726e+01, percent-clipped=0.0 2023-12-22 07:32:24,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=471560.0, ans=0.09899494936611666 2023-12-22 07:32:30,818 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2023-12-22 07:32:49,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=471693.3333333333, ans=0.125 2023-12-22 07:32:49,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=471693.3333333333, ans=0.2 2023-12-22 07:32:52,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=471693.3333333333, ans=0.0 2023-12-22 07:32:57,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=471760.0, ans=0.2 2023-12-22 07:33:04,742 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0 2023-12-22 07:33:05,124 INFO [train.py:886] (3/4) Epoch 15, batch 4050, loss[loss=0.01316, audio_tagging_loss=0.01316, over 24750.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4949946.95 frames. 
], batch size: 99, lr: 7.19e-03, grad_scale: 64.0 2023-12-22 07:33:27,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=471960.0, ans=0.125 2023-12-22 07:33:34,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=471960.0, ans=0.125 2023-12-22 07:33:42,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=472026.6666666667, ans=0.0 2023-12-22 07:33:50,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=472093.3333333333, ans=0.125 2023-12-22 07:33:57,427 INFO [train.py:886] (3/4) Epoch 15, batch 4100, loss[loss=0.01324, audio_tagging_loss=0.01324, over 24750.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 4950006.29 frames. ], batch size: 99, lr: 7.19e-03, grad_scale: 64.0 2023-12-22 07:33:59,849 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.80 vs. limit=22.5 2023-12-22 07:34:01,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=472160.0, ans=0.125 2023-12-22 07:34:05,955 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.602e+01 2.831e+01 2.961e+01 3.185e+01 3.752e+01, threshold=5.922e+01, percent-clipped=0.0 2023-12-22 07:34:06,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=472160.0, ans=0.2 2023-12-22 07:34:22,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=472293.3333333333, ans=0.125 2023-12-22 07:34:40,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=472426.6666666667, ans=0.0 2023-12-22 07:34:47,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=472493.3333333333, ans=0.125 2023-12-22 07:34:49,335 INFO [train.py:886] (3/4) Epoch 15, batch 4150, loss[loss=0.01507, audio_tagging_loss=0.01507, over 24750.00 frames. ], tot_loss[loss=0.01431, audio_tagging_loss=0.01431, over 4951926.87 frames. 
], batch size: 99, lr: 7.19e-03, grad_scale: 64.0 2023-12-22 07:34:52,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=472493.3333333333, ans=0.0 2023-12-22 07:34:53,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=472493.3333333333, ans=0.125 2023-12-22 07:35:16,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=472626.6666666667, ans=0.0 2023-12-22 07:35:17,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=472626.6666666667, ans=0.125 2023-12-22 07:35:26,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=472693.3333333333, ans=0.125 2023-12-22 07:35:27,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=472693.3333333333, ans=0.0 2023-12-22 07:35:33,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=472760.0, ans=0.2 2023-12-22 07:35:40,923 INFO [train.py:886] (3/4) Epoch 15, batch 4200, loss[loss=0.01233, audio_tagging_loss=0.01233, over 25000.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 4956068.34 frames. ], batch size: 100, lr: 7.19e-03, grad_scale: 64.0 2023-12-22 07:35:43,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=472826.6666666667, ans=0.2 2023-12-22 07:35:48,440 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.360e+01 2.733e+01 2.855e+01 3.027e+01 3.631e+01, threshold=5.711e+01, percent-clipped=0.0 2023-12-22 07:35:54,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=472893.3333333333, ans=0.0 2023-12-22 07:36:07,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=472960.0, ans=0.125 2023-12-22 07:36:08,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=472960.0, ans=0.0 2023-12-22 07:36:32,703 INFO [train.py:886] (3/4) Epoch 15, batch 4250, loss[loss=0.01419, audio_tagging_loss=0.01419, over 25000.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4953667.68 frames. ], batch size: 100, lr: 7.18e-03, grad_scale: 64.0 2023-12-22 07:36:33,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.47 vs. limit=15.0 2023-12-22 07:36:40,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=473160.0, ans=0.125 2023-12-22 07:36:47,691 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 07:36:55,312 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 07:37:02,948 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.63 vs. limit=15.0
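[Note on the scaling.py:1022 Whitening records: each compares a per-module statistic against a limit (metric=4.63 vs. limit=15.0 just above). One natural whiteness statistic, shown here only as an illustration and not necessarily the exact formula scaling.py computes, is the ratio mean(eig^2) / mean(eig)^2 of the covariance eigenvalues: it is 1.0 when the activations are perfectly white (all eigenvalues equal) and grows as the covariance becomes lopsided:]

import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations for one whitening group."""
    x = x - x.mean(dim=0)                      # center the features
    cov = (x.t() @ x) / x.shape[0]             # channel covariance
    eigs = torch.linalg.eigvalsh(cov)          # real eigenvalues, ascending
    # 1.0 iff all eigenvalues are equal (perfectly white), larger otherwise.
    return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()

print(whitening_metric(torch.randn(1000, 384)))   # near-white input -> small metric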
2023-12-22 07:37:05,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.77 vs. limit=10.0 2023-12-22 07:37:24,536 INFO [train.py:886] (3/4) Epoch 15, batch 4300, loss[loss=0.01428, audio_tagging_loss=0.01428, over 25000.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4953208.91 frames. ], batch size: 100, lr: 7.18e-03, grad_scale: 64.0 2023-12-22 07:37:27,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=473493.3333333333, ans=0.2 2023-12-22 07:37:32,803 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.452e+01 2.780e+01 2.893e+01 3.020e+01 3.657e+01, threshold=5.787e+01, percent-clipped=0.0 2023-12-22 07:37:40,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=473560.0, ans=0.09899494936611666 2023-12-22 07:37:40,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=473560.0, ans=15.0 2023-12-22 07:37:43,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.93 vs. limit=15.0 2023-12-22 07:37:47,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=473626.6666666667, ans=0.1 2023-12-22 07:38:16,778 INFO [train.py:886] (3/4) Epoch 15, batch 4350, loss[loss=0.01611, audio_tagging_loss=0.01611, over 24750.00 frames. ], tot_loss[loss=0.01424, audio_tagging_loss=0.01424, over 4958014.29 frames. ], batch size: 99, lr: 7.18e-03, grad_scale: 64.0 2023-12-22 07:38:17,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=473826.6666666667, ans=0.0 2023-12-22 07:38:36,283 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=15.0 2023-12-22 07:38:46,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=474026.6666666667, ans=0.95 2023-12-22 07:38:58,430 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 07:39:08,300 INFO [train.py:886] (3/4) Epoch 15, batch 4400, loss[loss=0.01684, audio_tagging_loss=0.01684, over 24750.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4940421.57 frames. ], batch size: 99, lr: 7.18e-03, grad_scale: 64.0 2023-12-22 07:39:16,463 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.580e+01 2.787e+01 2.952e+01 3.076e+01 3.640e+01, threshold=5.903e+01, percent-clipped=0.0 2023-12-22 07:39:22,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=474226.6666666667, ans=0.125 2023-12-22 07:39:54,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=474426.6666666667, ans=0.0 2023-12-22 07:40:00,416 INFO [train.py:886] (3/4) Epoch 15, batch 4450, loss[loss=0.01529, audio_tagging_loss=0.01529, over 25000.00 frames. ], tot_loss[loss=0.01446, audio_tagging_loss=0.01446, over 4943176.06 frames.
], batch size: 100, lr: 7.17e-03, grad_scale: 64.0 2023-12-22 07:40:05,770 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=15.0 2023-12-22 07:40:06,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=474493.3333333333, ans=0.0 2023-12-22 07:40:27,505 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.91 vs. limit=15.0 2023-12-22 07:40:33,789 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2023-12-22 07:40:40,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=474760.0, ans=0.0 2023-12-22 07:40:45,030 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0 2023-12-22 07:40:51,810 INFO [train.py:886] (3/4) Epoch 15, batch 4500, loss[loss=0.01364, audio_tagging_loss=0.01364, over 25000.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 4944251.34 frames. ], batch size: 100, lr: 7.17e-03, grad_scale: 64.0 2023-12-22 07:41:00,680 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.363e+01 2.748e+01 2.874e+01 3.023e+01 3.451e+01, threshold=5.747e+01, percent-clipped=0.0 2023-12-22 07:41:16,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=474960.0, ans=0.0 2023-12-22 07:41:25,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=475026.6666666667, ans=0.1 2023-12-22 07:41:43,286 INFO [train.py:886] (3/4) Epoch 15, batch 4550, loss[loss=0.01722, audio_tagging_loss=0.01722, over 25000.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4949961.51 frames. 
], batch size: 100, lr: 7.17e-03, grad_scale: 64.0 2023-12-22 07:41:43,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=475160.0, ans=0.025 2023-12-22 07:41:44,567 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 07:41:45,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=475160.0, ans=0.0 2023-12-22 07:41:55,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=475226.6666666667, ans=0.0 2023-12-22 07:42:11,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=475293.3333333333, ans=0.125 2023-12-22 07:42:12,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=475293.3333333333, ans=0.125 2023-12-22 07:42:24,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=475426.6666666667, ans=0.2 2023-12-22 07:42:27,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=475426.6666666667, ans=0.125 2023-12-22 07:42:35,174 INFO [train.py:886] (3/4) Epoch 15, batch 4600, loss[loss=0.01107, audio_tagging_loss=0.01107, over 24053.00 frames. ], tot_loss[loss=0.01421, audio_tagging_loss=0.01421, over 4954115.32 frames. ], batch size: 100, lr: 7.17e-03, grad_scale: 64.0 2023-12-22 07:42:37,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=475493.3333333333, ans=0.125 2023-12-22 07:42:37,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=475493.3333333333, ans=0.125 2023-12-22 07:42:40,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=475493.3333333333, ans=10.0 2023-12-22 07:42:42,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=475493.3333333333, ans=0.0 2023-12-22 07:42:43,598 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.459e+01 2.790e+01 2.912e+01 3.073e+01 3.704e+01, threshold=5.824e+01, percent-clipped=0.0 2023-12-22 07:43:01,034 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.90 vs. limit=15.0 2023-12-22 07:43:12,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=475693.3333333333, ans=0.0 2023-12-22 07:43:22,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=475760.0, ans=0.1 2023-12-22 07:43:24,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=475760.0, ans=0.125 2023-12-22 07:43:27,434 INFO [train.py:886] (3/4) Epoch 15, batch 4650, loss[loss=0.01277, audio_tagging_loss=0.01277, over 25000.00 frames. ], tot_loss[loss=0.0143, audio_tagging_loss=0.0143, over 4958131.09 frames. 
], batch size: 100, lr: 7.16e-03, grad_scale: 64.0 2023-12-22 07:43:37,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=475893.3333333333, ans=0.125 2023-12-22 07:43:44,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.18 vs. limit=15.0 2023-12-22 07:43:46,265 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.33 vs. limit=22.5 2023-12-22 07:44:08,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=476093.3333333333, ans=0.125 2023-12-22 07:44:10,674 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0 2023-12-22 07:44:11,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=476093.3333333333, ans=0.07 2023-12-22 07:44:15,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=476093.3333333333, ans=0.125 2023-12-22 07:44:18,118 INFO [train.py:886] (3/4) Epoch 15, batch 4700, loss[loss=0.01381, audio_tagging_loss=0.01381, over 24750.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4954739.67 frames. ], batch size: 99, lr: 7.16e-03, grad_scale: 64.0 2023-12-22 07:44:23,626 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0 2023-12-22 07:44:26,796 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.571e+01 2.835e+01 2.953e+01 3.126e+01 3.694e+01, threshold=5.906e+01, percent-clipped=0.0 2023-12-22 07:44:32,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=476226.6666666667, ans=0.0 2023-12-22 07:44:33,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=476226.6666666667, ans=0.0 2023-12-22 07:45:04,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=476426.6666666667, ans=0.125 2023-12-22 07:45:05,675 INFO [train.py:886] (3/4) Epoch 15, batch 4750, loss[loss=0.01223, audio_tagging_loss=0.01223, over 24750.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4948686.54 frames. ], batch size: 99, lr: 7.16e-03, grad_scale: 128.0 2023-12-22 07:45:10,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=476493.3333333333, ans=0.1 2023-12-22 07:45:42,780 INFO [train.py:886] (3/4) Epoch 16, batch 0, loss[loss=0.0322, audio_tagging_loss=0.0322, over 25000.00 frames. ], tot_loss[loss=0.0322, audio_tagging_loss=0.0322, over 25000.00 frames. ], batch size: 100, lr: 6.93e-03, grad_scale: 32.0 2023-12-22 07:45:42,781 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 07:46:04,058 INFO [train.py:917] (3/4) Epoch 16, validation: loss=0.03318, audio_tagging_loss=0.03318, over 3737520.00 frames. 
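[Note on the tot_loss figures: they are not plain epoch averages. At Epoch 16, batch 0 above, tot_loss equals the single batch loss over 25000.00 frames, and later in each epoch the frame count saturates near 4.95e6 rather than growing without bound. This is consistent with an exponentially decayed, frame-weighted sum; the decay of roughly 0.995 per batch is inferred from the logged frame counts, not read from train.py. A sketch of that bookkeeping:]

class TotLoss:
    """Decayed frame-weighted running loss, reproducing the logged behaviour."""
    def __init__(self, decay: float = 0.995):   # 0.995 inferred from the log
        self.decay = decay
        self.weighted_loss = 0.0   # decayed sum of loss * frames
        self.frames = 0.0          # decayed sum of frames; saturates near
                                   # batch_frames / (1 - decay) ~ 5e6 here

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.weighted_loss = self.weighted_loss * self.decay + batch_loss * batch_frames
        self.frames = self.frames * self.decay + batch_frames
        return self.weighted_loss / self.frames  # the tot_loss printed above

tot = TotLoss()
tot.update(0.0322, 25000.0)   # batch 0: tot_loss == batch loss over 25000 frames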
2023-12-22 07:46:04,059 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 07:46:11,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=476600.0, ans=0.0 2023-12-22 07:46:41,734 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 07:46:49,656 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.572e+01 2.881e+01 3.132e+01 4.118e+01 9.111e+01, threshold=6.264e+01, percent-clipped=8.0 2023-12-22 07:46:52,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=476866.6666666667, ans=0.125 2023-12-22 07:46:55,363 INFO [train.py:886] (3/4) Epoch 16, batch 50, loss[loss=0.01541, audio_tagging_loss=0.01541, over 25000.00 frames. ], tot_loss[loss=0.02229, audio_tagging_loss=0.02229, over 1117171.45 frames. ], batch size: 100, lr: 6.93e-03, grad_scale: 32.0 2023-12-22 07:46:58,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=476933.3333333333, ans=0.0 2023-12-22 07:47:24,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=477066.6666666667, ans=0.125 2023-12-22 07:47:38,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=477200.0, ans=0.0 2023-12-22 07:47:39,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=477200.0, ans=0.1 2023-12-22 07:47:47,680 INFO [train.py:886] (3/4) Epoch 16, batch 100, loss[loss=0.01322, audio_tagging_loss=0.01322, over 25000.00 frames. ], tot_loss[loss=0.01938, audio_tagging_loss=0.01938, over 1972868.21 frames. ], batch size: 100, lr: 6.92e-03, grad_scale: 32.0 2023-12-22 07:48:09,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=477400.0, ans=0.125 2023-12-22 07:48:20,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=477466.6666666667, ans=0.125 2023-12-22 07:48:33,203 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.658e+01 3.003e+01 3.220e+01 3.387e+01 3.937e+01, threshold=6.439e+01, percent-clipped=0.0 2023-12-22 07:48:38,854 INFO [train.py:886] (3/4) Epoch 16, batch 150, loss[loss=0.01455, audio_tagging_loss=0.01455, over 23989.00 frames. ], tot_loss[loss=0.01747, audio_tagging_loss=0.01747, over 2634942.00 frames. ], batch size: 100, lr: 6.92e-03, grad_scale: 32.0 2023-12-22 07:48:40,330 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.94 vs. limit=15.0 2023-12-22 07:49:08,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=477733.3333333333, ans=0.0 2023-12-22 07:49:25,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=477866.6666666667, ans=0.125 2023-12-22 07:49:28,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.33 vs. 
limit=15.0 2023-12-22 07:49:31,239 INFO [train.py:886] (3/4) Epoch 16, batch 200, loss[loss=0.01392, audio_tagging_loss=0.01392, over 25000.00 frames. ], tot_loss[loss=0.01645, audio_tagging_loss=0.01645, over 3150908.95 frames. ], batch size: 100, lr: 6.92e-03, grad_scale: 32.0 2023-12-22 07:49:33,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=477933.3333333333, ans=10.0 2023-12-22 07:49:41,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=478000.0, ans=0.1 2023-12-22 07:49:47,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=478000.0, ans=0.125 2023-12-22 07:50:12,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=478200.0, ans=0.2 2023-12-22 07:50:16,277 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.531e+01 2.761e+01 2.944e+01 3.069e+01 3.550e+01, threshold=5.887e+01, percent-clipped=0.0 2023-12-22 07:50:21,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=478266.6666666667, ans=0.0 2023-12-22 07:50:22,662 INFO [train.py:886] (3/4) Epoch 16, batch 250, loss[loss=0.01569, audio_tagging_loss=0.01569, over 25000.00 frames. ], tot_loss[loss=0.01593, audio_tagging_loss=0.01593, over 3554078.83 frames. ], batch size: 100, lr: 6.92e-03, grad_scale: 32.0 2023-12-22 07:50:47,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=478400.0, ans=10.0 2023-12-22 07:51:02,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=478466.6666666667, ans=0.125 2023-12-22 07:51:05,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=478533.3333333333, ans=0.125 2023-12-22 07:51:05,240 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.06 vs. limit=22.5 2023-12-22 07:51:12,596 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.05 vs. limit=15.0 2023-12-22 07:51:15,085 INFO [train.py:886] (3/4) Epoch 16, batch 300, loss[loss=0.01422, audio_tagging_loss=0.01422, over 24750.00 frames. ], tot_loss[loss=0.01556, audio_tagging_loss=0.01556, over 3865387.68 frames. 
], batch size: 99, lr: 6.91e-03, grad_scale: 32.0 2023-12-22 07:51:17,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=478600.0, ans=0.125 2023-12-22 07:51:18,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=478600.0, ans=0.125 2023-12-22 07:51:20,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=478600.0, ans=0.0 2023-12-22 07:51:24,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=478666.6666666667, ans=0.1 2023-12-22 07:51:28,561 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.85 vs. limit=22.5 2023-12-22 07:51:38,723 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0 2023-12-22 07:51:57,403 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.90 vs. limit=15.0 2023-12-22 07:52:00,604 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.509e+01 2.872e+01 2.968e+01 3.150e+01 3.579e+01, threshold=5.936e+01, percent-clipped=0.0 2023-12-22 07:52:07,227 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0 2023-12-22 07:52:07,796 INFO [train.py:886] (3/4) Epoch 16, batch 350, loss[loss=0.01323, audio_tagging_loss=0.01323, over 24750.00 frames. ], tot_loss[loss=0.01526, audio_tagging_loss=0.01526, over 4104641.42 frames. ], batch size: 99, lr: 6.91e-03, grad_scale: 32.0 2023-12-22 07:52:21,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=479000.0, ans=0.125 2023-12-22 07:52:53,859 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.77 vs. limit=12.0 2023-12-22 07:52:57,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=479200.0, ans=0.0 2023-12-22 07:52:59,652 INFO [train.py:886] (3/4) Epoch 16, batch 400, loss[loss=0.01416, audio_tagging_loss=0.01416, over 25000.00 frames. ], tot_loss[loss=0.01491, audio_tagging_loss=0.01491, over 4294313.81 frames. 
], batch size: 100, lr: 6.91e-03, grad_scale: 32.0 2023-12-22 07:53:13,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=479333.3333333333, ans=0.125 2023-12-22 07:53:14,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=479333.3333333333, ans=0.2 2023-12-22 07:53:25,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=479400.0, ans=0.125 2023-12-22 07:53:40,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=479533.3333333333, ans=0.5 2023-12-22 07:53:44,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=479533.3333333333, ans=0.125 2023-12-22 07:53:45,880 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+01 2.751e+01 2.885e+01 3.045e+01 3.501e+01, threshold=5.769e+01, percent-clipped=0.0 2023-12-22 07:53:46,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=479533.3333333333, ans=0.125 2023-12-22 07:53:49,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=479533.3333333333, ans=0.125 2023-12-22 07:53:51,540 INFO [train.py:886] (3/4) Epoch 16, batch 450, loss[loss=0.01385, audio_tagging_loss=0.01385, over 24750.00 frames. ], tot_loss[loss=0.01475, audio_tagging_loss=0.01475, over 4442843.12 frames. ], batch size: 99, lr: 6.91e-03, grad_scale: 32.0 2023-12-22 07:53:59,161 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.80 vs. limit=22.5 2023-12-22 07:53:59,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=479600.0, ans=0.035 2023-12-22 07:54:04,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=479666.6666666667, ans=0.125 2023-12-22 07:54:10,741 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.97 vs. limit=22.5 2023-12-22 07:54:33,577 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.38 vs. limit=10.0 2023-12-22 07:54:43,144 INFO [train.py:886] (3/4) Epoch 16, batch 500, loss[loss=0.01348, audio_tagging_loss=0.01348, over 25000.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4553493.14 frames. ], batch size: 100, lr: 6.91e-03, grad_scale: 32.0 2023-12-22 07:54:50,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=479933.3333333333, ans=0.125 2023-12-22 07:54:56,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=480000.0, ans=0.125 2023-12-22 07:54:58,675 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.28 vs. 
limit=15.0 2023-12-22 07:55:05,987 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.35 vs. limit=15.0 2023-12-22 07:55:13,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=480066.6666666667, ans=0.1 2023-12-22 07:55:31,300 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.482e+01 2.740e+01 2.884e+01 3.004e+01 3.376e+01, threshold=5.768e+01, percent-clipped=0.0 2023-12-22 07:55:36,284 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 07:55:36,748 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.45 vs. limit=15.0 2023-12-22 07:55:37,647 INFO [train.py:886] (3/4) Epoch 16, batch 550, loss[loss=0.01093, audio_tagging_loss=0.01093, over 25000.00 frames. ], tot_loss[loss=0.01446, audio_tagging_loss=0.01446, over 4641591.73 frames. ], batch size: 100, lr: 6.90e-03, grad_scale: 32.0 2023-12-22 07:56:02,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=480400.0, ans=0.0 2023-12-22 07:56:06,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.63 vs. limit=15.0 2023-12-22 07:56:29,822 INFO [train.py:886] (3/4) Epoch 16, batch 600, loss[loss=0.01546, audio_tagging_loss=0.01546, over 24750.00 frames. ], tot_loss[loss=0.01442, audio_tagging_loss=0.01442, over 4708271.70 frames. ], batch size: 99, lr: 6.90e-03, grad_scale: 32.0 2023-12-22 07:56:35,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=480600.0, ans=0.125 2023-12-22 07:56:37,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=480600.0, ans=0.125 2023-12-22 07:56:47,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=480666.6666666667, ans=0.125 2023-12-22 07:56:49,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=480733.3333333333, ans=0.1 2023-12-22 07:57:00,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=480800.0, ans=0.125 2023-12-22 07:57:04,185 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.69 vs. limit=22.5 2023-12-22 07:57:14,892 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.350e+01 2.812e+01 2.929e+01 3.069e+01 3.591e+01, threshold=5.857e+01, percent-clipped=0.0 2023-12-22 07:57:15,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=480866.6666666667, ans=0.1 2023-12-22 07:57:21,258 INFO [train.py:886] (3/4) Epoch 16, batch 650, loss[loss=0.01268, audio_tagging_loss=0.01268, over 24750.00 frames. ], tot_loss[loss=0.01438, audio_tagging_loss=0.01438, over 4752944.00 frames. 
], batch size: 99, lr: 6.90e-03, grad_scale: 32.0 2023-12-22 07:57:29,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=480933.3333333333, ans=15.0 2023-12-22 07:57:38,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=481000.0, ans=0.125 2023-12-22 07:57:41,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=481066.6666666667, ans=10.0 2023-12-22 07:57:57,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=481133.3333333333, ans=0.125 2023-12-22 07:58:07,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=481200.0, ans=0.125 2023-12-22 07:58:09,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=481200.0, ans=0.125 2023-12-22 07:58:13,160 INFO [train.py:886] (3/4) Epoch 16, batch 700, loss[loss=0.0167, audio_tagging_loss=0.0167, over 24750.00 frames. ], tot_loss[loss=0.01442, audio_tagging_loss=0.01442, over 4792113.00 frames. ], batch size: 99, lr: 6.90e-03, grad_scale: 32.0 2023-12-22 07:58:25,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=481333.3333333333, ans=0.5 2023-12-22 07:58:25,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=481333.3333333333, ans=0.0 2023-12-22 07:58:35,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=481400.0, ans=0.125 2023-12-22 07:58:48,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=481466.6666666667, ans=0.1 2023-12-22 07:58:49,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=481466.6666666667, ans=0.125 2023-12-22 07:58:52,240 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.48 vs. limit=15.0 2023-12-22 07:58:58,829 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+01 2.788e+01 2.907e+01 3.038e+01 3.343e+01, threshold=5.814e+01, percent-clipped=0.0 2023-12-22 07:59:00,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=481533.3333333333, ans=0.1 2023-12-22 07:59:05,266 INFO [train.py:886] (3/4) Epoch 16, batch 750, loss[loss=0.01353, audio_tagging_loss=0.01353, over 25000.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4828357.63 frames. 
], batch size: 100, lr: 6.89e-03, grad_scale: 32.0 2023-12-22 07:59:25,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=481733.3333333333, ans=0.125 2023-12-22 07:59:34,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=481733.3333333333, ans=0.0 2023-12-22 07:59:36,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=481800.0, ans=6.0 2023-12-22 07:59:41,406 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.59 vs. limit=15.0 2023-12-22 07:59:45,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=481866.6666666667, ans=0.0 2023-12-22 07:59:46,241 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=15.0 2023-12-22 07:59:49,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=481866.6666666667, ans=0.05 2023-12-22 07:59:49,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=481866.6666666667, ans=0.04949747468305833 2023-12-22 07:59:56,141 INFO [train.py:886] (3/4) Epoch 16, batch 800, loss[loss=0.01274, audio_tagging_loss=0.01274, over 25000.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4857186.57 frames. ], batch size: 100, lr: 6.89e-03, grad_scale: 32.0 2023-12-22 08:00:03,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=481933.3333333333, ans=0.1 2023-12-22 08:00:36,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=482133.3333333333, ans=0.0 2023-12-22 08:00:42,818 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+01 2.804e+01 2.918e+01 3.049e+01 4.122e+01, threshold=5.837e+01, percent-clipped=0.0 2023-12-22 08:00:49,190 INFO [train.py:886] (3/4) Epoch 16, batch 850, loss[loss=0.01423, audio_tagging_loss=0.01423, over 25000.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4886926.12 frames. ], batch size: 100, lr: 6.89e-03, grad_scale: 32.0 2023-12-22 08:00:53,429 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.46 vs. limit=15.0 2023-12-22 08:00:58,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=482333.3333333333, ans=0.125 2023-12-22 08:01:04,520 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.38 vs. limit=15.0 2023-12-22 08:01:40,642 INFO [train.py:886] (3/4) Epoch 16, batch 900, loss[loss=0.01349, audio_tagging_loss=0.01349, over 24020.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4897028.44 frames. 
], batch size: 100, lr: 6.89e-03, grad_scale: 32.0 2023-12-22 08:01:42,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=482600.0, ans=0.0 2023-12-22 08:02:04,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=482733.3333333333, ans=0.025 2023-12-22 08:02:13,300 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.67 vs. limit=12.0 2023-12-22 08:02:26,291 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.491e+01 2.779e+01 2.893e+01 3.122e+01 3.718e+01, threshold=5.785e+01, percent-clipped=0.0 2023-12-22 08:02:31,925 INFO [train.py:886] (3/4) Epoch 16, batch 950, loss[loss=0.01493, audio_tagging_loss=0.01493, over 24750.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 4904858.06 frames. ], batch size: 99, lr: 6.88e-03, grad_scale: 32.0 2023-12-22 08:02:50,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=483000.0, ans=0.2 2023-12-22 08:02:52,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.19 vs. limit=22.5 2023-12-22 08:03:25,261 INFO [train.py:886] (3/4) Epoch 16, batch 1000, loss[loss=0.01276, audio_tagging_loss=0.01276, over 25000.00 frames. ], tot_loss[loss=0.01444, audio_tagging_loss=0.01444, over 4911221.06 frames. ], batch size: 100, lr: 6.88e-03, grad_scale: 32.0 2023-12-22 08:03:34,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=483333.3333333333, ans=0.1 2023-12-22 08:03:39,116 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.45 vs. limit=22.5 2023-12-22 08:03:45,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=483400.0, ans=0.0 2023-12-22 08:03:51,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=483400.0, ans=0.1 2023-12-22 08:04:09,565 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.390e+01 2.722e+01 2.865e+01 3.109e+01 3.615e+01, threshold=5.730e+01, percent-clipped=0.0 2023-12-22 08:04:11,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=483533.3333333333, ans=0.125 2023-12-22 08:04:15,971 INFO [train.py:886] (3/4) Epoch 16, batch 1050, loss[loss=0.014, audio_tagging_loss=0.014, over 25000.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4924081.50 frames. ], batch size: 100, lr: 6.88e-03, grad_scale: 32.0 2023-12-22 08:04:25,380 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.44 vs. limit=15.0
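[Note on the scaling.py:213 ScheduledFloat records: each traces a hyperparameter whose current value (ans) is a function of batch_count. A piecewise-linear schedule over (batch_count, value) breakpoints reproduces this kind of trace; the breakpoints below are invented for illustration and are not the schedules this recipe actually uses:]

def scheduled_float(schedule, batch_count):
    """schedule: [(batch_count, value), ...] sorted by batch_count."""
    if batch_count <= schedule[0][0]:
        return schedule[0][1]
    # Linear interpolation between neighbouring breakpoints.
    for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
        if batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
    return schedule[-1][1]   # past the last breakpoint: hold the final value

# e.g. a balancer prob that has decayed to its floor by this point in training
prob = scheduled_float([(0.0, 0.3), (8000.0, 0.125)], 483666.6666666667)
assert prob == 0.125   # matches the "ans=0.125" style records late in training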
2023-12-22 08:04:34,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=483666.6666666667, ans=0.125 2023-12-22 08:04:40,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=483733.3333333333, ans=0.1 2023-12-22 08:04:45,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=483800.0, ans=0.125 2023-12-22 08:04:52,179 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:05:01,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=483866.6666666667, ans=0.0 2023-12-22 08:05:08,397 INFO [train.py:886] (3/4) Epoch 16, batch 1100, loss[loss=0.01147, audio_tagging_loss=0.01147, over 24750.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4936078.37 frames. ], batch size: 99, lr: 6.88e-03, grad_scale: 32.0 2023-12-22 08:05:10,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=483933.3333333333, ans=0.2 2023-12-22 08:05:12,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=483933.3333333333, ans=0.1 2023-12-22 08:05:14,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=483933.3333333333, ans=0.125 2023-12-22 08:05:19,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=484000.0, ans=0.125 2023-12-22 08:05:23,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=484000.0, ans=0.125 2023-12-22 08:05:26,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=484000.0, ans=0.07 2023-12-22 08:05:36,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=484066.6666666667, ans=0.2 2023-12-22 08:05:39,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=484133.3333333333, ans=0.125 2023-12-22 08:05:48,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=484200.0, ans=0.0 2023-12-22 08:05:53,772 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.525e+01 2.808e+01 2.906e+01 3.022e+01 3.549e+01, threshold=5.811e+01, percent-clipped=0.0 2023-12-22 08:05:58,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=484266.6666666667, ans=0.1 2023-12-22 08:05:59,462 INFO [train.py:886] (3/4) Epoch 16, batch 1150, loss[loss=0.015, audio_tagging_loss=0.015, over 24750.00 frames. ], tot_loss[loss=0.01425, audio_tagging_loss=0.01425, over 4937274.54 frames. ], batch size: 99, lr: 6.87e-03, grad_scale: 32.0 2023-12-22 08:06:10,673 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.26 vs.
limit=12.0 2023-12-22 08:06:22,552 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.642e-03 2023-12-22 08:06:25,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=484400.0, ans=0.125 2023-12-22 08:06:43,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=484533.3333333333, ans=0.0 2023-12-22 08:06:44,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=484533.3333333333, ans=0.0 2023-12-22 08:06:50,719 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:06:51,518 INFO [train.py:886] (3/4) Epoch 16, batch 1200, loss[loss=0.01511, audio_tagging_loss=0.01511, over 25000.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4943858.42 frames. ], batch size: 100, lr: 6.87e-03, grad_scale: 32.0 2023-12-22 08:06:54,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=484600.0, ans=0.1 2023-12-22 08:07:02,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=484666.6666666667, ans=0.0 2023-12-22 08:07:07,617 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0 2023-12-22 08:07:14,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=484733.3333333333, ans=0.125 2023-12-22 08:07:14,998 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.56 vs. limit=10.0 2023-12-22 08:07:16,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=484733.3333333333, ans=0.0 2023-12-22 08:07:18,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=484733.3333333333, ans=0.2 2023-12-22 08:07:19,575 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.74 vs. limit=22.5 2023-12-22 08:07:30,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=484800.0, ans=0.125 2023-12-22 08:07:32,590 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.48 vs. limit=15.0 2023-12-22 08:07:35,751 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.439e+01 2.792e+01 2.945e+01 3.119e+01 3.482e+01, threshold=5.890e+01, percent-clipped=0.0 2023-12-22 08:07:39,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=484866.6666666667, ans=0.1 2023-12-22 08:07:42,913 INFO [train.py:886] (3/4) Epoch 16, batch 1250, loss[loss=0.0149, audio_tagging_loss=0.0149, over 24750.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4944821.55 frames. 
], batch size: 99, lr: 6.87e-03, grad_scale: 32.0 2023-12-22 08:08:02,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=485066.6666666667, ans=0.125 2023-12-22 08:08:03,851 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.58 vs. limit=12.0 2023-12-22 08:08:12,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=485133.3333333333, ans=0.125 2023-12-22 08:08:29,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=485200.0, ans=0.0 2023-12-22 08:08:33,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=485266.6666666667, ans=0.5 2023-12-22 08:08:33,970 INFO [train.py:886] (3/4) Epoch 16, batch 1300, loss[loss=0.01469, audio_tagging_loss=0.01469, over 25000.00 frames. ], tot_loss[loss=0.0145, audio_tagging_loss=0.0145, over 4940363.52 frames. ], batch size: 100, lr: 6.87e-03, grad_scale: 32.0 2023-12-22 08:08:36,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=485266.6666666667, ans=0.0 2023-12-22 08:08:46,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=485333.3333333333, ans=0.125 2023-12-22 08:09:20,878 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 2.809e+01 2.938e+01 3.092e+01 3.584e+01, threshold=5.876e+01, percent-clipped=0.0 2023-12-22 08:09:23,947 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:09:24,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=485533.3333333333, ans=0.04949747468305833 2023-12-22 08:09:26,556 INFO [train.py:886] (3/4) Epoch 16, batch 1350, loss[loss=0.01289, audio_tagging_loss=0.01289, over 24750.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 4942670.98 frames. ], batch size: 99, lr: 6.87e-03, grad_scale: 32.0 2023-12-22 08:10:00,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=485800.0, ans=0.0 2023-12-22 08:10:11,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=485866.6666666667, ans=0.125 2023-12-22 08:10:13,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=485866.6666666667, ans=0.125 2023-12-22 08:10:14,874 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.503e-03 2023-12-22 08:10:15,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=485866.6666666667, ans=0.1 2023-12-22 08:10:18,073 INFO [train.py:886] (3/4) Epoch 16, batch 1400, loss[loss=0.01505, audio_tagging_loss=0.01505, over 25000.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4942645.75 frames. 
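
In each optim.py:484 warning the reported threshold is exactly Clipping_scale times the middle of the five grad-norm quartiles, i.e. twice the median of recent gradient norms (above: 2.0 x 2.938e+01 = 5.876e+01). A minimal sketch of that kind of adaptive clipping, assuming a simple history buffer; this is a reading aid for the log lines, not the optimizer's actual code:

    from collections import deque
    import torch

    class MedianGradClipper:
        """Clip the global grad norm to clipping_scale * median(recent norms)."""
        def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
            self.scale = clipping_scale
            self.norms = deque(maxlen=history)

        def __call__(self, params) -> float:
            params = [p for p in params if p.grad is not None]
            norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
            self.norms.append(norm)
            # median of the buffered norms, scaled as in the log's warnings
            threshold = self.scale * sorted(self.norms)[len(self.norms) // 2]
            if norm > threshold:
                for p in params:
                    p.grad.mul_(threshold / norm)
            return threshold

Percent-clipped stays at 0.0 throughout this stretch of the run, i.e. no batch's gradient norm actually exceeded the threshold.
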
], batch size: 100, lr: 6.86e-03, grad_scale: 32.0 2023-12-22 08:10:19,433 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=12.0 2023-12-22 08:10:22,085 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.687e-03 2023-12-22 08:10:36,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=486000.0, ans=0.0 2023-12-22 08:10:40,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.80 vs. limit=12.0 2023-12-22 08:10:55,929 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.57 vs. limit=6.0 2023-12-22 08:10:56,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=486133.3333333333, ans=0.125 2023-12-22 08:11:00,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=486200.0, ans=0.125 2023-12-22 08:11:04,398 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.416e+01 2.787e+01 2.895e+01 3.084e+01 3.427e+01, threshold=5.790e+01, percent-clipped=0.0 2023-12-22 08:11:10,097 INFO [train.py:886] (3/4) Epoch 16, batch 1450, loss[loss=0.01381, audio_tagging_loss=0.01381, over 25000.00 frames. ], tot_loss[loss=0.01417, audio_tagging_loss=0.01417, over 4944832.25 frames. ], batch size: 100, lr: 6.86e-03, grad_scale: 32.0 2023-12-22 08:11:11,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=486266.6666666667, ans=0.125 2023-12-22 08:11:13,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=486266.6666666667, ans=0.125 2023-12-22 08:11:35,866 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=15.0 2023-12-22 08:11:51,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=486533.3333333333, ans=0.125 2023-12-22 08:12:02,650 INFO [train.py:886] (3/4) Epoch 16, batch 1500, loss[loss=0.01431, audio_tagging_loss=0.01431, over 25000.00 frames. ], tot_loss[loss=0.01416, audio_tagging_loss=0.01416, over 4947857.64 frames. ], batch size: 100, lr: 6.86e-03, grad_scale: 32.0 2023-12-22 08:12:10,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=486600.0, ans=0.125 2023-12-22 08:12:13,557 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.15 vs. 
limit=15.0 2023-12-22 08:12:16,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=486666.6666666667, ans=0.0 2023-12-22 08:12:33,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=486800.0, ans=0.125 2023-12-22 08:12:44,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=486866.6666666667, ans=0.0 2023-12-22 08:12:47,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=486866.6666666667, ans=0.125 2023-12-22 08:12:48,471 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.524e+01 2.830e+01 2.960e+01 3.074e+01 3.564e+01, threshold=5.919e+01, percent-clipped=0.0 2023-12-22 08:12:51,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=486866.6666666667, ans=0.0 2023-12-22 08:12:54,856 INFO [train.py:886] (3/4) Epoch 16, batch 1550, loss[loss=0.01502, audio_tagging_loss=0.01502, over 24750.00 frames. ], tot_loss[loss=0.01433, audio_tagging_loss=0.01433, over 4951558.33 frames. ], batch size: 99, lr: 6.86e-03, grad_scale: 32.0 2023-12-22 08:13:02,353 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:13:29,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=487133.3333333333, ans=0.125 2023-12-22 08:13:36,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=487200.0, ans=15.0 2023-12-22 08:13:47,234 INFO [train.py:886] (3/4) Epoch 16, batch 1600, loss[loss=0.01458, audio_tagging_loss=0.01458, over 24750.00 frames. ], tot_loss[loss=0.0143, audio_tagging_loss=0.0143, over 4950742.57 frames. ], batch size: 99, lr: 6.85e-03, grad_scale: 32.0 2023-12-22 08:13:57,730 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=12.0 2023-12-22 08:14:08,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=487400.0, ans=0.125 2023-12-22 08:14:18,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=487466.6666666667, ans=0.2 2023-12-22 08:14:32,415 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.459e+01 2.798e+01 2.942e+01 3.066e+01 3.582e+01, threshold=5.884e+01, percent-clipped=0.0 2023-12-22 08:14:38,771 INFO [train.py:886] (3/4) Epoch 16, batch 1650, loss[loss=0.01254, audio_tagging_loss=0.01254, over 24750.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4946509.82 frames. ], batch size: 99, lr: 6.85e-03, grad_scale: 32.0 2023-12-22 08:14:42,745 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.55 vs. limit=12.0 2023-12-22 08:15:14,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=487800.0, ans=22.5 2023-12-22 08:15:31,038 INFO [train.py:886] (3/4) Epoch 16, batch 1700, loss[loss=0.01628, audio_tagging_loss=0.01628, over 25000.00 frames. 
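
The scaling.py:213 ScheduledFloat records report a named hyperparameter whose current value ("ans") is a function of batch_count. A guess at the behavior behind those lines is a piecewise-linear schedule over batch count; a minimal sketch under that assumption (the breakpoints in the example are invented, and this is not scaling.py's actual implementation):

    def scheduled_float(batch_count, points):
        """Piecewise-linear schedule. `points` is a sorted list of
        (batch_count, value) pairs; the value is held constant beyond
        the first and last breakpoints."""
        x0, y0 = points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in points[1:]:
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
            x0, y0 = x1, y1
        return y0

    # e.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches:
    p = scheduled_float(484733.0, [(0.0, 0.3), (20000.0, 0.1)])  # -> 0.1
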
], tot_loss[loss=0.01416, audio_tagging_loss=0.01416, over 4947878.31 frames. ], batch size: 100, lr: 6.85e-03, grad_scale: 32.0 2023-12-22 08:15:38,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=487933.3333333333, ans=0.125 2023-12-22 08:15:43,897 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.22 vs. limit=15.0 2023-12-22 08:15:51,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=488066.6666666667, ans=0.1 2023-12-22 08:16:11,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=488200.0, ans=0.2 2023-12-22 08:16:16,267 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+01 2.752e+01 2.880e+01 3.018e+01 3.839e+01, threshold=5.760e+01, percent-clipped=0.0 2023-12-22 08:16:22,699 INFO [train.py:886] (3/4) Epoch 16, batch 1750, loss[loss=0.0167, audio_tagging_loss=0.0167, over 25000.00 frames. ], tot_loss[loss=0.01417, audio_tagging_loss=0.01417, over 4951062.82 frames. ], batch size: 100, lr: 6.85e-03, grad_scale: 32.0 2023-12-22 08:16:29,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=488266.6666666667, ans=0.125 2023-12-22 08:16:32,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=488333.3333333333, ans=0.05 2023-12-22 08:16:33,013 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:16:33,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=488333.3333333333, ans=0.125 2023-12-22 08:16:35,113 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.69 vs. limit=6.0 2023-12-22 08:16:35,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=488333.3333333333, ans=0.125 2023-12-22 08:16:43,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=488400.0, ans=0.0 2023-12-22 08:16:53,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=488466.6666666667, ans=0.125 2023-12-22 08:16:55,173 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.66 vs. limit=10.0 2023-12-22 08:17:01,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=488466.6666666667, ans=0.0 2023-12-22 08:17:13,905 INFO [train.py:886] (3/4) Epoch 16, batch 1800, loss[loss=0.01251, audio_tagging_loss=0.01251, over 25000.00 frames. ], tot_loss[loss=0.01412, audio_tagging_loss=0.01412, over 4957307.05 frames. ], batch size: 100, lr: 6.84e-03, grad_scale: 32.0 2023-12-22 08:17:21,488 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.61 vs. 
limit=12.0 2023-12-22 08:17:29,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=488666.6666666667, ans=0.125 2023-12-22 08:17:44,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=488800.0, ans=0.04949747468305833 2023-12-22 08:17:59,523 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 2.777e+01 2.946e+01 3.090e+01 3.583e+01, threshold=5.892e+01, percent-clipped=0.0 2023-12-22 08:18:05,222 INFO [train.py:886] (3/4) Epoch 16, batch 1850, loss[loss=0.01558, audio_tagging_loss=0.01558, over 25000.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4960602.01 frames. ], batch size: 100, lr: 6.84e-03, grad_scale: 32.0 2023-12-22 08:18:11,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=488933.3333333333, ans=0.0 2023-12-22 08:18:11,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=488933.3333333333, ans=0.0 2023-12-22 08:18:20,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=489000.0, ans=0.04949747468305833 2023-12-22 08:18:36,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=489133.3333333333, ans=0.0 2023-12-22 08:18:50,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.79 vs. limit=10.0 2023-12-22 08:18:57,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=489266.6666666667, ans=0.04949747468305833 2023-12-22 08:18:57,688 INFO [train.py:886] (3/4) Epoch 16, batch 1900, loss[loss=0.0113, audio_tagging_loss=0.0113, over 24750.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 4951129.07 frames. ], batch size: 99, lr: 6.84e-03, grad_scale: 32.0 2023-12-22 08:19:18,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=489400.0, ans=0.125 2023-12-22 08:19:24,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=489400.0, ans=0.125 2023-12-22 08:19:43,099 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.607e+01 2.822e+01 2.946e+01 3.143e+01 3.927e+01, threshold=5.893e+01, percent-clipped=0.0 2023-12-22 08:19:46,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=489533.3333333333, ans=0.0 2023-12-22 08:19:49,450 INFO [train.py:886] (3/4) Epoch 16, batch 1950, loss[loss=0.01515, audio_tagging_loss=0.01515, over 24750.00 frames. ], tot_loss[loss=0.01421, audio_tagging_loss=0.01421, over 4944943.08 frames. ], batch size: 99, lr: 6.84e-03, grad_scale: 32.0 2023-12-22 08:20:28,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.90 vs. 
limit=15.0 2023-12-22 08:20:31,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=489866.6666666667, ans=0.1 2023-12-22 08:20:35,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=489866.6666666667, ans=0.125 2023-12-22 08:20:37,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=489866.6666666667, ans=0.1 2023-12-22 08:20:41,327 INFO [train.py:886] (3/4) Epoch 16, batch 2000, loss[loss=0.01507, audio_tagging_loss=0.01507, over 24750.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4946840.88 frames. ], batch size: 99, lr: 6.83e-03, grad_scale: 64.0 2023-12-22 08:20:50,220 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.59 vs. limit=22.5 2023-12-22 08:21:03,719 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0 2023-12-22 08:21:13,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=490133.3333333333, ans=0.125 2023-12-22 08:21:26,210 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 2.757e+01 2.905e+01 3.071e+01 3.430e+01, threshold=5.811e+01, percent-clipped=0.0 2023-12-22 08:21:33,344 INFO [train.py:886] (3/4) Epoch 16, batch 2050, loss[loss=0.01499, audio_tagging_loss=0.01499, over 25000.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4948681.66 frames. ], batch size: 100, lr: 6.83e-03, grad_scale: 64.0 2023-12-22 08:21:40,276 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.86 vs. limit=12.0 2023-12-22 08:21:48,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=490333.3333333333, ans=0.125 2023-12-22 08:21:54,254 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 08:21:56,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=490400.0, ans=0.125 2023-12-22 08:22:06,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=490466.6666666667, ans=0.1 2023-12-22 08:22:08,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=490466.6666666667, ans=0.0 2023-12-22 08:22:18,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=490533.3333333333, ans=0.125 2023-12-22 08:22:22,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=490600.0, ans=0.0 2023-12-22 08:22:23,479 INFO [train.py:886] (3/4) Epoch 16, batch 2100, loss[loss=0.01454, audio_tagging_loss=0.01454, over 25000.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4949639.37 frames. 
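
Note the grad_scale field: it doubles from 32.0 to 64.0 exactly at batch 2000 above, and doubles again to 128.0 at batch 4000 later in this log. That is the classic fp16 dynamic-loss-scaling pattern: double the scale after a fixed number of overflow-free steps, halve it on inf/nan gradients. A hedged illustration using PyTorch's stock scaler; the growth_interval is inferred from the 2000-batch spacing in the log, not read from the training code:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        growth_factor=2.0,     # double when stable
        backoff_factor=0.5,    # halve on overflow
        growth_interval=2000,  # matches the spacing seen above (assumed)
    )
    # Typical step:
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()
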
], batch size: 100, lr: 6.83e-03, grad_scale: 64.0 2023-12-22 08:22:31,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=12.0 2023-12-22 08:22:52,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=490733.3333333333, ans=0.125 2023-12-22 08:22:54,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=490800.0, ans=0.125 2023-12-22 08:23:09,915 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.389e+01 2.753e+01 2.940e+01 3.068e+01 3.569e+01, threshold=5.880e+01, percent-clipped=0.0 2023-12-22 08:23:16,227 INFO [train.py:886] (3/4) Epoch 16, batch 2150, loss[loss=0.01321, audio_tagging_loss=0.01321, over 24750.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4951385.05 frames. ], batch size: 99, lr: 6.83e-03, grad_scale: 64.0 2023-12-22 08:23:29,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=491000.0, ans=0.0 2023-12-22 08:23:31,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491000.0, ans=0.1 2023-12-22 08:24:00,300 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.76 vs. limit=10.0 2023-12-22 08:24:07,022 INFO [train.py:886] (3/4) Epoch 16, batch 2200, loss[loss=0.01588, audio_tagging_loss=0.01588, over 24750.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4949300.36 frames. ], batch size: 99, lr: 6.83e-03, grad_scale: 64.0 2023-12-22 08:24:17,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=491333.3333333333, ans=0.2 2023-12-22 08:24:28,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=491400.0, ans=0.125 2023-12-22 08:24:34,960 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.23 vs. limit=15.0 2023-12-22 08:24:36,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=491400.0, ans=0.125 2023-12-22 08:24:53,267 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+01 2.816e+01 2.901e+01 3.052e+01 3.527e+01, threshold=5.802e+01, percent-clipped=0.0 2023-12-22 08:24:54,698 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0 2023-12-22 08:24:56,547 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.17 vs. limit=15.0 2023-12-22 08:24:56,595 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.50 vs. 
limit=15.0 2023-12-22 08:24:57,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=491533.3333333333, ans=22.5 2023-12-22 08:24:58,961 INFO [train.py:886] (3/4) Epoch 16, batch 2250, loss[loss=0.01334, audio_tagging_loss=0.01334, over 24750.00 frames. ], tot_loss[loss=0.01425, audio_tagging_loss=0.01425, over 4942350.92 frames. ], batch size: 99, lr: 6.82e-03, grad_scale: 64.0 2023-12-22 08:25:21,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=491733.3333333333, ans=0.125 2023-12-22 08:25:28,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=491733.3333333333, ans=0.125 2023-12-22 08:25:45,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=491866.6666666667, ans=0.0 2023-12-22 08:25:50,136 INFO [train.py:886] (3/4) Epoch 16, batch 2300, loss[loss=0.01479, audio_tagging_loss=0.01479, over 24750.00 frames. ], tot_loss[loss=0.01422, audio_tagging_loss=0.01422, over 4940212.62 frames. ], batch size: 99, lr: 6.82e-03, grad_scale: 64.0 2023-12-22 08:26:28,626 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.96 vs. limit=22.5 2023-12-22 08:26:30,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=492133.3333333333, ans=0.1 2023-12-22 08:26:34,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=492200.0, ans=0.125 2023-12-22 08:26:35,516 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.473e+01 2.719e+01 2.858e+01 3.036e+01 3.603e+01, threshold=5.716e+01, percent-clipped=0.0 2023-12-22 08:26:39,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=492200.0, ans=0.05 2023-12-22 08:26:41,326 INFO [train.py:886] (3/4) Epoch 16, batch 2350, loss[loss=0.01459, audio_tagging_loss=0.01459, over 25000.00 frames. ], tot_loss[loss=0.01417, audio_tagging_loss=0.01417, over 4947384.59 frames. ], batch size: 100, lr: 6.82e-03, grad_scale: 64.0 2023-12-22 08:27:06,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=492400.0, ans=0.125 2023-12-22 08:27:06,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=492400.0, ans=0.125 2023-12-22 08:27:30,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=492533.3333333333, ans=0.2 2023-12-22 08:27:34,561 INFO [train.py:886] (3/4) Epoch 16, batch 2400, loss[loss=0.01186, audio_tagging_loss=0.01186, over 25000.00 frames. ], tot_loss[loss=0.01415, audio_tagging_loss=0.01415, over 4951355.09 frames. 
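
The scaling.py:1022 Whitening records compare a per-module statistic ("metric") against a limit; a whitening penalty only activates once the metric exceeds the limit. One natural whiteness statistic with the right qualitative behavior is the eigenvalue-spread ratio of the channel covariance, which is 1.0 for a perfectly white (isotropic) covariance and grows as energy concentrates in a few directions. A sketch of that statistic, offered as a reading aid rather than scaling.py's exact formula:

    import torch

    def whiteness_metric(x: torch.Tensor) -> float:
        """x: (num_frames, num_channels) activations.
        Returns mean(eig^2) / mean(eig)^2 of the channel covariance."""
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov).clamp(min=0)
        return float((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20))
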
], batch size: 100, lr: 6.82e-03, grad_scale: 64.0 2023-12-22 08:27:47,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=492666.6666666667, ans=0.0 2023-12-22 08:27:58,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=492733.3333333333, ans=0.125 2023-12-22 08:28:08,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=492800.0, ans=0.125 2023-12-22 08:28:13,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=492800.0, ans=0.2 2023-12-22 08:28:18,835 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.512e+01 2.788e+01 2.902e+01 3.058e+01 4.154e+01, threshold=5.803e+01, percent-clipped=0.0 2023-12-22 08:28:19,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0 2023-12-22 08:28:25,282 INFO [train.py:886] (3/4) Epoch 16, batch 2450, loss[loss=0.01611, audio_tagging_loss=0.01611, over 24750.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4953530.69 frames. ], batch size: 99, lr: 6.81e-03, grad_scale: 64.0 2023-12-22 08:28:40,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=493000.0, ans=0.125 2023-12-22 08:28:43,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=493000.0, ans=0.125 2023-12-22 08:28:58,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=493133.3333333333, ans=0.125 2023-12-22 08:29:10,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=493200.0, ans=0.0 2023-12-22 08:29:17,807 INFO [train.py:886] (3/4) Epoch 16, batch 2500, loss[loss=0.01345, audio_tagging_loss=0.01345, over 24750.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 4949979.59 frames. ], batch size: 99, lr: 6.81e-03, grad_scale: 64.0 2023-12-22 08:29:28,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=493333.3333333333, ans=0.125 2023-12-22 08:29:47,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=493400.0, ans=0.125 2023-12-22 08:29:48,587 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.53 vs. limit=15.0 2023-12-22 08:29:50,328 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.370e-01 2023-12-22 08:30:03,599 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.509e+01 2.809e+01 2.998e+01 3.086e+01 4.150e+01, threshold=5.995e+01, percent-clipped=0.0 2023-12-22 08:30:09,928 INFO [train.py:886] (3/4) Epoch 16, batch 2550, loss[loss=0.01317, audio_tagging_loss=0.01317, over 24750.00 frames. ], tot_loss[loss=0.01432, audio_tagging_loss=0.01432, over 4943249.41 frames. 
], batch size: 99, lr: 6.81e-03, grad_scale: 64.0 2023-12-22 08:30:30,655 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.89 vs. limit=22.5 2023-12-22 08:30:31,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=493733.3333333333, ans=0.125 2023-12-22 08:30:34,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0 2023-12-22 08:31:01,092 INFO [train.py:886] (3/4) Epoch 16, batch 2600, loss[loss=0.01207, audio_tagging_loss=0.01207, over 24750.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4941543.73 frames. ], batch size: 99, lr: 6.81e-03, grad_scale: 64.0 2023-12-22 08:31:10,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=493933.3333333333, ans=0.125 2023-12-22 08:31:14,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=494000.0, ans=0.2 2023-12-22 08:31:15,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=494000.0, ans=0.1 2023-12-22 08:31:25,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=494066.6666666667, ans=0.07 2023-12-22 08:31:33,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=494133.3333333333, ans=0.05 2023-12-22 08:31:43,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=494200.0, ans=0.2 2023-12-22 08:31:44,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=494200.0, ans=0.0 2023-12-22 08:31:46,508 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.522e+01 2.802e+01 2.926e+01 3.072e+01 4.080e+01, threshold=5.852e+01, percent-clipped=0.0 2023-12-22 08:31:52,131 INFO [train.py:886] (3/4) Epoch 16, batch 2650, loss[loss=0.01609, audio_tagging_loss=0.01609, over 25000.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 4944271.57 frames. ], batch size: 100, lr: 6.81e-03, grad_scale: 64.0 2023-12-22 08:31:55,295 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.29 vs. limit=15.0 2023-12-22 08:32:19,131 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0 2023-12-22 08:32:20,148 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.36 vs. limit=15.0 2023-12-22 08:32:21,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.50 vs. 
limit=12.0 2023-12-22 08:32:41,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=494533.3333333333, ans=0.125 2023-12-22 08:32:44,384 INFO [train.py:886] (3/4) Epoch 16, batch 2700, loss[loss=0.01511, audio_tagging_loss=0.01511, over 25000.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4945851.84 frames. ], batch size: 100, lr: 6.80e-03, grad_scale: 64.0 2023-12-22 08:32:56,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=494666.6666666667, ans=0.0 2023-12-22 08:32:57,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=494666.6666666667, ans=0.05 2023-12-22 08:33:09,770 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.14 vs. limit=15.0 2023-12-22 08:33:12,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=494733.3333333333, ans=0.125 2023-12-22 08:33:26,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=494866.6666666667, ans=0.125 2023-12-22 08:33:29,567 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.420e+01 2.769e+01 2.925e+01 3.079e+01 4.143e+01, threshold=5.850e+01, percent-clipped=0.0 2023-12-22 08:33:35,986 INFO [train.py:886] (3/4) Epoch 16, batch 2750, loss[loss=0.01326, audio_tagging_loss=0.01326, over 24036.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4943770.21 frames. ], batch size: 100, lr: 6.80e-03, grad_scale: 64.0 2023-12-22 08:33:41,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=494933.3333333333, ans=0.2 2023-12-22 08:34:28,191 INFO [train.py:886] (3/4) Epoch 16, batch 2800, loss[loss=0.01666, audio_tagging_loss=0.01666, over 24750.00 frames. ], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4946563.21 frames. ], batch size: 99, lr: 6.80e-03, grad_scale: 64.0 2023-12-22 08:34:33,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=495266.6666666667, ans=0.0 2023-12-22 08:34:37,299 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0 2023-12-22 08:35:13,350 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.512e+01 2.809e+01 2.977e+01 3.124e+01 3.459e+01, threshold=5.955e+01, percent-clipped=0.0 2023-12-22 08:35:19,686 INFO [train.py:886] (3/4) Epoch 16, batch 2850, loss[loss=0.01463, audio_tagging_loss=0.01463, over 24750.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 4942343.68 frames. 
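
Each batch's loss covers roughly 25,000 frames, yet tot_loss is reported "over" about 4.94e6 frames, which suggests a frame-weighted running aggregate with forgetting rather than a plain epoch mean. A minimal sketch of one such aggregate; the decay constant is an assumption, chosen because 25,000 / (1 - 0.995) = 5.0e6 frames matches the order of magnitude seen in these records:

    class RunningLoss:
        """Frame-weighted running average with exponential forgetting."""
        def __init__(self, decay: float = 0.995):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> float:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames
            return self.loss_sum / self.frames  # current tot_loss
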
], batch size: 99, lr: 6.80e-03, grad_scale: 64.0 2023-12-22 08:35:19,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=495600.0, ans=0.2 2023-12-22 08:35:29,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=495666.6666666667, ans=0.2 2023-12-22 08:35:37,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=495666.6666666667, ans=0.125 2023-12-22 08:35:42,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=495733.3333333333, ans=0.125 2023-12-22 08:35:49,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=495733.3333333333, ans=0.1 2023-12-22 08:36:00,205 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.28 vs. limit=15.0 2023-12-22 08:36:11,119 INFO [train.py:886] (3/4) Epoch 16, batch 2900, loss[loss=0.01206, audio_tagging_loss=0.01206, over 25000.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4943624.40 frames. ], batch size: 100, lr: 6.79e-03, grad_scale: 64.0 2023-12-22 08:36:17,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=495933.3333333333, ans=0.0 2023-12-22 08:36:30,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=496000.0, ans=0.0 2023-12-22 08:36:51,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=496200.0, ans=0.125 2023-12-22 08:36:54,624 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.41 vs. limit=15.0 2023-12-22 08:36:56,680 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.420e+01 2.734e+01 2.897e+01 3.006e+01 3.535e+01, threshold=5.795e+01, percent-clipped=0.0 2023-12-22 08:37:03,072 INFO [train.py:886] (3/4) Epoch 16, batch 2950, loss[loss=0.01453, audio_tagging_loss=0.01453, over 24750.00 frames. ], tot_loss[loss=0.01402, audio_tagging_loss=0.01402, over 4948191.15 frames. ], batch size: 99, lr: 6.79e-03, grad_scale: 64.0 2023-12-22 08:37:06,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=496266.6666666667, ans=0.0 2023-12-22 08:37:11,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=496333.3333333333, ans=0.125 2023-12-22 08:37:30,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=496400.0, ans=0.1 2023-12-22 08:37:35,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=496466.6666666667, ans=0.125 2023-12-22 08:37:35,921 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.99 vs. 
limit=15.0 2023-12-22 08:37:36,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=496466.6666666667, ans=0.125 2023-12-22 08:37:40,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=496466.6666666667, ans=0.1 2023-12-22 08:37:52,980 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.82 vs. limit=6.0 2023-12-22 08:37:54,008 INFO [train.py:886] (3/4) Epoch 16, batch 3000, loss[loss=0.0155, audio_tagging_loss=0.0155, over 25000.00 frames. ], tot_loss[loss=0.01402, audio_tagging_loss=0.01402, over 4949648.93 frames. ], batch size: 100, lr: 6.79e-03, grad_scale: 64.0 2023-12-22 08:37:54,008 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 08:38:14,832 INFO [train.py:917] (3/4) Epoch 16, validation: loss=0.0344, audio_tagging_loss=0.0344, over 3737520.00 frames. 2023-12-22 08:38:14,833 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 08:38:58,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.71 vs. limit=15.0 2023-12-22 08:39:01,432 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.383e+01 2.766e+01 2.892e+01 3.038e+01 3.392e+01, threshold=5.783e+01, percent-clipped=0.0 2023-12-22 08:39:07,794 INFO [train.py:886] (3/4) Epoch 16, batch 3050, loss[loss=0.01426, audio_tagging_loss=0.01426, over 25000.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4948891.20 frames. ], batch size: 100, lr: 6.79e-03, grad_scale: 64.0 2023-12-22 08:39:09,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=496933.3333333333, ans=0.125 2023-12-22 08:39:19,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=497000.0, ans=0.0 2023-12-22 08:39:24,629 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=15.0 2023-12-22 08:39:58,691 INFO [train.py:886] (3/4) Epoch 16, batch 3100, loss[loss=0.01385, audio_tagging_loss=0.01385, over 24750.00 frames. ], tot_loss[loss=0.014, audio_tagging_loss=0.014, over 4952194.25 frames. ], batch size: 99, lr: 6.78e-03, grad_scale: 64.0 2023-12-22 08:40:00,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=497266.6666666667, ans=0.2 2023-12-22 08:40:04,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=497266.6666666667, ans=0.125 2023-12-22 08:40:14,873 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.26 vs. limit=22.5 2023-12-22 08:40:18,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.02 vs. limit=22.5 2023-12-22 08:40:25,158 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.57 vs. 
limit=10.0 2023-12-22 08:40:45,392 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.521e+01 2.809e+01 2.944e+01 3.089e+01 3.657e+01, threshold=5.888e+01, percent-clipped=0.0 2023-12-22 08:40:46,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=497533.3333333333, ans=0.125 2023-12-22 08:40:49,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=497533.3333333333, ans=0.0 2023-12-22 08:40:51,052 INFO [train.py:886] (3/4) Epoch 16, batch 3150, loss[loss=0.01342, audio_tagging_loss=0.01342, over 24750.00 frames. ], tot_loss[loss=0.01416, audio_tagging_loss=0.01416, over 4948852.00 frames. ], batch size: 99, lr: 6.78e-03, grad_scale: 64.0 2023-12-22 08:41:09,163 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.49 vs. limit=15.0 2023-12-22 08:41:21,237 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.60 vs. limit=15.0 2023-12-22 08:41:29,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=497800.0, ans=0.2 2023-12-22 08:41:30,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=497866.6666666667, ans=0.125 2023-12-22 08:41:31,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=497866.6666666667, ans=0.125 2023-12-22 08:41:42,565 INFO [train.py:886] (3/4) Epoch 16, batch 3200, loss[loss=0.01534, audio_tagging_loss=0.01534, over 24750.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4941395.94 frames. ], batch size: 99, lr: 6.78e-03, grad_scale: 64.0 2023-12-22 08:41:52,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=498000.0, ans=0.0 2023-12-22 08:42:11,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.92 vs. limit=12.0 2023-12-22 08:42:18,654 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0 2023-12-22 08:42:27,664 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.481e+01 2.733e+01 2.858e+01 3.051e+01 3.454e+01, threshold=5.717e+01, percent-clipped=0.0 2023-12-22 08:42:33,350 INFO [train.py:886] (3/4) Epoch 16, batch 3250, loss[loss=0.01571, audio_tagging_loss=0.01571, over 24750.00 frames. ], tot_loss[loss=0.01412, audio_tagging_loss=0.01412, over 4945405.03 frames. ], batch size: 99, lr: 6.78e-03, grad_scale: 64.0 2023-12-22 08:43:05,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=498466.6666666667, ans=0.125 2023-12-22 08:43:06,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.63 vs. 
limit=15.0 2023-12-22 08:43:06,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=498466.6666666667, ans=10.0 2023-12-22 08:43:10,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=498466.6666666667, ans=0.125 2023-12-22 08:43:24,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=498533.3333333333, ans=0.125 2023-12-22 08:43:26,556 INFO [train.py:886] (3/4) Epoch 16, batch 3300, loss[loss=0.01581, audio_tagging_loss=0.01581, over 25000.00 frames. ], tot_loss[loss=0.01401, audio_tagging_loss=0.01401, over 4952495.47 frames. ], batch size: 100, lr: 6.78e-03, grad_scale: 64.0 2023-12-22 08:43:33,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=498600.0, ans=0.95 2023-12-22 08:44:05,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=498800.0, ans=0.025 2023-12-22 08:44:05,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=498800.0, ans=0.125 2023-12-22 08:44:11,335 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+01 2.738e+01 2.869e+01 3.079e+01 3.471e+01, threshold=5.738e+01, percent-clipped=0.0 2023-12-22 08:44:17,634 INFO [train.py:886] (3/4) Epoch 16, batch 3350, loss[loss=0.01883, audio_tagging_loss=0.01883, over 25000.00 frames. ], tot_loss[loss=0.01401, audio_tagging_loss=0.01401, over 4947163.16 frames. ], batch size: 100, lr: 6.77e-03, grad_scale: 64.0 2023-12-22 08:45:05,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=499200.0, ans=0.125 2023-12-22 08:45:08,420 INFO [train.py:886] (3/4) Epoch 16, batch 3400, loss[loss=0.01578, audio_tagging_loss=0.01578, over 25000.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4953250.69 frames. ], batch size: 100, lr: 6.77e-03, grad_scale: 64.0 2023-12-22 08:45:09,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=499266.6666666667, ans=0.0 2023-12-22 08:45:11,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=499266.6666666667, ans=0.2 2023-12-22 08:45:18,948 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.83 vs. limit=22.5 2023-12-22 08:45:24,490 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.76 vs. 
limit=6.0 2023-12-22 08:45:44,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=499466.6666666667, ans=0.0 2023-12-22 08:45:51,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=499533.3333333333, ans=0.025 2023-12-22 08:45:53,378 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.507e+01 2.800e+01 2.974e+01 3.102e+01 3.785e+01, threshold=5.949e+01, percent-clipped=0.0 2023-12-22 08:45:59,703 INFO [train.py:886] (3/4) Epoch 16, batch 3450, loss[loss=0.01367, audio_tagging_loss=0.01367, over 24750.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4946077.17 frames. ], batch size: 99, lr: 6.77e-03, grad_scale: 64.0 2023-12-22 08:46:51,487 INFO [train.py:886] (3/4) Epoch 16, batch 3500, loss[loss=0.01589, audio_tagging_loss=0.01589, over 24750.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4944072.72 frames. ], batch size: 99, lr: 6.77e-03, grad_scale: 64.0 2023-12-22 08:46:54,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=499933.3333333333, ans=0.125 2023-12-22 08:47:02,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=500000.0, ans=0.2 2023-12-22 08:47:19,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=500066.6666666667, ans=0.95 2023-12-22 08:47:20,868 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.23 vs. limit=15.0 2023-12-22 08:47:38,984 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.455e+01 2.826e+01 2.952e+01 3.104e+01 3.647e+01, threshold=5.905e+01, percent-clipped=0.0 2023-12-22 08:47:44,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=500266.6666666667, ans=0.2 2023-12-22 08:47:45,461 INFO [train.py:886] (3/4) Epoch 16, batch 3550, loss[loss=0.01255, audio_tagging_loss=0.01255, over 25000.00 frames. ], tot_loss[loss=0.01417, audio_tagging_loss=0.01417, over 4944172.84 frames. ], batch size: 100, lr: 6.76e-03, grad_scale: 64.0 2023-12-22 08:48:10,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=500400.0, ans=0.0 2023-12-22 08:48:11,795 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.37 vs. limit=15.0 2023-12-22 08:48:14,470 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0 2023-12-22 08:48:22,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.20 vs. limit=15.0 2023-12-22 08:48:33,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=500533.3333333333, ans=0.125 2023-12-22 08:48:36,971 INFO [train.py:886] (3/4) Epoch 16, batch 3600, loss[loss=0.01375, audio_tagging_loss=0.01375, over 25000.00 frames. ], tot_loss[loss=0.01415, audio_tagging_loss=0.01415, over 4948507.50 frames. 
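
The audio_tagging_loss tracked throughout this run is a multi-label objective: each clip can carry several sound-event labels at once. The standard formulation for AudioSet-style tagging is per-class binary cross-entropy on the logits, averaged over clips; whether train.py normalizes in exactly this way is an assumption, and the 527-class vocabulary below is the usual AudioSet ontology size:

    import torch
    import torch.nn.functional as F

    def audio_tagging_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        """logits, targets: (batch, num_events), targets in {0, 1}."""
        return F.binary_cross_entropy_with_logits(logits, targets, reduction="mean")

    # toy usage: batch of 4 clips, 527 event classes
    logits = torch.randn(4, 527)
    targets = torch.zeros(4, 527)
    targets[0, 0] = 1.0
    loss = audio_tagging_loss(logits, targets)
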
], batch size: 100, lr: 6.76e-03, grad_scale: 64.0 2023-12-22 08:48:38,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=500600.0, ans=0.0 2023-12-22 08:48:51,161 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.14 vs. limit=12.0 2023-12-22 08:48:55,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=500666.6666666667, ans=0.035 2023-12-22 08:49:20,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=500866.6666666667, ans=0.125 2023-12-22 08:49:21,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=500866.6666666667, ans=0.04949747468305833 2023-12-22 08:49:22,451 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+01 2.766e+01 2.870e+01 3.088e+01 3.484e+01, threshold=5.741e+01, percent-clipped=0.0 2023-12-22 08:49:28,813 INFO [train.py:886] (3/4) Epoch 16, batch 3650, loss[loss=0.01468, audio_tagging_loss=0.01468, over 25000.00 frames. ], tot_loss[loss=0.01411, audio_tagging_loss=0.01411, over 4953251.30 frames. ], batch size: 100, lr: 6.76e-03, grad_scale: 64.0 2023-12-22 08:49:38,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=501000.0, ans=0.0 2023-12-22 08:49:48,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=501000.0, ans=0.2 2023-12-22 08:50:00,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=501133.3333333333, ans=22.5 2023-12-22 08:50:01,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.31 vs. limit=15.0 2023-12-22 08:50:12,722 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.24 vs. limit=22.5 2023-12-22 08:50:15,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=501200.0, ans=0.125 2023-12-22 08:50:15,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=501200.0, ans=0.2 2023-12-22 08:50:21,115 INFO [train.py:886] (3/4) Epoch 16, batch 3700, loss[loss=0.01443, audio_tagging_loss=0.01443, over 25000.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4956729.78 frames. ], batch size: 100, lr: 6.76e-03, grad_scale: 64.0 2023-12-22 08:50:21,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.24 vs. limit=15.0 2023-12-22 08:50:45,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=501400.0, ans=0.035 2023-12-22 08:50:49,663 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.18 vs. 
limit=15.0 2023-12-22 08:51:06,602 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.832e+01 2.951e+01 3.083e+01 3.444e+01, threshold=5.901e+01, percent-clipped=0.0 2023-12-22 08:51:07,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=501533.3333333333, ans=0.2 2023-12-22 08:51:13,038 INFO [train.py:886] (3/4) Epoch 16, batch 3750, loss[loss=0.01616, audio_tagging_loss=0.01616, over 24750.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 4951055.26 frames. ], batch size: 99, lr: 6.76e-03, grad_scale: 64.0 2023-12-22 08:51:34,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=501733.3333333333, ans=0.05 2023-12-22 08:51:49,547 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.07 vs. limit=15.0 2023-12-22 08:52:04,565 INFO [train.py:886] (3/4) Epoch 16, batch 3800, loss[loss=0.01712, audio_tagging_loss=0.01712, over 25000.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4941447.01 frames. ], batch size: 100, lr: 6.75e-03, grad_scale: 64.0 2023-12-22 08:52:18,696 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.49 vs. limit=15.0 2023-12-22 08:52:20,632 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.90 vs. limit=15.0 2023-12-22 08:52:23,249 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0 2023-12-22 08:52:33,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=502066.6666666667, ans=0.125 2023-12-22 08:52:33,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=502066.6666666667, ans=0.125 2023-12-22 08:52:46,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=502200.0, ans=0.125 2023-12-22 08:52:50,284 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.369e+01 2.815e+01 2.951e+01 3.109e+01 3.599e+01, threshold=5.902e+01, percent-clipped=0.0 2023-12-22 08:52:51,887 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.14 vs. limit=15.0 2023-12-22 08:52:54,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=502200.0, ans=0.0 2023-12-22 08:52:56,719 INFO [train.py:886] (3/4) Epoch 16, batch 3850, loss[loss=0.01163, audio_tagging_loss=0.01163, over 24750.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4943663.27 frames. ], batch size: 99, lr: 6.75e-03, grad_scale: 64.0 2023-12-22 08:53:11,144 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.00 vs. 
2023-12-22 08:53:20,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=502400.0, ans=0.2
2023-12-22 08:53:23,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=502400.0, ans=0.125
2023-12-22 08:53:23,840 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.98 vs. limit=12.0
2023-12-22 08:53:37,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=502533.3333333333, ans=0.0
2023-12-22 08:53:40,216 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.87 vs. limit=22.5
2023-12-22 08:53:43,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=502533.3333333333, ans=0.125
2023-12-22 08:53:48,104 INFO [train.py:886] (3/4) Epoch 16, batch 3900, loss[loss=0.01553, audio_tagging_loss=0.01553, over 24750.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4945413.19 frames. ], batch size: 99, lr: 6.75e-03, grad_scale: 64.0
2023-12-22 08:54:34,024 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.409e+01 2.745e+01 2.917e+01 3.077e+01 3.660e+01, threshold=5.833e+01, percent-clipped=0.0
2023-12-22 08:54:38,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=502866.6666666667, ans=0.125
2023-12-22 08:54:40,366 INFO [train.py:886] (3/4) Epoch 16, batch 3950, loss[loss=0.01366, audio_tagging_loss=0.01366, over 25000.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 4946536.92 frames. ], batch size: 100, lr: 6.75e-03, grad_scale: 64.0
2023-12-22 08:54:43,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=502933.3333333333, ans=0.1
2023-12-22 08:55:09,554 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.96 vs. limit=15.0
2023-12-22 08:55:24,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=503200.0, ans=0.0
2023-12-22 08:55:29,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=503200.0, ans=0.1
2023-12-22 08:55:29,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=503200.0, ans=0.0
2023-12-22 08:55:31,472 INFO [train.py:886] (3/4) Epoch 16, batch 4000, loss[loss=0.01733, audio_tagging_loss=0.01733, over 25000.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4954867.91 frames. ], batch size: 100, lr: 6.74e-03, grad_scale: 128.0
2023-12-22 08:56:00,292 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0
2023-12-22 08:56:02,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=503466.6666666667, ans=0.2
2023-12-22 08:56:05,810 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. limit=15.0
2023-12-22 08:56:09,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=503466.6666666667, ans=0.0
2023-12-22 08:56:10,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=503466.6666666667, ans=0.0
2023-12-22 08:56:10,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=503466.6666666667, ans=0.0
2023-12-22 08:56:17,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=503533.3333333333, ans=15.0
2023-12-22 08:56:18,186 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.540e+01 2.822e+01 2.939e+01 3.066e+01 3.812e+01, threshold=5.877e+01, percent-clipped=0.0
2023-12-22 08:56:18,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=503533.3333333333, ans=0.0
2023-12-22 08:56:21,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=503533.3333333333, ans=0.0
2023-12-22 08:56:22,957 INFO [train.py:886] (3/4) Epoch 16, batch 4050, loss[loss=0.0148, audio_tagging_loss=0.0148, over 24750.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 4954185.88 frames. ], batch size: 99, lr: 6.74e-03, grad_scale: 64.0
2023-12-22 08:56:41,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=503666.6666666667, ans=0.1
2023-12-22 08:56:49,872 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.44 vs. limit=15.0
2023-12-22 08:56:51,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=503733.3333333333, ans=0.0
2023-12-22 08:56:54,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=503800.0, ans=0.125
2023-12-22 08:57:08,230 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0
2023-12-22 08:57:15,710 INFO [train.py:886] (3/4) Epoch 16, batch 4100, loss[loss=0.01101, audio_tagging_loss=0.01101, over 24750.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4950575.94 frames. ], batch size: 99, lr: 6.74e-03, grad_scale: 64.0
2023-12-22 08:57:16,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=503933.3333333333, ans=10.0
2023-12-22 08:57:17,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=12.0
2023-12-22 08:57:20,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=503933.3333333333, ans=0.0
2023-12-22 08:57:23,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=503933.3333333333, ans=0.02
2023-12-22 08:57:29,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=504000.0, ans=0.0
2023-12-22 08:57:31,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=504000.0, ans=0.125
2023-12-22 08:57:39,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=504066.6666666667, ans=0.125
2023-12-22 08:57:41,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=504066.6666666667, ans=0.2
2023-12-22 08:57:46,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=504133.3333333333, ans=0.0
2023-12-22 08:57:55,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=504200.0, ans=0.125
2023-12-22 08:58:01,237 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.494e+01 2.833e+01 2.971e+01 3.147e+01 3.748e+01, threshold=5.941e+01, percent-clipped=0.0
2023-12-22 08:58:02,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=504200.0, ans=0.2
2023-12-22 08:58:06,725 INFO [train.py:886] (3/4) Epoch 16, batch 4150, loss[loss=0.01145, audio_tagging_loss=0.01145, over 24750.00 frames. ], tot_loss[loss=0.01426, audio_tagging_loss=0.01426, over 4943228.45 frames. ], batch size: 99, lr: 6.74e-03, grad_scale: 64.0
2023-12-22 08:58:25,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=504333.3333333333, ans=0.125
2023-12-22 08:58:40,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=504466.6666666667, ans=0.125
2023-12-22 08:58:50,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=504533.3333333333, ans=0.1
2023-12-22 08:58:52,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=504533.3333333333, ans=0.125
2023-12-22 08:58:57,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=12.0
2023-12-22 08:58:58,051 INFO [train.py:886] (3/4) Epoch 16, batch 4200, loss[loss=0.01439, audio_tagging_loss=0.01439, over 25000.00 frames. ], tot_loss[loss=0.01425, audio_tagging_loss=0.01425, over 4939499.98 frames. ], batch size: 100, lr: 6.74e-03, grad_scale: 64.0
2023-12-22 08:59:02,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=504600.0, ans=0.0
2023-12-22 08:59:11,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=504666.6666666667, ans=0.125
2023-12-22 08:59:25,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.39 vs. limit=22.5
2023-12-22 08:59:43,948 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.486e+01 2.796e+01 2.928e+01 3.041e+01 3.614e+01, threshold=5.855e+01, percent-clipped=0.0
2023-12-22 08:59:48,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=504933.3333333333, ans=0.125
2023-12-22 08:59:49,380 INFO [train.py:886] (3/4) Epoch 16, batch 4250, loss[loss=0.01513, audio_tagging_loss=0.01513, over 25000.00 frames. ], tot_loss[loss=0.01411, audio_tagging_loss=0.01411, over 4950046.70 frames. ], batch size: 100, lr: 6.73e-03, grad_scale: 64.0
2023-12-22 09:00:05,400 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.67 vs. limit=15.0
2023-12-22 09:00:09,141 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.84 vs. limit=22.5
2023-12-22 09:00:40,926 INFO [train.py:886] (3/4) Epoch 16, batch 4300, loss[loss=0.01349, audio_tagging_loss=0.01349, over 25000.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4955751.27 frames. ], batch size: 100, lr: 6.73e-03, grad_scale: 64.0
2023-12-22 09:00:52,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=505333.3333333333, ans=0.0
2023-12-22 09:01:01,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=505333.3333333333, ans=0.0
2023-12-22 09:01:04,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=505400.0, ans=0.2
2023-12-22 09:01:06,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=505400.0, ans=0.125
2023-12-22 09:01:15,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=505466.6666666667, ans=0.125
2023-12-22 09:01:27,808 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.412e+01 2.822e+01 2.946e+01 3.100e+01 3.669e+01, threshold=5.892e+01, percent-clipped=0.0
2023-12-22 09:01:31,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=505533.3333333333, ans=0.0
2023-12-22 09:01:33,248 INFO [train.py:886] (3/4) Epoch 16, batch 4350, loss[loss=0.01378, audio_tagging_loss=0.01378, over 25000.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4961257.33 frames. ], batch size: 100, lr: 6.73e-03, grad_scale: 64.0
2023-12-22 09:01:39,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=505600.0, ans=0.1
2023-12-22 09:01:41,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=505600.0, ans=0.0
2023-12-22 09:01:51,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=505666.6666666667, ans=0.125
2023-12-22 09:02:14,478 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.33 vs. limit=15.0
2023-12-22 09:02:23,184 INFO [train.py:886] (3/4) Epoch 16, batch 4400, loss[loss=0.01604, audio_tagging_loss=0.01604, over 24750.00 frames. ], tot_loss[loss=0.0143, audio_tagging_loss=0.0143, over 4958028.12 frames. ], batch size: 99, lr: 6.73e-03, grad_scale: 64.0
2023-12-22 09:02:24,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=505933.3333333333, ans=0.125
2023-12-22 09:02:26,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=505933.3333333333, ans=0.0
2023-12-22 09:02:28,831 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 09:02:29,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=505933.3333333333, ans=0.0
2023-12-22 09:02:43,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=506066.6666666667, ans=0.1
2023-12-22 09:02:58,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=506133.3333333333, ans=0.0
2023-12-22 09:03:02,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=506133.3333333333, ans=0.0
2023-12-22 09:03:10,654 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.444e+01 2.860e+01 2.974e+01 3.134e+01 3.649e+01, threshold=5.949e+01, percent-clipped=0.0
2023-12-22 09:03:14,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=506200.0, ans=15.0
2023-12-22 09:03:15,412 INFO [train.py:886] (3/4) Epoch 16, batch 4450, loss[loss=0.01438, audio_tagging_loss=0.01438, over 25000.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 4948701.19 frames. ], batch size: 100, lr: 6.72e-03, grad_scale: 64.0
2023-12-22 09:03:24,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=506333.3333333333, ans=0.125
2023-12-22 09:03:25,122 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.66 vs. limit=15.0
2023-12-22 09:03:27,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=506333.3333333333, ans=0.125
2023-12-22 09:03:51,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=506466.6666666667, ans=0.0
2023-12-22 09:03:57,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=506533.3333333333, ans=0.04949747468305833
2023-12-22 09:04:06,914 INFO [train.py:886] (3/4) Epoch 16, batch 4500, loss[loss=0.01306, audio_tagging_loss=0.01306, over 25000.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 4949980.95 frames. ], batch size: 100, lr: 6.72e-03, grad_scale: 64.0
2023-12-22 09:04:23,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=506666.6666666667, ans=0.0
2023-12-22 09:04:37,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0
2023-12-22 09:04:55,934 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.520e+01 2.756e+01 2.905e+01 3.063e+01 3.487e+01, threshold=5.810e+01, percent-clipped=0.0
2023-12-22 09:05:00,645 INFO [train.py:886] (3/4) Epoch 16, batch 4550, loss[loss=0.01602, audio_tagging_loss=0.01602, over 24750.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4955244.96 frames. ], batch size: 99, lr: 6.72e-03, grad_scale: 64.0
2023-12-22 09:05:05,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=506933.3333333333, ans=0.125
2023-12-22 09:05:29,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=507066.6666666667, ans=0.1
2023-12-22 09:05:45,354 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0
2023-12-22 09:05:49,092 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.71 vs. limit=22.5
2023-12-22 09:05:53,164 INFO [train.py:886] (3/4) Epoch 16, batch 4600, loss[loss=0.01384, audio_tagging_loss=0.01384, over 25000.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4962764.55 frames. ], batch size: 100, lr: 6.72e-03, grad_scale: 64.0
2023-12-22 09:05:53,900 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.66 vs. limit=15.0
2023-12-22 09:05:57,199 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.38 vs. limit=22.5
2023-12-22 09:06:00,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=507266.6666666667, ans=0.125
2023-12-22 09:06:13,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=507400.0, ans=0.125
2023-12-22 09:06:29,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=507466.6666666667, ans=0.2
2023-12-22 09:06:38,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=507533.3333333333, ans=0.1
2023-12-22 09:06:39,325 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.493e+01 2.771e+01 2.902e+01 3.112e+01 3.389e+01, threshold=5.805e+01, percent-clipped=0.0
2023-12-22 09:06:44,725 INFO [train.py:886] (3/4) Epoch 16, batch 4650, loss[loss=0.01568, audio_tagging_loss=0.01568, over 25000.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4966783.28 frames. ], batch size: 100, lr: 6.72e-03, grad_scale: 64.0
2023-12-22 09:07:02,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=507666.6666666667, ans=0.1
2023-12-22 09:07:05,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.58 vs. limit=22.5
2023-12-22 09:07:19,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=507800.0, ans=0.125
2023-12-22 09:07:25,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=507866.6666666667, ans=0.125
2023-12-22 09:07:36,007 INFO [train.py:886] (3/4) Epoch 16, batch 4700, loss[loss=0.01385, audio_tagging_loss=0.01385, over 24750.00 frames. ], tot_loss[loss=0.01436, audio_tagging_loss=0.01436, over 4964347.26 frames. ], batch size: 99, lr: 6.71e-03, grad_scale: 64.0
2023-12-22 09:07:37,642 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.40 vs. limit=15.0
2023-12-22 09:07:40,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=507933.3333333333, ans=0.2
2023-12-22 09:07:43,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=507933.3333333333, ans=0.125
2023-12-22 09:07:50,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=508000.0, ans=0.125
2023-12-22 09:08:00,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=508066.6666666667, ans=0.125
2023-12-22 09:08:09,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=508133.3333333333, ans=0.125
2023-12-22 09:08:18,087 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+01 2.851e+01 3.014e+01 3.171e+01 4.011e+01, threshold=6.029e+01, percent-clipped=0.0
2023-12-22 09:08:23,112 INFO [train.py:886] (3/4) Epoch 16, batch 4750, loss[loss=0.01681, audio_tagging_loss=0.01681, over 24750.00 frames. ], tot_loss[loss=0.01443, audio_tagging_loss=0.01443, over 4957938.71 frames. ], batch size: 99, lr: 6.71e-03, grad_scale: 64.0
2023-12-22 09:08:29,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.19 vs. limit=22.5
2023-12-22 09:08:30,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=508266.6666666667, ans=0.1
2023-12-22 09:08:59,134 INFO [train.py:886] (3/4) Epoch 17, batch 0, loss[loss=0.03065, audio_tagging_loss=0.03065, over 25000.00 frames. ], tot_loss[loss=0.03065, audio_tagging_loss=0.03065, over 25000.00 frames. ], batch size: 100, lr: 6.51e-03, grad_scale: 64.0
2023-12-22 09:08:59,135 INFO [train.py:909] (3/4) Computing validation loss
2023-12-22 09:09:08,274 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.2159, 3.6978, 3.6111, 3.4382], device='cuda:3')
2023-12-22 09:09:20,284 INFO [train.py:917] (3/4) Epoch 17, validation: loss=0.03195, audio_tagging_loss=0.03195, over 3737520.00 frames.
2023-12-22 09:09:20,285 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-22 09:09:33,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=508440.0, ans=0.125
2023-12-22 09:09:34,888 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0
2023-12-22 09:09:40,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=508506.6666666667, ans=0.05
2023-12-22 09:09:40,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=508506.6666666667, ans=0.125
2023-12-22 09:09:47,398 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.21 vs. limit=15.0
2023-12-22 09:10:00,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=508640.0, ans=0.0
2023-12-22 09:10:10,894 INFO [train.py:886] (3/4) Epoch 17, batch 50, loss[loss=0.01643, audio_tagging_loss=0.01643, over 25000.00 frames. ], tot_loss[loss=0.02209, audio_tagging_loss=0.02209, over 1121432.91 frames. ], batch size: 100, lr: 6.51e-03, grad_scale: 64.0
2023-12-22 09:10:12,052 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 09:10:22,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=508773.3333333333, ans=0.2
2023-12-22 09:10:29,586 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0
2023-12-22 09:10:30,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=508773.3333333333, ans=0.1
2023-12-22 09:10:35,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=508840.0, ans=10.0
2023-12-22 09:10:41,307 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.691e+01 3.315e+01 3.576e+01 4.102e+01 9.303e+01, threshold=7.152e+01, percent-clipped=8.0
2023-12-22 09:10:41,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=508906.6666666667, ans=0.125
2023-12-22 09:10:47,368 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.10 vs. limit=6.0
2023-12-22 09:10:50,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=508906.6666666667, ans=0.2
2023-12-22 09:11:01,130 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.78 vs. limit=15.0
2023-12-22 09:11:03,183 INFO [train.py:886] (3/4) Epoch 17, batch 100, loss[loss=0.01767, audio_tagging_loss=0.01767, over 25000.00 frames. ], tot_loss[loss=0.01932, audio_tagging_loss=0.01932, over 1974751.27 frames. ], batch size: 100, lr: 6.50e-03, grad_scale: 64.0
2023-12-22 09:11:17,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=509106.6666666667, ans=0.07
2023-12-22 09:11:48,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=509306.6666666667, ans=0.125
2023-12-22 09:11:54,541 INFO [train.py:886] (3/4) Epoch 17, batch 150, loss[loss=0.01446, audio_tagging_loss=0.01446, over 25000.00 frames. ], tot_loss[loss=0.0175, audio_tagging_loss=0.0175, over 2639654.44 frames. ], batch size: 100, lr: 6.50e-03, grad_scale: 64.0
2023-12-22 09:11:56,594 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 09:12:04,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=509440.0, ans=0.125
2023-12-22 09:12:24,686 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.663e+01 2.920e+01 3.033e+01 3.237e+01 3.873e+01, threshold=6.065e+01, percent-clipped=0.0
2023-12-22 09:12:40,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=509640.0, ans=0.0
2023-12-22 09:12:46,838 INFO [train.py:886] (3/4) Epoch 17, batch 200, loss[loss=0.01258, audio_tagging_loss=0.01258, over 24017.00 frames. ], tot_loss[loss=0.01645, audio_tagging_loss=0.01645, over 3157466.67 frames. ], batch size: 100, lr: 6.50e-03, grad_scale: 64.0
2023-12-22 09:12:46,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=509706.6666666667, ans=0.125
2023-12-22 09:13:01,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=509773.3333333333, ans=0.125
2023-12-22 09:13:02,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=509773.3333333333, ans=0.5
2023-12-22 09:13:13,003 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.37 vs. limit=10.0
2023-12-22 09:13:26,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=509906.6666666667, ans=0.0
2023-12-22 09:13:28,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=509973.3333333333, ans=0.0
2023-12-22 09:13:31,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0
2023-12-22 09:13:39,426 INFO [train.py:886] (3/4) Epoch 17, batch 250, loss[loss=0.012, audio_tagging_loss=0.012, over 25000.00 frames. ], tot_loss[loss=0.01575, audio_tagging_loss=0.01575, over 3558932.62 frames. ], batch size: 100, lr: 6.50e-03, grad_scale: 64.0
2023-12-22 09:13:44,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=510040.0, ans=0.125
2023-12-22 09:14:08,572 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.464e+01 2.772e+01 2.917e+01 3.041e+01 3.552e+01, threshold=5.833e+01, percent-clipped=0.0
2023-12-22 09:14:25,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=510306.6666666667, ans=0.0
2023-12-22 09:14:30,600 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.50 vs. limit=15.0
2023-12-22 09:14:30,815 INFO [train.py:886] (3/4) Epoch 17, batch 300, loss[loss=0.01459, audio_tagging_loss=0.01459, over 24041.00 frames. ], tot_loss[loss=0.0153, audio_tagging_loss=0.0153, over 3863632.47 frames. ], batch size: 100, lr: 6.50e-03, grad_scale: 64.0
2023-12-22 09:14:43,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=510440.0, ans=0.04949747468305833
2023-12-22 09:15:23,883 INFO [train.py:886] (3/4) Epoch 17, batch 350, loss[loss=0.01454, audio_tagging_loss=0.01454, over 24750.00 frames. ], tot_loss[loss=0.0151, audio_tagging_loss=0.0151, over 4099636.62 frames. ], batch size: 99, lr: 6.49e-03, grad_scale: 64.0
2023-12-22 09:15:38,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=510773.3333333333, ans=0.125
2023-12-22 09:15:54,266 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.457e+01 2.827e+01 3.010e+01 3.113e+01 3.707e+01, threshold=6.020e+01, percent-clipped=0.0
2023-12-22 09:16:09,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=510973.3333333333, ans=0.0
2023-12-22 09:16:15,124 INFO [train.py:886] (3/4) Epoch 17, batch 400, loss[loss=0.01236, audio_tagging_loss=0.01236, over 25000.00 frames. ], tot_loss[loss=0.01467, audio_tagging_loss=0.01467, over 4286653.12 frames. ], batch size: 100, lr: 6.49e-03, grad_scale: 64.0
2023-12-22 09:16:18,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=511040.0, ans=0.1
2023-12-22 09:16:26,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=511106.6666666667, ans=0.0
2023-12-22 09:16:34,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=511106.6666666667, ans=0.2
2023-12-22 09:16:38,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=511173.3333333333, ans=0.0
2023-12-22 09:16:51,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=511240.0, ans=0.2
2023-12-22 09:17:05,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=511306.6666666667, ans=0.125
2023-12-22 09:17:07,483 INFO [train.py:886] (3/4) Epoch 17, batch 450, loss[loss=0.01564, audio_tagging_loss=0.01564, over 22213.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 4433571.44 frames. ], batch size: 107, lr: 6.49e-03, grad_scale: 64.0
2023-12-22 09:17:10,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=511373.3333333333, ans=0.125
2023-12-22 09:17:20,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=511440.0, ans=0.125
2023-12-22 09:17:38,006 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.481e+01 2.743e+01 2.910e+01 3.047e+01 3.599e+01, threshold=5.820e+01, percent-clipped=0.0
2023-12-22 09:18:00,486 INFO [train.py:886] (3/4) Epoch 17, batch 500, loss[loss=0.01009, audio_tagging_loss=0.01009, over 22243.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4550131.79 frames. ], batch size: 107, lr: 6.49e-03, grad_scale: 64.0
2023-12-22 09:18:08,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=511706.6666666667, ans=0.0
2023-12-22 09:18:12,381 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.53 vs. limit=15.0
2023-12-22 09:18:12,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=511773.3333333333, ans=0.125
2023-12-22 09:18:14,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=511773.3333333333, ans=0.125
2023-12-22 09:18:25,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=511840.0, ans=0.125
2023-12-22 09:18:27,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=511840.0, ans=0.125
2023-12-22 09:18:36,099 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.27 vs. limit=22.5
2023-12-22 09:18:52,175 INFO [train.py:886] (3/4) Epoch 17, batch 550, loss[loss=0.01596, audio_tagging_loss=0.01596, over 25000.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 4637156.57 frames. ], batch size: 100, lr: 6.49e-03, grad_scale: 64.0
2023-12-22 09:18:57,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=512040.0, ans=0.125
2023-12-22 09:19:04,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=512106.6666666667, ans=0.1
2023-12-22 09:19:14,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=512173.3333333333, ans=0.0
2023-12-22 09:19:15,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=512173.3333333333, ans=6.0
2023-12-22 09:19:20,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=512173.3333333333, ans=0.95
2023-12-22 09:19:22,631 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.486e+01 2.788e+01 2.954e+01 3.144e+01 3.570e+01, threshold=5.909e+01, percent-clipped=0.0
2023-12-22 09:19:31,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.03 vs. limit=22.5
2023-12-22 09:19:44,633 INFO [train.py:886] (3/4) Epoch 17, batch 600, loss[loss=0.01317, audio_tagging_loss=0.01317, over 24750.00 frames. ], tot_loss[loss=0.01429, audio_tagging_loss=0.01429, over 4709109.53 frames. ], batch size: 99, lr: 6.48e-03, grad_scale: 64.0
2023-12-22 09:20:00,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. limit=6.0
2023-12-22 09:20:02,474 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.70 vs. limit=12.0
2023-12-22 09:20:03,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=512440.0, ans=0.2
2023-12-22 09:20:05,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=512506.6666666667, ans=0.125
2023-12-22 09:20:16,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=512573.3333333333, ans=0.125
2023-12-22 09:20:36,476 INFO [train.py:886] (3/4) Epoch 17, batch 650, loss[loss=0.01282, audio_tagging_loss=0.01282, over 24750.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 4763629.57 frames. ], batch size: 99, lr: 6.48e-03, grad_scale: 64.0
2023-12-22 09:20:43,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=512706.6666666667, ans=0.0
2023-12-22 09:20:49,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=512773.3333333333, ans=0.125
2023-12-22 09:21:04,860 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.497e+01 2.843e+01 2.927e+01 3.090e+01 3.715e+01, threshold=5.854e+01, percent-clipped=0.0
2023-12-22 09:21:20,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=512973.3333333333, ans=0.0
2023-12-22 09:21:27,344 INFO [train.py:886] (3/4) Epoch 17, batch 700, loss[loss=0.01342, audio_tagging_loss=0.01342, over 24750.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4800183.57 frames. ], batch size: 99, lr: 6.48e-03, grad_scale: 64.0
2023-12-22 09:21:43,263 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.571e-03
2023-12-22 09:21:43,564 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.26 vs. limit=15.0
2023-12-22 09:21:52,757 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.47 vs. limit=15.0
2023-12-22 09:21:54,928 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.72 vs. limit=15.0
2023-12-22 09:22:03,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=513240.0, ans=0.125
2023-12-22 09:22:06,265 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.95 vs. limit=15.0
2023-12-22 09:22:19,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.70 vs. limit=22.5
2023-12-22 09:22:19,886 INFO [train.py:886] (3/4) Epoch 17, batch 750, loss[loss=0.01239, audio_tagging_loss=0.01239, over 25000.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4835421.88 frames. ], batch size: 100, lr: 6.48e-03, grad_scale: 64.0
2023-12-22 09:22:24,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=513373.3333333333, ans=0.1
2023-12-22 09:22:27,324 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.31 vs. limit=22.5
2023-12-22 09:22:50,393 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.463e+01 2.839e+01 2.963e+01 3.097e+01 3.616e+01, threshold=5.927e+01, percent-clipped=0.0
2023-12-22 09:23:01,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=513640.0, ans=0.0
2023-12-22 09:23:10,508 INFO [train.py:886] (3/4) Epoch 17, batch 800, loss[loss=0.01506, audio_tagging_loss=0.01506, over 24750.00 frames. ], tot_loss[loss=0.01402, audio_tagging_loss=0.01402, over 4864325.82 frames. ], batch size: 99, lr: 6.47e-03, grad_scale: 64.0
2023-12-22 09:23:25,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=513773.3333333333, ans=0.1
2023-12-22 09:23:36,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=513840.0, ans=0.2
2023-12-22 09:23:38,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=513840.0, ans=0.1
2023-12-22 09:23:54,602 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.69 vs. limit=15.0
2023-12-22 09:24:01,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=513973.3333333333, ans=0.125
2023-12-22 09:24:03,865 INFO [train.py:886] (3/4) Epoch 17, batch 850, loss[loss=0.01103, audio_tagging_loss=0.01103, over 25000.00 frames. ], tot_loss[loss=0.01402, audio_tagging_loss=0.01402, over 4886505.53 frames. ], batch size: 100, lr: 6.47e-03, grad_scale: 64.0
2023-12-22 09:24:08,089 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.15 vs. limit=15.0
2023-12-22 09:24:09,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=514040.0, ans=0.125
2023-12-22 09:24:11,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0
2023-12-22 09:24:17,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=514106.6666666667, ans=0.125
2023-12-22 09:24:21,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=514106.6666666667, ans=0.0
2023-12-22 09:24:27,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.49 vs. limit=22.5
2023-12-22 09:24:34,347 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.468e+01 2.760e+01 2.887e+01 3.062e+01 3.648e+01, threshold=5.774e+01, percent-clipped=0.0
2023-12-22 09:24:46,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=514306.6666666667, ans=0.1
2023-12-22 09:24:49,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=514306.6666666667, ans=0.125
2023-12-22 09:24:50,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=514306.6666666667, ans=0.125
2023-12-22 09:24:55,668 INFO [train.py:886] (3/4) Epoch 17, batch 900, loss[loss=0.01267, audio_tagging_loss=0.01267, over 24750.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4899331.65 frames. ], batch size: 99, lr: 6.47e-03, grad_scale: 64.0
2023-12-22 09:24:59,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=514373.3333333333, ans=0.1
2023-12-22 09:25:01,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=514373.3333333333, ans=0.125
2023-12-22 09:25:11,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=514440.0, ans=0.05
2023-12-22 09:25:16,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=514506.6666666667, ans=0.0
2023-12-22 09:25:17,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=514506.6666666667, ans=0.0
2023-12-22 09:25:29,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=514573.3333333333, ans=0.125
2023-12-22 09:25:45,131 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.44 vs. limit=15.0
2023-12-22 09:25:47,436 INFO [train.py:886] (3/4) Epoch 17, batch 950, loss[loss=0.0178, audio_tagging_loss=0.0178, over 24750.00 frames. ], tot_loss[loss=0.01415, audio_tagging_loss=0.01415, over 4908050.29 frames. ], batch size: 99, lr: 6.47e-03, grad_scale: 64.0
2023-12-22 09:25:49,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=514706.6666666667, ans=0.125
2023-12-22 09:25:50,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=514706.6666666667, ans=0.1
2023-12-22 09:25:58,599 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 09:26:14,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=514840.0, ans=0.0
2023-12-22 09:26:18,090 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.614e+01 2.797e+01 2.941e+01 3.077e+01 3.617e+01, threshold=5.883e+01, percent-clipped=0.0
2023-12-22 09:26:41,021 INFO [train.py:886] (3/4) Epoch 17, batch 1000, loss[loss=0.01553, audio_tagging_loss=0.01553, over 24750.00 frames. ], tot_loss[loss=0.01424, audio_tagging_loss=0.01424, over 4914086.39 frames. ], batch size: 99, lr: 6.47e-03, grad_scale: 64.0
2023-12-22 09:26:42,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=515040.0, ans=0.1
2023-12-22 09:26:51,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=515106.6666666667, ans=0.125
2023-12-22 09:26:56,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=515106.6666666667, ans=0.125
2023-12-22 09:27:00,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=515173.3333333333, ans=0.0
2023-12-22 09:27:00,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=515173.3333333333, ans=0.1
2023-12-22 09:27:12,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=515240.0, ans=0.125
2023-12-22 09:27:20,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=515306.6666666667, ans=0.125
2023-12-22 09:27:21,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=515306.6666666667, ans=0.0
2023-12-22 09:27:23,655 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 09:27:31,108 INFO [train.py:886] (3/4) Epoch 17, batch 1050, loss[loss=0.01218, audio_tagging_loss=0.01218, over 24750.00 frames. ], tot_loss[loss=0.01411, audio_tagging_loss=0.01411, over 4920253.18 frames. ], batch size: 99, lr: 6.46e-03, grad_scale: 64.0
2023-12-22 09:27:43,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=515440.0, ans=0.125
2023-12-22 09:28:01,248 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.471e+01 2.759e+01 2.948e+01 3.113e+01 4.038e+01, threshold=5.895e+01, percent-clipped=0.0
2023-12-22 09:28:06,351 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.52 vs. limit=6.0
2023-12-22 09:28:07,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=515573.3333333333, ans=0.0
2023-12-22 09:28:16,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=515640.0, ans=0.0
2023-12-22 09:28:24,297 INFO [train.py:886] (3/4) Epoch 17, batch 1100, loss[loss=0.01348, audio_tagging_loss=0.01348, over 24750.00 frames. ], tot_loss[loss=0.01412, audio_tagging_loss=0.01412, over 4925858.83 frames. ], batch size: 99, lr: 6.46e-03, grad_scale: 64.0
2023-12-22 09:28:31,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=515706.6666666667, ans=0.125
2023-12-22 09:28:45,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=515840.0, ans=0.2
2023-12-22 09:28:57,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=515906.6666666667, ans=0.0
2023-12-22 09:29:06,277 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.98 vs. limit=15.0
2023-12-22 09:29:06,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=515973.3333333333, ans=0.125
2023-12-22 09:29:17,770 INFO [train.py:886] (3/4) Epoch 17, batch 1150, loss[loss=0.0142, audio_tagging_loss=0.0142, over 25000.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4933878.68 frames. ], batch size: 100, lr: 6.46e-03, grad_scale: 64.0
2023-12-22 09:29:18,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=516040.0, ans=0.2
2023-12-22 09:29:23,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=516040.0, ans=0.125
2023-12-22 09:29:28,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=516106.6666666667, ans=0.0
2023-12-22 09:29:34,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=516106.6666666667, ans=0.125
2023-12-22 09:29:42,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=516173.3333333333, ans=0.0
2023-12-22 09:29:47,300 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+01 2.790e+01 2.920e+01 3.045e+01 3.393e+01, threshold=5.839e+01, percent-clipped=0.0
2023-12-22 09:29:52,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=516240.0, ans=0.1
2023-12-22 09:29:54,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=516240.0, ans=0.125
2023-12-22 09:29:56,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=516240.0, ans=0.125
2023-12-22 09:30:08,763 INFO [train.py:886] (3/4) Epoch 17, batch 1200, loss[loss=0.01344, audio_tagging_loss=0.01344, over 25000.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4941459.94 frames. ], batch size: 100, lr: 6.46e-03, grad_scale: 64.0
2023-12-22 09:30:25,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=516440.0, ans=10.0
2023-12-22 09:30:28,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=516440.0, ans=0.125
2023-12-22 09:30:35,897 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.54 vs. limit=15.0
2023-12-22 09:30:36,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=516506.6666666667, ans=0.125
2023-12-22 09:30:38,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=516506.6666666667, ans=0.0
2023-12-22 09:30:48,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=516573.3333333333, ans=0.04949747468305833
2023-12-22 09:31:01,203 INFO [train.py:886] (3/4) Epoch 17, batch 1250, loss[loss=0.01356, audio_tagging_loss=0.01356, over 24750.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4942519.95 frames. ], batch size: 99, lr: 6.46e-03, grad_scale: 64.0
2023-12-22 09:31:03,607 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.44 vs. limit=15.0
2023-12-22 09:31:19,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=516773.3333333333, ans=0.125
2023-12-22 09:31:31,598 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.516e+01 2.867e+01 2.981e+01 3.113e+01 3.868e+01, threshold=5.962e+01, percent-clipped=0.0
2023-12-22 09:31:34,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.17 vs. limit=22.5
2023-12-22 09:31:42,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=516973.3333333333, ans=0.95
2023-12-22 09:31:47,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=516973.3333333333, ans=0.125
2023-12-22 09:31:53,081 INFO [train.py:886] (3/4) Epoch 17, batch 1300, loss[loss=0.01414, audio_tagging_loss=0.01414, over 24750.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4941588.27 frames. ], batch size: 99, lr: 6.45e-03, grad_scale: 128.0
2023-12-22 09:32:16,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=517173.3333333333, ans=0.035
2023-12-22 09:32:45,481 INFO [train.py:886] (3/4) Epoch 17, batch 1350, loss[loss=0.01431, audio_tagging_loss=0.01431, over 24750.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4941122.17 frames. ], batch size: 99, lr: 6.45e-03, grad_scale: 64.0
2023-12-22 09:32:49,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=517373.3333333333, ans=0.0
2023-12-22 09:32:52,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=517373.3333333333, ans=0.125
2023-12-22 09:32:53,407 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 09:32:56,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=517440.0, ans=0.125
2023-12-22 09:33:16,148 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.365e+01 2.812e+01 2.965e+01 3.178e+01 3.888e+01, threshold=5.930e+01, percent-clipped=0.0
2023-12-22 09:33:27,156 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.92 vs. limit=12.0
2023-12-22 09:33:36,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=517640.0, ans=0.0
2023-12-22 09:33:38,003 INFO [train.py:886] (3/4) Epoch 17, batch 1400, loss[loss=0.01528, audio_tagging_loss=0.01528, over 24750.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4950540.48 frames. ], batch size: 99, lr: 6.45e-03, grad_scale: 64.0
2023-12-22 09:33:50,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=517773.3333333333, ans=0.0
2023-12-22 09:34:20,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.23 vs. limit=15.0
2023-12-22 09:34:29,382 INFO [train.py:886] (3/4) Epoch 17, batch 1450, loss[loss=0.015, audio_tagging_loss=0.015, over 25000.00 frames. ], tot_loss[loss=0.014, audio_tagging_loss=0.014, over 4953841.53 frames. ], batch size: 100, lr: 6.45e-03, grad_scale: 64.0
2023-12-22 09:35:00,422 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+01 2.798e+01 2.926e+01 3.112e+01 3.814e+01, threshold=5.853e+01, percent-clipped=0.0
2023-12-22 09:35:02,025 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0
2023-12-22 09:35:08,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=518240.0, ans=0.125
2023-12-22 09:35:12,996 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=15.0
2023-12-22 09:35:17,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=518306.6666666667, ans=0.1
2023-12-22 09:35:20,874 INFO [train.py:886] (3/4) Epoch 17, batch 1500, loss[loss=0.01206, audio_tagging_loss=0.01206, over 25000.00 frames. ], tot_loss[loss=0.01402, audio_tagging_loss=0.01402, over 4960657.66 frames. ], batch size: 100, lr: 6.45e-03, grad_scale: 64.0
], batch size: 100, lr: 6.45e-03, grad_scale: 64.0 2023-12-22 09:35:31,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=518440.0, ans=0.035 2023-12-22 09:35:33,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=518440.0, ans=0.1 2023-12-22 09:35:35,888 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=15.0 2023-12-22 09:35:55,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=518573.3333333333, ans=0.125 2023-12-22 09:35:57,855 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.060e-02 2023-12-22 09:35:59,150 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.51 vs. limit=22.5 2023-12-22 09:36:03,871 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.97 vs. limit=22.5 2023-12-22 09:36:06,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.61 vs. limit=22.5 2023-12-22 09:36:12,808 INFO [train.py:886] (3/4) Epoch 17, batch 1550, loss[loss=0.01229, audio_tagging_loss=0.01229, over 24750.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4957333.79 frames. ], batch size: 99, lr: 6.44e-03, grad_scale: 64.0 2023-12-22 09:36:19,596 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.51 vs. limit=15.0 2023-12-22 09:36:20,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=518706.6666666667, ans=0.05 2023-12-22 09:36:23,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=518773.3333333333, ans=0.0 2023-12-22 09:36:28,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=518773.3333333333, ans=0.125 2023-12-22 09:36:43,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=518906.6666666667, ans=0.2 2023-12-22 09:36:44,191 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.447e+01 2.815e+01 2.986e+01 3.108e+01 3.824e+01, threshold=5.972e+01, percent-clipped=0.0 2023-12-22 09:36:46,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=518906.6666666667, ans=0.125 2023-12-22 09:37:03,978 INFO [train.py:886] (3/4) Epoch 17, batch 1600, loss[loss=0.01282, audio_tagging_loss=0.01282, over 24024.00 frames. ], tot_loss[loss=0.01418, audio_tagging_loss=0.01418, over 4950949.88 frames. 
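
The Whitening lines report a scalar statistic of an activation's covariance: it equals 1.0 when the covariance is proportional to the identity (fully "white" features) and grows as variance concentrates in a few directions; when the metric exceeds its limit, the module nudges gradients toward whiter activations. A hedged sketch of the statistic, simplified from the grouped version in scaling.py (numerics and grouping details differ):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels); returns a value >= 1 that is
        # exactly 1.0 when the per-group covariance is a multiple of I.
        num_frames, num_channels = x.shape
        cpg = num_channels // num_groups  # channels per group
        x = x.reshape(num_frames, num_groups, cpg)
        x = x - x.mean(dim=0, keepdim=True)
        cov = torch.einsum("ngc,ngd->gcd", x, x) / num_frames
        mean_eig = cov.diagonal(dim1=1, dim2=2).mean()         # E[eigenvalue]
        mean_sq_eig = (cov ** 2).sum(dim=(1, 2)).mean() / cpg  # E[eigenvalue^2]
        return (mean_sq_eig / (mean_eig ** 2 + 1e-20)).item()

    x = torch.randn(1000, 384)   # near-white features
    print(whitening_metric(x))   # ~1.4: white up to sampling noise
    x[:, :4] *= 20.0             # concentrate variance in a few channels
    print(whitening_metric(x))   # much larger, like the metrics logged above
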
], batch size: 100, lr: 6.44e-03, grad_scale: 64.0 2023-12-22 09:37:04,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=519040.0, ans=0.1 2023-12-22 09:37:06,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=519040.0, ans=0.1 2023-12-22 09:37:06,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=519040.0, ans=0.125 2023-12-22 09:37:37,395 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.61 vs. limit=22.5 2023-12-22 09:37:46,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=519306.6666666667, ans=0.025 2023-12-22 09:37:52,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=519306.6666666667, ans=0.025 2023-12-22 09:37:56,790 INFO [train.py:886] (3/4) Epoch 17, batch 1650, loss[loss=0.01531, audio_tagging_loss=0.01531, over 24750.00 frames. ], tot_loss[loss=0.01421, audio_tagging_loss=0.01421, over 4946867.06 frames. ], batch size: 99, lr: 6.44e-03, grad_scale: 64.0 2023-12-22 09:38:01,997 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.01 vs. limit=10.0 2023-12-22 09:38:13,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=519440.0, ans=0.0 2023-12-22 09:38:15,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=519440.0, ans=0.125 2023-12-22 09:38:18,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=519506.6666666667, ans=0.0 2023-12-22 09:38:25,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=519506.6666666667, ans=15.0 2023-12-22 09:38:28,292 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+01 2.852e+01 2.976e+01 3.149e+01 3.480e+01, threshold=5.952e+01, percent-clipped=0.0 2023-12-22 09:38:29,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=519573.3333333333, ans=0.125 2023-12-22 09:38:45,045 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.86 vs. limit=12.0 2023-12-22 09:38:48,451 INFO [train.py:886] (3/4) Epoch 17, batch 1700, loss[loss=0.01322, audio_tagging_loss=0.01322, over 25000.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4952327.64 frames. ], batch size: 100, lr: 6.44e-03, grad_scale: 64.0 2023-12-22 09:38:50,630 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.72 vs. limit=15.0 2023-12-22 09:39:10,328 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.79 vs. 
limit=12.0 2023-12-22 09:39:19,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=519906.6666666667, ans=0.125 2023-12-22 09:39:38,748 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0 2023-12-22 09:39:40,251 INFO [train.py:886] (3/4) Epoch 17, batch 1750, loss[loss=0.01351, audio_tagging_loss=0.01351, over 24750.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4954184.53 frames. ], batch size: 99, lr: 6.44e-03, grad_scale: 64.0 2023-12-22 09:39:48,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=520040.0, ans=0.125 2023-12-22 09:39:59,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=520106.6666666667, ans=0.125 2023-12-22 09:40:02,004 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0 2023-12-22 09:40:11,186 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.491e+01 2.793e+01 2.974e+01 3.087e+01 3.674e+01, threshold=5.948e+01, percent-clipped=0.0 2023-12-22 09:40:12,586 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.30 vs. limit=22.5 2023-12-22 09:40:22,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.27 vs. limit=22.5 2023-12-22 09:40:33,503 INFO [train.py:886] (3/4) Epoch 17, batch 1800, loss[loss=0.0152, audio_tagging_loss=0.0152, over 25000.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4960148.98 frames. ], batch size: 100, lr: 6.43e-03, grad_scale: 64.0 2023-12-22 09:40:38,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=520373.3333333333, ans=0.0 2023-12-22 09:40:40,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=520373.3333333333, ans=0.125 2023-12-22 09:40:47,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=520440.0, ans=0.1 2023-12-22 09:40:49,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0 2023-12-22 09:40:55,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=520506.6666666667, ans=0.125 2023-12-22 09:41:15,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=520640.0, ans=0.1 2023-12-22 09:41:23,650 INFO [train.py:886] (3/4) Epoch 17, batch 1850, loss[loss=0.01416, audio_tagging_loss=0.01416, over 20539.00 frames. ], tot_loss[loss=0.01403, audio_tagging_loss=0.01403, over 4954228.58 frames. 
], batch size: 107, lr: 6.43e-03, grad_scale: 64.0 2023-12-22 09:41:38,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=520773.3333333333, ans=0.2 2023-12-22 09:41:47,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=520840.0, ans=0.125 2023-12-22 09:41:51,036 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.76 vs. limit=15.0 2023-12-22 09:41:54,291 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.549e+01 2.846e+01 3.000e+01 3.166e+01 3.525e+01, threshold=6.000e+01, percent-clipped=0.0 2023-12-22 09:42:15,598 INFO [train.py:886] (3/4) Epoch 17, batch 1900, loss[loss=0.01544, audio_tagging_loss=0.01544, over 24750.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4946241.24 frames. ], batch size: 99, lr: 6.43e-03, grad_scale: 64.0 2023-12-22 09:42:17,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=521040.0, ans=0.0 2023-12-22 09:42:38,297 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.82 vs. limit=22.5 2023-12-22 09:42:49,607 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0 2023-12-22 09:42:56,093 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 09:43:06,666 INFO [train.py:886] (3/4) Epoch 17, batch 1950, loss[loss=0.01472, audio_tagging_loss=0.01472, over 25000.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4944650.79 frames. ], batch size: 100, lr: 6.43e-03, grad_scale: 64.0 2023-12-22 09:43:06,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=521373.3333333333, ans=0.1 2023-12-22 09:43:27,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=521506.6666666667, ans=0.2 2023-12-22 09:43:28,347 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.78 vs. limit=15.0 2023-12-22 09:43:31,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=521506.6666666667, ans=0.0 2023-12-22 09:43:37,152 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.529e+01 2.757e+01 2.929e+01 3.098e+01 3.613e+01, threshold=5.858e+01, percent-clipped=0.0 2023-12-22 09:43:57,569 INFO [train.py:886] (3/4) Epoch 17, batch 2000, loss[loss=0.01373, audio_tagging_loss=0.01373, over 25000.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4945563.37 frames. 
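
The recurring optim.py WARNINGs summarize recent gradient norms. The five "quartile" values read naturally as min / 25% / median / 75% / max, and in every entry above the threshold is exactly Clipping_scale times the median (e.g. median 3.000e+01 with Clipping_scale=2.0 gives threshold=6.000e+01), so the optimizer evidently clips any gradient whose norm exceeds twice the recent median, with percent-clipped reporting how often that fired. A sketch of that relation -- an inference from the log format, not the exact ScaledAdam code:

    import torch

    def summarize_and_threshold(recent_norms, clipping_scale=2.0):
        norms = torch.tensor(recent_norms, dtype=torch.float32)
        q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # 2 x median, matching the log
        pct_clipped = 100.0 * (norms > threshold).float().mean()
        return q, threshold, pct_clipped

    q, thr, pct = summarize_and_threshold([25.5, 28.5, 30.0, 31.7, 35.3])
    # q   -> the five printed "grad-norm quartiles"
    # thr -> "threshold=", pct -> "percent-clipped="
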
], batch size: 100, lr: 6.43e-03, grad_scale: 64.0 2023-12-22 09:44:00,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=521706.6666666667, ans=0.04949747468305833 2023-12-22 09:44:01,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=521706.6666666667, ans=0.04949747468305833 2023-12-22 09:44:05,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=521706.6666666667, ans=0.125 2023-12-22 09:44:13,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=521773.3333333333, ans=0.125 2023-12-22 09:44:31,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=521906.6666666667, ans=0.1 2023-12-22 09:44:40,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=521973.3333333333, ans=0.125 2023-12-22 09:44:49,066 INFO [train.py:886] (3/4) Epoch 17, batch 2050, loss[loss=0.01531, audio_tagging_loss=0.01531, over 25000.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4950212.50 frames. ], batch size: 100, lr: 6.42e-03, grad_scale: 64.0 2023-12-22 09:44:49,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=522040.0, ans=0.125 2023-12-22 09:44:58,019 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.56 vs. limit=6.0 2023-12-22 09:45:01,612 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.50 vs. limit=15.0 2023-12-22 09:45:06,281 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.84 vs. limit=22.5 2023-12-22 09:45:09,819 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 09:45:13,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=522173.3333333333, ans=0.1 2023-12-22 09:45:20,209 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+01 2.771e+01 2.888e+01 3.046e+01 3.628e+01, threshold=5.776e+01, percent-clipped=0.0 2023-12-22 09:45:26,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=522240.0, ans=0.0 2023-12-22 09:45:29,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=522306.6666666667, ans=0.0 2023-12-22 09:45:40,153 INFO [train.py:886] (3/4) Epoch 17, batch 2100, loss[loss=0.01539, audio_tagging_loss=0.01539, over 25000.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4950752.48 frames. 
], batch size: 100, lr: 6.42e-03, grad_scale: 64.0 2023-12-22 09:45:41,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=522373.3333333333, ans=0.125 2023-12-22 09:45:49,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=522373.3333333333, ans=0.0 2023-12-22 09:45:55,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=522440.0, ans=0.0 2023-12-22 09:46:04,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=522506.6666666667, ans=0.1 2023-12-22 09:46:19,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=522573.3333333333, ans=0.125 2023-12-22 09:46:21,714 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 09:46:22,122 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2023-12-22 09:46:32,521 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.27 vs. limit=15.0 2023-12-22 09:46:33,799 INFO [train.py:886] (3/4) Epoch 17, batch 2150, loss[loss=0.01471, audio_tagging_loss=0.01471, over 25000.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4957498.26 frames. ], batch size: 100, lr: 6.42e-03, grad_scale: 64.0 2023-12-22 09:46:38,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=522706.6666666667, ans=0.125 2023-12-22 09:47:04,313 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.575e+01 2.873e+01 2.993e+01 3.098e+01 3.427e+01, threshold=5.985e+01, percent-clipped=0.0 2023-12-22 09:47:08,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=522906.6666666667, ans=0.125 2023-12-22 09:47:13,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=522906.6666666667, ans=15.0 2023-12-22 09:47:25,518 INFO [train.py:886] (3/4) Epoch 17, batch 2200, loss[loss=0.01598, audio_tagging_loss=0.01598, over 22089.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4953560.73 frames. ], batch size: 107, lr: 6.42e-03, grad_scale: 64.0 2023-12-22 09:47:29,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=523040.0, ans=0.125 2023-12-22 09:47:46,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=523173.3333333333, ans=22.5 2023-12-22 09:47:50,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=523173.3333333333, ans=0.2 2023-12-22 09:47:53,941 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.62 vs. 
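
Each train.py:886 line prints two loss brackets: the first is the current batch (loss equals audio_tagging_loss because tagging is the only criterion in this recipe), the second a frame-weighted running aggregate over roughly the last five million frames. The frame counts are consistent with ten-second cuts at 100 frames/s after 4x subsampling (100 cuts x 250 frames = 25000). A sketch of the two pieces, assuming a multi-label BCE criterion over the 527 event classes and an exponentially-decayed aggregate; the recipe's exact bookkeeping may differ:

    import torch
    import torch.nn.functional as F

    def audio_tagging_loss(logits, targets):
        # logits/targets: (num_cuts, 527); targets are multi-hot 0/1 floats
        return F.binary_cross_entropy_with_logits(logits, targets)

    class RunningLoss:
        """Frame-weighted aggregate like the tot_loss[...] field."""
        def __init__(self, decay=0.999):
            self.decay, self.loss_sum, self.frames = decay, 0.0, 0.0
        def update(self, batch_loss, num_frames):
            self.loss_sum = self.decay * self.loss_sum + batch_loss * num_frames
            self.frames = self.decay * self.frames + num_frames
        @property
        def avg(self):
            return self.loss_sum / max(self.frames, 1.0)
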
limit=15.0 2023-12-22 09:47:55,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.01 vs. limit=15.0 2023-12-22 09:48:17,372 INFO [train.py:886] (3/4) Epoch 17, batch 2250, loss[loss=0.01311, audio_tagging_loss=0.01311, over 24750.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4948903.27 frames. ], batch size: 99, lr: 6.42e-03, grad_scale: 64.0 2023-12-22 09:48:30,865 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.84 vs. limit=15.0 2023-12-22 09:48:47,645 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.83 vs. limit=6.0 2023-12-22 09:48:48,954 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.505e+01 2.799e+01 2.927e+01 3.060e+01 4.133e+01, threshold=5.854e+01, percent-clipped=0.0 2023-12-22 09:48:52,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=523573.3333333333, ans=0.0 2023-12-22 09:49:10,372 INFO [train.py:886] (3/4) Epoch 17, batch 2300, loss[loss=0.01244, audio_tagging_loss=0.01244, over 25000.00 frames. ], tot_loss[loss=0.01401, audio_tagging_loss=0.01401, over 4945963.83 frames. ], batch size: 100, lr: 6.41e-03, grad_scale: 64.0 2023-12-22 09:49:19,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=523773.3333333333, ans=0.125 2023-12-22 09:49:30,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=523840.0, ans=0.1 2023-12-22 09:49:32,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=523840.0, ans=0.0 2023-12-22 09:50:02,411 INFO [train.py:886] (3/4) Epoch 17, batch 2350, loss[loss=0.01258, audio_tagging_loss=0.01258, over 25000.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4948123.69 frames. ], batch size: 100, lr: 6.41e-03, grad_scale: 64.0 2023-12-22 09:50:05,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=524040.0, ans=0.1 2023-12-22 09:50:13,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=524106.6666666667, ans=0.0 2023-12-22 09:50:19,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=524106.6666666667, ans=0.0 2023-12-22 09:50:25,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=524173.3333333333, ans=0.125 2023-12-22 09:50:33,832 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.516e+01 2.768e+01 2.920e+01 3.076e+01 3.528e+01, threshold=5.841e+01, percent-clipped=0.0 2023-12-22 09:50:40,702 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.56 vs. limit=10.0 2023-12-22 09:50:43,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.52 vs. 
limit=15.0 2023-12-22 09:50:54,125 INFO [train.py:886] (3/4) Epoch 17, batch 2400, loss[loss=0.01426, audio_tagging_loss=0.01426, over 24750.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4950025.19 frames. ], batch size: 99, lr: 6.41e-03, grad_scale: 64.0 2023-12-22 09:51:19,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=524506.6666666666, ans=10.0 2023-12-22 09:51:20,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=524506.6666666666, ans=0.125 2023-12-22 09:51:41,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=524640.0, ans=0.0 2023-12-22 09:51:42,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=524640.0, ans=0.125 2023-12-22 09:51:45,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=524640.0, ans=0.125 2023-12-22 09:51:46,796 INFO [train.py:886] (3/4) Epoch 17, batch 2450, loss[loss=0.01329, audio_tagging_loss=0.01329, over 25000.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4948222.60 frames. ], batch size: 100, lr: 6.41e-03, grad_scale: 64.0 2023-12-22 09:51:50,940 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.22 vs. limit=6.0 2023-12-22 09:51:52,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=524706.6666666666, ans=0.2 2023-12-22 09:51:53,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.45 vs. limit=22.5 2023-12-22 09:51:59,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=524773.3333333334, ans=0.0 2023-12-22 09:51:59,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=524773.3333333334, ans=0.05 2023-12-22 09:52:03,152 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.12 vs. limit=15.0 2023-12-22 09:52:18,067 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.568e+01 2.810e+01 2.991e+01 3.131e+01 3.656e+01, threshold=5.983e+01, percent-clipped=0.0 2023-12-22 09:52:35,261 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=22.5 2023-12-22 09:52:38,599 INFO [train.py:886] (3/4) Epoch 17, batch 2500, loss[loss=0.01751, audio_tagging_loss=0.01751, over 24750.00 frames. ], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4944424.11 frames. ], batch size: 99, lr: 6.40e-03, grad_scale: 64.0 2023-12-22 09:52:42,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.03 vs. limit=15.0 2023-12-22 09:52:50,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.62 vs. 
limit=6.0 2023-12-22 09:52:57,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.05 vs. limit=10.0 2023-12-22 09:52:58,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=525173.3333333334, ans=0.0 2023-12-22 09:53:05,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=525173.3333333334, ans=0.125 2023-12-22 09:53:06,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=525173.3333333334, ans=0.125 2023-12-22 09:53:14,702 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.72 vs. limit=15.0 2023-12-22 09:53:19,297 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.49 vs. limit=15.0 2023-12-22 09:53:30,988 INFO [train.py:886] (3/4) Epoch 17, batch 2550, loss[loss=0.01308, audio_tagging_loss=0.01308, over 24046.00 frames. ], tot_loss[loss=0.01424, audio_tagging_loss=0.01424, over 4941487.06 frames. ], batch size: 100, lr: 6.40e-03, grad_scale: 64.0 2023-12-22 09:53:34,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=525373.3333333334, ans=0.125 2023-12-22 09:53:35,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=525373.3333333334, ans=0.1 2023-12-22 09:53:38,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=525373.3333333334, ans=0.125 2023-12-22 09:53:54,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.78 vs. limit=5.0 2023-12-22 09:54:02,063 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.555e+01 2.815e+01 2.969e+01 3.145e+01 4.179e+01, threshold=5.937e+01, percent-clipped=0.0 2023-12-22 09:54:09,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=525573.3333333334, ans=0.125 2023-12-22 09:54:23,105 INFO [train.py:886] (3/4) Epoch 17, batch 2600, loss[loss=0.01535, audio_tagging_loss=0.01535, over 24750.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4933798.20 frames. ], batch size: 99, lr: 6.40e-03, grad_scale: 64.0 2023-12-22 09:54:42,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=525840.0, ans=0.125 2023-12-22 09:54:51,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=525840.0, ans=0.07 2023-12-22 09:55:07,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=525973.3333333334, ans=0.1 2023-12-22 09:55:07,350 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.07 vs. limit=10.0 2023-12-22 09:55:13,762 INFO [train.py:886] (3/4) Epoch 17, batch 2650, loss[loss=0.01429, audio_tagging_loss=0.01429, over 25000.00 frames. 
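
The grad_scale field flips between 64.0 and 128.0 through this stretch (128.0 at batch 1300 and 3350, 64.0 elsewhere). That is the signature of a dynamic fp16 loss scaler: the scale doubles after a long run of finite gradients and halves whenever an inf/NaN gradient is found. The standard torch.cuda.amp pattern that produces this oscillation is sketched below; the recipe's own loop may differ in detail:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=64.0, growth_interval=2000)

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)        # assumed interface
        scaler.scale(loss).backward()
        scaler.step(optimizer)         # skipped if scaled grads overflowed
        scaler.update()                # x2 after growth_interval good steps,
                                       # /2 on overflow
        return scaler.get_scale()      # the value logged as grad_scale
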
], tot_loss[loss=0.01411, audio_tagging_loss=0.01411, over 4936167.13 frames. ], batch size: 100, lr: 6.40e-03, grad_scale: 64.0 2023-12-22 09:55:16,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=526040.0, ans=0.0 2023-12-22 09:55:26,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=526106.6666666666, ans=0.2 2023-12-22 09:55:29,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=526106.6666666666, ans=0.125 2023-12-22 09:55:44,584 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.500e+01 2.764e+01 2.889e+01 3.029e+01 3.511e+01, threshold=5.779e+01, percent-clipped=0.0 2023-12-22 09:55:46,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=526240.0, ans=0.125 2023-12-22 09:55:52,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=526240.0, ans=0.0 2023-12-22 09:55:53,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=526240.0, ans=0.2 2023-12-22 09:56:06,515 INFO [train.py:886] (3/4) Epoch 17, batch 2700, loss[loss=0.0135, audio_tagging_loss=0.0135, over 25000.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4941183.90 frames. ], batch size: 100, lr: 6.40e-03, grad_scale: 64.0 2023-12-22 09:56:06,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=526373.3333333334, ans=0.125 2023-12-22 09:56:07,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=526373.3333333334, ans=0.0 2023-12-22 09:56:21,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=526440.0, ans=0.0 2023-12-22 09:56:48,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=526640.0, ans=0.0 2023-12-22 09:56:51,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.67 vs. limit=15.0 2023-12-22 09:56:57,706 INFO [train.py:886] (3/4) Epoch 17, batch 2750, loss[loss=0.01518, audio_tagging_loss=0.01518, over 25000.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4944030.05 frames. ], batch size: 100, lr: 6.39e-03, grad_scale: 64.0 2023-12-22 09:57:28,222 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.564e+01 2.829e+01 2.970e+01 3.111e+01 3.407e+01, threshold=5.940e+01, percent-clipped=0.0 2023-12-22 09:57:40,033 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.01 vs. limit=6.0 2023-12-22 09:57:50,853 INFO [train.py:886] (3/4) Epoch 17, batch 2800, loss[loss=0.01348, audio_tagging_loss=0.01348, over 25000.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4946122.30 frames. 
], batch size: 100, lr: 6.39e-03, grad_scale: 64.0 2023-12-22 09:57:53,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=527040.0, ans=0.1 2023-12-22 09:58:04,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=527106.6666666666, ans=0.0 2023-12-22 09:58:23,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=527240.0, ans=0.125 2023-12-22 09:58:30,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=527240.0, ans=0.2 2023-12-22 09:58:32,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=527306.6666666666, ans=0.0 2023-12-22 09:58:35,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.06 vs. limit=15.0 2023-12-22 09:58:43,620 INFO [train.py:886] (3/4) Epoch 17, batch 2850, loss[loss=0.01572, audio_tagging_loss=0.01572, over 24750.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4940231.47 frames. ], batch size: 99, lr: 6.39e-03, grad_scale: 64.0 2023-12-22 09:58:54,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=527440.0, ans=0.1 2023-12-22 09:59:00,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=527440.0, ans=0.0 2023-12-22 09:59:07,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=527506.6666666666, ans=0.1 2023-12-22 09:59:14,233 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.574e+01 2.860e+01 2.998e+01 3.153e+01 3.451e+01, threshold=5.997e+01, percent-clipped=0.0 2023-12-22 09:59:34,131 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.03 vs. limit=15.0 2023-12-22 09:59:34,624 INFO [train.py:886] (3/4) Epoch 17, batch 2900, loss[loss=0.01207, audio_tagging_loss=0.01207, over 24750.00 frames. ], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4940184.47 frames. ], batch size: 99, lr: 6.39e-03, grad_scale: 64.0 2023-12-22 09:59:43,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=527706.6666666666, ans=0.125 2023-12-22 09:59:55,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=527840.0, ans=0.0 2023-12-22 10:00:05,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=527906.6666666666, ans=0.125 2023-12-22 10:00:16,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=527973.3333333334, ans=0.2 2023-12-22 10:00:27,714 INFO [train.py:886] (3/4) Epoch 17, batch 2950, loss[loss=0.01416, audio_tagging_loss=0.01416, over 25000.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4947733.74 frames. 
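
The lr field decays very slowly here (6.46e-03 down to about 6.39e-03 over a few thousand batches), consistent with the Eden schedule used by zipformer recipes, where the learning rate is a product of power-law factors in the global batch index and the epoch. A sketch of that formula from memory (warmup factor omitted; the constants are the usual recipe defaults, and the global batch index is an illustrative guess since this excerpt only shows per-epoch indices):

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # Eden schedule (icefall optim.py), up to a warmup factor.
        return (
            base_lr
            * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
            * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        )

    # deep into training both factors change slowly, hence the near-flat lr:
    print(eden_lr(0.045, batch=81_000, epoch=17.0))  # ~6.1e-03
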
], batch size: 100, lr: 6.39e-03, grad_scale: 64.0 2023-12-22 10:00:29,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.32 vs. limit=15.0 2023-12-22 10:00:33,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.73 vs. limit=22.5 2023-12-22 10:00:38,877 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.17 vs. limit=15.0 2023-12-22 10:00:42,605 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.19 vs. limit=10.0 2023-12-22 10:00:48,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=528173.3333333334, ans=0.2 2023-12-22 10:00:56,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=528173.3333333334, ans=0.0 2023-12-22 10:00:59,345 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.553e+01 2.819e+01 2.962e+01 3.151e+01 3.518e+01, threshold=5.925e+01, percent-clipped=0.0 2023-12-22 10:01:15,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=528306.6666666666, ans=0.125 2023-12-22 10:01:19,348 INFO [train.py:886] (3/4) Epoch 17, batch 3000, loss[loss=0.01331, audio_tagging_loss=0.01331, over 25000.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4955674.57 frames. ], batch size: 100, lr: 6.38e-03, grad_scale: 64.0 2023-12-22 10:01:19,348 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 10:01:27,697 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4096, 3.4126, 3.0528, 0.6792], device='cuda:3') 2023-12-22 10:01:40,061 INFO [train.py:917] (3/4) Epoch 17, validation: loss=0.03336, audio_tagging_loss=0.03336, over 3737520.00 frames. 2023-12-22 10:01:40,062 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 10:01:59,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=528440.0, ans=0.0 2023-12-22 10:02:04,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=528506.6666666666, ans=0.125 2023-12-22 10:02:15,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=528573.3333333334, ans=0.125 2023-12-22 10:02:26,454 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.86 vs. limit=15.0 2023-12-22 10:02:33,292 INFO [train.py:886] (3/4) Epoch 17, batch 3050, loss[loss=0.01465, audio_tagging_loss=0.01465, over 25000.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4961425.85 frames. ], batch size: 100, lr: 6.38e-03, grad_scale: 64.0 2023-12-22 10:02:33,626 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.13 vs. 
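
Mid-epoch (here at batch 3000) the loop pauses to compute a validation loss over the full dev set (~3.7M frames) and logs peak GPU memory; the zipformer.py:1858 line alongside dumps an attention-weights entropy diagnostic for one self-attention module. A sketch of such a periodic validation pass, with illustrative names and interfaces:

    import torch

    def maybe_validate(model, valid_loader, batch_idx, valid_interval=3000):
        if batch_idx % valid_interval != 0:
            return None
        model.eval()
        loss_sum, frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_loader:
                loss, num_frames = compute_loss(model, batch)  # assumed helper
                loss_sum += loss.item() * num_frames
                frames += num_frames
        model.train()
        return loss_sum / max(frames, 1.0)  # the logged "validation: loss="
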
limit=22.5 2023-12-22 10:02:44,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=528773.3333333334, ans=0.035 2023-12-22 10:03:03,845 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.567e+01 2.845e+01 2.964e+01 3.082e+01 4.137e+01, threshold=5.927e+01, percent-clipped=0.0 2023-12-22 10:03:25,019 INFO [train.py:886] (3/4) Epoch 17, batch 3100, loss[loss=0.01258, audio_tagging_loss=0.01258, over 25000.00 frames. ], tot_loss[loss=0.014, audio_tagging_loss=0.014, over 4959218.84 frames. ], batch size: 100, lr: 6.38e-03, grad_scale: 64.0 2023-12-22 10:03:28,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=529040.0, ans=0.0 2023-12-22 10:03:33,921 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.38 vs. limit=10.0 2023-12-22 10:03:36,738 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.19 vs. limit=22.5 2023-12-22 10:03:53,453 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.91 vs. limit=10.0 2023-12-22 10:03:58,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=529240.0, ans=10.0 2023-12-22 10:04:13,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.03 vs. limit=15.0 2023-12-22 10:04:17,569 INFO [train.py:886] (3/4) Epoch 17, batch 3150, loss[loss=0.01319, audio_tagging_loss=0.01319, over 24750.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4950708.30 frames. ], batch size: 99, lr: 6.38e-03, grad_scale: 64.0 2023-12-22 10:04:39,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=529506.6666666666, ans=0.125 2023-12-22 10:04:49,157 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.542e+01 2.844e+01 2.967e+01 3.134e+01 3.611e+01, threshold=5.935e+01, percent-clipped=0.0 2023-12-22 10:04:59,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=529640.0, ans=0.125 2023-12-22 10:05:09,795 INFO [train.py:886] (3/4) Epoch 17, batch 3200, loss[loss=0.01489, audio_tagging_loss=0.01489, over 24750.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4948795.93 frames. ], batch size: 99, lr: 6.38e-03, grad_scale: 64.0 2023-12-22 10:05:19,968 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.48 vs. 
limit=15.0 2023-12-22 10:05:32,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=529840.0, ans=0.1 2023-12-22 10:05:34,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=529840.0, ans=0.125 2023-12-22 10:05:42,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=529906.6666666666, ans=0.1 2023-12-22 10:05:45,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=529906.6666666666, ans=0.1 2023-12-22 10:05:48,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=529906.6666666666, ans=0.2 2023-12-22 10:05:57,625 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.95 vs. limit=15.0 2023-12-22 10:05:59,692 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.58 vs. limit=15.0 2023-12-22 10:06:01,814 INFO [train.py:886] (3/4) Epoch 17, batch 3250, loss[loss=0.01401, audio_tagging_loss=0.01401, over 25000.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4947736.58 frames. ], batch size: 100, lr: 6.37e-03, grad_scale: 64.0 2023-12-22 10:06:05,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.65 vs. limit=6.0 2023-12-22 10:06:05,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=530040.0, ans=0.2 2023-12-22 10:06:06,282 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.58 vs. limit=10.0 2023-12-22 10:06:32,361 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.440e+01 2.816e+01 2.913e+01 3.117e+01 3.932e+01, threshold=5.825e+01, percent-clipped=0.0 2023-12-22 10:06:32,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=530240.0, ans=0.125 2023-12-22 10:06:53,261 INFO [train.py:886] (3/4) Epoch 17, batch 3300, loss[loss=0.01302, audio_tagging_loss=0.01302, over 25000.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4952165.50 frames. ], batch size: 100, lr: 6.37e-03, grad_scale: 64.0 2023-12-22 10:06:58,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=530373.3333333334, ans=0.2 2023-12-22 10:06:59,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2023-12-22 10:07:04,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=530440.0, ans=0.2 2023-12-22 10:07:32,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=530573.3333333334, ans=0.025 2023-12-22 10:07:45,505 INFO [train.py:886] (3/4) Epoch 17, batch 3350, loss[loss=0.01342, audio_tagging_loss=0.01342, over 25000.00 frames. 
], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4951173.61 frames. ], batch size: 100, lr: 6.37e-03, grad_scale: 128.0 2023-12-22 10:07:53,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=530706.6666666666, ans=0.2 2023-12-22 10:08:17,810 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.375e+01 2.799e+01 2.966e+01 3.159e+01 3.514e+01, threshold=5.932e+01, percent-clipped=0.0 2023-12-22 10:08:23,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=530906.6666666666, ans=0.1 2023-12-22 10:08:24,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=530906.6666666666, ans=0.2 2023-12-22 10:08:27,693 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2023-12-22 10:08:36,651 INFO [train.py:886] (3/4) Epoch 17, batch 3400, loss[loss=0.01314, audio_tagging_loss=0.01314, over 25000.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 4955550.06 frames. ], batch size: 100, lr: 6.37e-03, grad_scale: 64.0 2023-12-22 10:08:55,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=531106.6666666666, ans=0.125 2023-12-22 10:08:57,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.98 vs. limit=12.0 2023-12-22 10:09:08,383 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.57 vs. limit=22.5 2023-12-22 10:09:14,680 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 10:09:25,844 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. limit=6.0 2023-12-22 10:09:29,999 INFO [train.py:886] (3/4) Epoch 17, batch 3450, loss[loss=0.01597, audio_tagging_loss=0.01597, over 24750.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4949550.13 frames. 
], batch size: 99, lr: 6.37e-03, grad_scale: 64.0 2023-12-22 10:09:30,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=531373.3333333334, ans=0.125 2023-12-22 10:09:32,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=531373.3333333334, ans=0.0 2023-12-22 10:09:34,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=531373.3333333334, ans=0.125 2023-12-22 10:09:34,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=531373.3333333334, ans=0.0 2023-12-22 10:09:43,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=531440.0, ans=0.125 2023-12-22 10:09:58,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=531506.6666666666, ans=0.0 2023-12-22 10:10:00,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.13 vs. limit=15.0 2023-12-22 10:10:02,501 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.535e+01 2.890e+01 3.022e+01 3.190e+01 3.509e+01, threshold=6.045e+01, percent-clipped=0.0 2023-12-22 10:10:06,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=531573.3333333334, ans=0.1 2023-12-22 10:10:09,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=531573.3333333334, ans=0.125 2023-12-22 10:10:12,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=531640.0, ans=0.1 2023-12-22 10:10:23,157 INFO [train.py:886] (3/4) Epoch 17, batch 3500, loss[loss=0.01408, audio_tagging_loss=0.01408, over 24750.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 4948145.88 frames. ], batch size: 99, lr: 6.37e-03, grad_scale: 64.0 2023-12-22 10:10:25,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.56 vs. limit=22.5 2023-12-22 10:10:29,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=531706.6666666666, ans=0.07 2023-12-22 10:10:43,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=531840.0, ans=0.125 2023-12-22 10:10:50,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=531840.0, ans=0.125 2023-12-22 10:10:54,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=531906.6666666666, ans=0.0 2023-12-22 10:11:11,484 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.56 vs. 
limit=6.0 2023-12-22 10:11:11,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=531973.3333333334, ans=0.125 2023-12-22 10:11:11,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=531973.3333333334, ans=0.125 2023-12-22 10:11:14,707 INFO [train.py:886] (3/4) Epoch 17, batch 3550, loss[loss=0.01539, audio_tagging_loss=0.01539, over 25000.00 frames. ], tot_loss[loss=0.01402, audio_tagging_loss=0.01402, over 4948482.02 frames. ], batch size: 100, lr: 6.36e-03, grad_scale: 64.0 2023-12-22 10:11:18,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=532040.0, ans=0.0 2023-12-22 10:11:18,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=532040.0, ans=0.1 2023-12-22 10:11:23,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=532040.0, ans=0.125 2023-12-22 10:11:25,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=532106.6666666666, ans=0.125 2023-12-22 10:11:29,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=532106.6666666666, ans=0.125 2023-12-22 10:11:46,393 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.431e+01 2.820e+01 2.970e+01 3.113e+01 4.129e+01, threshold=5.940e+01, percent-clipped=0.0 2023-12-22 10:11:51,723 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.66 vs. limit=22.5 2023-12-22 10:11:56,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=532306.6666666666, ans=0.015 2023-12-22 10:12:00,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=532306.6666666666, ans=0.1 2023-12-22 10:12:06,050 INFO [train.py:886] (3/4) Epoch 17, batch 3600, loss[loss=0.01313, audio_tagging_loss=0.01313, over 25000.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4951455.74 frames. ], batch size: 100, lr: 6.36e-03, grad_scale: 64.0 2023-12-22 10:12:41,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=532573.3333333334, ans=0.125 2023-12-22 10:12:51,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=532640.0, ans=0.125 2023-12-22 10:12:55,931 INFO [train.py:886] (3/4) Epoch 17, batch 3650, loss[loss=0.01535, audio_tagging_loss=0.01535, over 25000.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4953928.72 frames. ], batch size: 100, lr: 6.36e-03, grad_scale: 64.0 2023-12-22 10:13:00,199 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.99 vs. 
limit=22.5 2023-12-22 10:13:21,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=532840.0, ans=0.125 2023-12-22 10:13:27,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=532906.6666666666, ans=0.125 2023-12-22 10:13:27,614 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.518e+01 2.770e+01 2.907e+01 3.018e+01 3.500e+01, threshold=5.815e+01, percent-clipped=0.0 2023-12-22 10:13:45,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=532973.3333333334, ans=10.0 2023-12-22 10:13:47,282 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=15.0 2023-12-22 10:13:48,595 INFO [train.py:886] (3/4) Epoch 17, batch 3700, loss[loss=0.01586, audio_tagging_loss=0.01586, over 25000.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4958511.26 frames. ], batch size: 100, lr: 6.36e-03, grad_scale: 64.0 2023-12-22 10:13:55,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=533040.0, ans=0.125 2023-12-22 10:14:11,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=533173.3333333334, ans=0.125 2023-12-22 10:14:15,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=533173.3333333334, ans=0.1 2023-12-22 10:14:19,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=533240.0, ans=0.125 2023-12-22 10:14:22,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=533240.0, ans=0.125 2023-12-22 10:14:29,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=533306.6666666666, ans=0.125 2023-12-22 10:14:42,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=533373.3333333334, ans=0.0 2023-12-22 10:14:43,264 INFO [train.py:886] (3/4) Epoch 17, batch 3750, loss[loss=0.01662, audio_tagging_loss=0.01662, over 24750.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4951897.73 frames. ], batch size: 99, lr: 6.36e-03, grad_scale: 64.0 2023-12-22 10:15:14,891 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.583e+01 2.851e+01 2.968e+01 3.092e+01 3.872e+01, threshold=5.935e+01, percent-clipped=0.0 2023-12-22 10:15:16,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=533573.3333333334, ans=0.125 2023-12-22 10:15:18,864 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.74 vs. 
2023-12-22 10:15:19,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=533573.3333333334, ans=0.125
2023-12-22 10:15:31,063 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.48 vs. limit=15.0
2023-12-22 10:15:33,542 INFO [train.py:886] (3/4) Epoch 17, batch 3800, loss[loss=0.01383, audio_tagging_loss=0.01383, over 24750.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 4940197.14 frames. ], batch size: 99, lr: 6.35e-03, grad_scale: 64.0
2023-12-22 10:15:39,773 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.08 vs. limit=15.0
2023-12-22 10:15:41,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=533706.6666666666, ans=0.125
2023-12-22 10:15:43,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=533773.3333333334, ans=0.0
2023-12-22 10:16:07,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=533906.6666666666, ans=0.0
2023-12-22 10:16:26,367 INFO [train.py:886] (3/4) Epoch 17, batch 3850, loss[loss=0.01215, audio_tagging_loss=0.01215, over 24750.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4939880.47 frames. ], batch size: 99, lr: 6.35e-03, grad_scale: 64.0
2023-12-22 10:16:28,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=534040.0, ans=0.0
2023-12-22 10:16:37,992 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 10:16:38,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=534106.6666666666, ans=0.0
2023-12-22 10:16:45,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=534173.3333333334, ans=0.125
2023-12-22 10:16:56,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=534240.0, ans=0.125
2023-12-22 10:16:58,069 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.583e+01 2.795e+01 2.953e+01 3.065e+01 3.551e+01, threshold=5.907e+01, percent-clipped=0.0
2023-12-22 10:17:16,794 INFO [train.py:886] (3/4) Epoch 17, batch 3900, loss[loss=0.01569, audio_tagging_loss=0.01569, over 25000.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4946982.39 frames. ], batch size: 100, lr: 6.35e-03, grad_scale: 64.0
2023-12-22 10:17:17,134 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.22 vs. limit=15.0
2023-12-22 10:17:19,164 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.35 vs. limit=22.5
2023-12-22 10:18:08,576 INFO [train.py:886] (3/4) Epoch 17, batch 3950, loss[loss=0.01167, audio_tagging_loss=0.01167, over 25000.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4954975.54 frames. ], batch size: 100, lr: 6.35e-03, grad_scale: 64.0
2023-12-22 10:18:13,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=534706.6666666666, ans=0.1
2023-12-22 10:18:32,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=534840.0, ans=0.125
2023-12-22 10:18:32,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=534840.0, ans=15.0
2023-12-22 10:18:33,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=534840.0, ans=0.125
2023-12-22 10:18:39,647 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.439e+01 2.778e+01 2.901e+01 3.094e+01 3.779e+01, threshold=5.801e+01, percent-clipped=0.0
2023-12-22 10:18:45,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.32 vs. limit=15.0
2023-12-22 10:18:58,643 INFO [train.py:886] (3/4) Epoch 17, batch 4000, loss[loss=0.0149, audio_tagging_loss=0.0149, over 25000.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4960321.42 frames. ], batch size: 100, lr: 6.35e-03, grad_scale: 64.0
2023-12-22 10:19:16,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=535106.6666666666, ans=0.0
2023-12-22 10:19:26,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=535173.3333333334, ans=0.125
2023-12-22 10:19:44,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=535306.6666666666, ans=0.0
2023-12-22 10:19:46,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=535306.6666666666, ans=0.125
2023-12-22 10:19:49,777 INFO [train.py:886] (3/4) Epoch 17, batch 4050, loss[loss=0.01243, audio_tagging_loss=0.01243, over 24750.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4957241.27 frames. ], batch size: 99, lr: 6.34e-03, grad_scale: 64.0
2023-12-22 10:19:55,613 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=15.0
2023-12-22 10:20:06,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=535440.0, ans=0.1
2023-12-22 10:20:11,178 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.77 vs. limit=15.0
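[Editor's note] The optim.py WARNING lines report five quantiles (min, 25%, median, 75%, max) of recent gradient norms, and the clipping threshold tracks Clipping_scale times the median: e.g. threshold=5.940e+01 is exactly 2.0 x the logged median 2.970e+01, and 5.801e+01 is 2.0 x 2.901e+01 up to rounding. A minimal sketch of that scheme, with assumed names and window size, not the recipe's exact code:

```python
import torch

def clip_by_quartiles(params, history, clipping_scale=2.0, window=128):
    """Clip the total grad norm at clipping_scale * median of recent norms."""
    norms = [p.grad.norm() for p in params if p.grad is not None]
    norm = torch.norm(torch.stack(norms)).item()
    history.append(norm)
    hist = torch.tensor(history[-window:])
    # the five logged numbers: min, 25%, 50%, 75%, max
    quartiles = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2].item()
    if norm > threshold:
        for p in params:
            if p.grad is not None:
                p.grad.mul_(threshold / norm)
    return quartiles, threshold
```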
2023-12-22 10:20:22,062 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.511e+01 2.893e+01 2.988e+01 3.133e+01 3.578e+01, threshold=5.975e+01, percent-clipped=0.0
2023-12-22 10:20:24,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=535573.3333333334, ans=0.04949747468305833
2023-12-22 10:20:28,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=535573.3333333334, ans=0.2
2023-12-22 10:20:42,124 INFO [train.py:886] (3/4) Epoch 17, batch 4100, loss[loss=0.01376, audio_tagging_loss=0.01376, over 24750.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 4945685.92 frames. ], batch size: 99, lr: 6.34e-03, grad_scale: 64.0
2023-12-22 10:20:55,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=535773.3333333334, ans=0.0
2023-12-22 10:21:05,297 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.19 vs. limit=15.0
2023-12-22 10:21:15,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=535906.6666666666, ans=0.2
2023-12-22 10:21:19,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.02 vs. limit=15.0
2023-12-22 10:21:20,177 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=15.0
2023-12-22 10:21:23,801 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0
2023-12-22 10:21:27,092 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.43 vs. limit=10.0
2023-12-22 10:21:32,759 INFO [train.py:886] (3/4) Epoch 17, batch 4150, loss[loss=0.01424, audio_tagging_loss=0.01424, over 25000.00 frames. ], tot_loss[loss=0.01401, audio_tagging_loss=0.01401, over 4940378.62 frames. ], batch size: 100, lr: 6.34e-03, grad_scale: 64.0
2023-12-22 10:21:56,759 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.14 vs. limit=15.0
2023-12-22 10:22:01,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=536173.3333333334, ans=0.07
2023-12-22 10:22:04,548 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.507e+01 2.856e+01 2.974e+01 3.122e+01 3.730e+01, threshold=5.949e+01, percent-clipped=0.0
2023-12-22 10:22:13,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=536306.6666666666, ans=0.0
2023-12-22 10:22:24,188 INFO [train.py:886] (3/4) Epoch 17, batch 4200, loss[loss=0.01268, audio_tagging_loss=0.01268, over 25000.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4939029.90 frames. ], batch size: 100, lr: 6.34e-03, grad_scale: 64.0
2023-12-22 10:22:26,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=536373.3333333334, ans=0.2
2023-12-22 10:22:29,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=536373.3333333334, ans=0.125
2023-12-22 10:23:15,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=536706.6666666666, ans=0.0
2023-12-22 10:23:15,967 INFO [train.py:886] (3/4) Epoch 17, batch 4250, loss[loss=0.01224, audio_tagging_loss=0.01224, over 24750.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4943905.72 frames. ], batch size: 99, lr: 6.34e-03, grad_scale: 64.0
2023-12-22 10:23:22,315 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=12.0
2023-12-22 10:23:29,774 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.31 vs. limit=15.0
2023-12-22 10:23:29,905 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.80 vs. limit=22.5
2023-12-22 10:23:33,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=12.0
2023-12-22 10:23:35,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=536840.0, ans=0.125
2023-12-22 10:23:41,119 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.79 vs. limit=22.5
2023-12-22 10:23:42,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=536840.0, ans=0.2
2023-12-22 10:23:47,883 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.588e+01 2.846e+01 2.960e+01 3.073e+01 3.606e+01, threshold=5.921e+01, percent-clipped=0.0
2023-12-22 10:23:58,632 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.67 vs. limit=6.0
2023-12-22 10:24:01,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=536973.3333333334, ans=0.0
2023-12-22 10:24:06,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=537040.0, ans=0.1
2023-12-22 10:24:06,817 INFO [train.py:886] (3/4) Epoch 17, batch 4300, loss[loss=0.01367, audio_tagging_loss=0.01367, over 25000.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 4945486.57 frames. ], batch size: 100, lr: 6.33e-03, grad_scale: 64.0
2023-12-22 10:24:29,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=537173.3333333334, ans=0.0
2023-12-22 10:24:32,187 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.89 vs. limit=15.0
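[Editor's note] The Whitening lines compare a scalar statistic of a module's output covariance against a limit (metric=... vs. limit=...); values near 1 indicate the per-group covariance is close to a multiple of the identity, and the module only intervenes when the metric exceeds the limit. Below is one plausible formulation of such a statistic, assuming a (frames, channels) activation; it is a sketch, not necessarily the exact expression in scaling.py:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """x: (num_frames, num_channels). Returns ~1.0 when each group's feature
    covariance is a multiple of the identity ('white'), larger otherwise."""
    num_frames, num_channels = x.shape
    c = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, c).transpose(0, 1)   # (groups, N, c)
    covar = torch.matmul(x.transpose(1, 2), x)                 # (groups, c, c)
    num = (covar ** 2).sum(dim=(1, 2))
    diag_mean = torch.diagonal(covar, dim1=1, dim2=2).mean(dim=1)
    return (num / (diag_mean ** 2 * c)).mean()

print(whitening_metric(torch.randn(1000, 256)))  # close to 1 for white noise
```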
2023-12-22 10:24:36,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=537173.3333333334, ans=0.1
2023-12-22 10:24:36,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=537173.3333333334, ans=0.125
2023-12-22 10:24:47,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=537240.0, ans=0.2
2023-12-22 10:24:59,301 INFO [train.py:886] (3/4) Epoch 17, batch 4350, loss[loss=0.01285, audio_tagging_loss=0.01285, over 25000.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4952007.62 frames. ], batch size: 100, lr: 6.33e-03, grad_scale: 64.0
2023-12-22 10:24:59,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=537373.3333333334, ans=0.125
2023-12-22 10:25:03,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0
2023-12-22 10:25:07,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=537373.3333333334, ans=0.07
2023-12-22 10:25:13,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=537440.0, ans=0.125
2023-12-22 10:25:18,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=15.0
2023-12-22 10:25:26,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=537506.6666666666, ans=0.2
2023-12-22 10:25:30,832 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.572e+01 2.833e+01 2.974e+01 3.137e+01 3.666e+01, threshold=5.948e+01, percent-clipped=0.0
2023-12-22 10:25:43,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=537640.0, ans=0.125
2023-12-22 10:25:50,421 INFO [train.py:886] (3/4) Epoch 17, batch 4400, loss[loss=0.01307, audio_tagging_loss=0.01307, over 24750.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4946508.13 frames. ], batch size: 99, lr: 6.33e-03, grad_scale: 64.0
2023-12-22 10:25:56,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=537706.6666666666, ans=0.5
2023-12-22 10:26:09,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=537773.3333333334, ans=0.125
2023-12-22 10:26:10,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=537840.0, ans=0.125
2023-12-22 10:26:18,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.62 vs. limit=15.0
2023-12-22 10:26:41,580 INFO [train.py:886] (3/4) Epoch 17, batch 4450, loss[loss=0.01519, audio_tagging_loss=0.01519, over 24750.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4947966.65 frames. ], batch size: 99, lr: 6.33e-03, grad_scale: 64.0
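[Editor's note] The many balancer knobs scheduled above (prob, min_abs/max_abs, min_positive/max_positive) belong to Zipformer's activation balancers, which keep per-channel statistics inside a target band by nudging gradients. A toy computation of the two statistics being constrained, with made-up data:

```python
import torch

def channel_stats(x: torch.Tensor):
    """x: (frames, channels). Returns the fraction of positive values and the
    mean |x| per channel, the quantities a balancer compares against
    [min_positive, max_positive] and [min_abs, max_abs]."""
    frac_positive = (x > 0).float().mean(dim=0)
    mean_abs = x.abs().mean(dim=0)
    return frac_positive, mean_abs

fp, ma = channel_stats(torch.randn(1000, 4))
print(fp, ma)  # for unit Gaussian channels: ~0.5 and ~0.80
```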
2023-12-22 10:26:51,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=538106.6666666666, ans=0.125
2023-12-22 10:26:54,246 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0
2023-12-22 10:26:56,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=538106.6666666666, ans=0.125
2023-12-22 10:26:59,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=538106.6666666666, ans=0.0
2023-12-22 10:26:59,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=538106.6666666666, ans=0.125
2023-12-22 10:27:05,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=538173.3333333334, ans=0.07
2023-12-22 10:27:13,177 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.528e+01 2.814e+01 2.976e+01 3.129e+01 3.598e+01, threshold=5.951e+01, percent-clipped=0.0
2023-12-22 10:27:31,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=538373.3333333334, ans=0.125
2023-12-22 10:27:32,748 INFO [train.py:886] (3/4) Epoch 17, batch 4500, loss[loss=0.01647, audio_tagging_loss=0.01647, over 25000.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4948573.08 frames. ], batch size: 100, lr: 6.33e-03, grad_scale: 64.0
2023-12-22 10:27:39,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=538373.3333333334, ans=0.125
2023-12-22 10:27:40,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=538373.3333333334, ans=0.0
2023-12-22 10:27:46,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.89 vs. limit=15.0
2023-12-22 10:27:56,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=538506.6666666666, ans=0.0
2023-12-22 10:28:21,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=538640.0, ans=0.125
2023-12-22 10:28:24,639 INFO [train.py:886] (3/4) Epoch 17, batch 4550, loss[loss=0.01208, audio_tagging_loss=0.01208, over 24750.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4951565.72 frames. ], batch size: 99, lr: 6.32e-03, grad_scale: 64.0
2023-12-22 10:28:25,029 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.69 vs. limit=6.0
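[Editor's note] The "lr:" column decays smoothly with batch count within an epoch (6.36e-03 down to 6.32e-03 across epoch 17 above) and steps down at each epoch boundary; that shape matches icefall's Eden schedule in optim.py. An illustrative re-implementation follows; the warmup handling of the real scheduler is omitted, and the default parameter values below are taken from the recipe's configuration, shown only as an example:

```python
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    """base_lr discounted by a batch term and an epoch term; both factors
    approach 1 early in training and shrink as batch/epoch grow."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor
```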
2023-12-22 10:28:39,791 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 10:28:44,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=538840.0, ans=0.1
2023-12-22 10:28:57,008 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.472e+01 2.801e+01 2.924e+01 3.059e+01 3.634e+01, threshold=5.849e+01, percent-clipped=0.0
2023-12-22 10:29:00,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=538906.6666666666, ans=0.125
2023-12-22 10:29:07,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=538973.3333333334, ans=0.2
2023-12-22 10:29:14,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=538973.3333333334, ans=0.2
2023-12-22 10:29:16,001 INFO [train.py:886] (3/4) Epoch 17, batch 4600, loss[loss=0.01268, audio_tagging_loss=0.01268, over 25000.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4953146.73 frames. ], batch size: 100, lr: 6.32e-03, grad_scale: 64.0
2023-12-22 10:29:16,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=539040.0, ans=0.2
2023-12-22 10:29:18,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=539040.0, ans=0.09899494936611666
2023-12-22 10:29:31,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=539106.6666666666, ans=0.0
2023-12-22 10:29:33,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=539106.6666666666, ans=0.0
2023-12-22 10:29:39,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=539173.3333333334, ans=0.1
2023-12-22 10:29:43,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=539173.3333333334, ans=0.0
2023-12-22 10:29:44,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=539173.3333333334, ans=0.125
2023-12-22 10:29:56,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=539306.6666666666, ans=0.125
2023-12-22 10:30:08,711 INFO [train.py:886] (3/4) Epoch 17, batch 4650, loss[loss=0.01166, audio_tagging_loss=0.01166, over 25000.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4956136.85 frames. ], batch size: 100, lr: 6.32e-03, grad_scale: 64.0
2023-12-22 10:30:20,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=539440.0, ans=0.0
2023-12-22 10:30:33,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.08 vs. limit=15.0
2023-12-22 10:30:41,872 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.569e+01 2.874e+01 3.014e+01 3.111e+01 3.634e+01, threshold=6.028e+01, percent-clipped=0.0
2023-12-22 10:30:51,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=539640.0, ans=0.2
2023-12-22 10:30:56,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=539640.0, ans=0.1
2023-12-22 10:31:00,404 INFO [train.py:886] (3/4) Epoch 17, batch 4700, loss[loss=0.01588, audio_tagging_loss=0.01588, over 24750.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4952569.52 frames. ], batch size: 99, lr: 6.32e-03, grad_scale: 64.0
2023-12-22 10:31:19,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=539840.0, ans=0.2
2023-12-22 10:31:20,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=539840.0, ans=0.07
2023-12-22 10:31:47,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=540040.0, ans=0.125
2023-12-22 10:31:48,493 INFO [train.py:886] (3/4) Epoch 17, batch 4750, loss[loss=0.01183, audio_tagging_loss=0.01183, over 24750.00 frames. ], tot_loss[loss=0.01412, audio_tagging_loss=0.01412, over 4949293.74 frames. ], batch size: 99, lr: 6.32e-03, grad_scale: 64.0
2023-12-22 10:32:26,596 INFO [train.py:886] (3/4) Epoch 18, batch 0, loss[loss=0.02854, audio_tagging_loss=0.02854, over 23944.00 frames. ], tot_loss[loss=0.02854, audio_tagging_loss=0.02854, over 23944.00 frames. ], batch size: 100, lr: 6.14e-03, grad_scale: 32.0
2023-12-22 10:32:26,596 INFO [train.py:909] (3/4) Computing validation loss
2023-12-22 10:32:47,812 INFO [train.py:917] (3/4) Epoch 18, validation: loss=0.03336, audio_tagging_loss=0.03336, over 3737520.00 frames.
2023-12-22 10:32:47,813 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-22 10:32:52,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=540146.6666666666, ans=0.125
2023-12-22 10:33:02,899 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.634e+01 2.896e+01 3.092e+01 3.384e+01 9.418e+01, threshold=6.184e+01, percent-clipped=7.0
2023-12-22 10:33:26,236 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.68 vs. limit=15.0
2023-12-22 10:33:32,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=540413.3333333334, ans=0.0
2023-12-22 10:33:34,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=540413.3333333334, ans=0.2
2023-12-22 10:33:36,993 INFO [train.py:886] (3/4) Epoch 18, batch 50, loss[loss=0.01695, audio_tagging_loss=0.01695, over 25000.00 frames. ], tot_loss[loss=0.02205, audio_tagging_loss=0.02205, over 1118307.82 frames. ], batch size: 100, lr: 6.13e-03, grad_scale: 32.0
2023-12-22 10:33:54,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=540546.6666666666, ans=0.125
2023-12-22 10:34:04,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=540613.3333333334, ans=0.125
2023-12-22 10:34:09,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=540680.0, ans=0.0
2023-12-22 10:34:14,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=540680.0, ans=0.125
2023-12-22 10:34:25,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=540746.6666666666, ans=0.1
2023-12-22 10:34:30,080 INFO [train.py:886] (3/4) Epoch 18, batch 100, loss[loss=0.01612, audio_tagging_loss=0.01612, over 25000.00 frames. ], tot_loss[loss=0.01905, audio_tagging_loss=0.01905, over 1969688.23 frames. ], batch size: 100, lr: 6.13e-03, grad_scale: 32.0
2023-12-22 10:34:34,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=540813.3333333334, ans=10.0
2023-12-22 10:34:41,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=540880.0, ans=0.1
2023-12-22 10:34:45,958 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.772e+01 3.148e+01 3.407e+01 3.744e+01 5.066e+01, threshold=6.815e+01, percent-clipped=0.0
2023-12-22 10:34:49,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=540946.6666666666, ans=0.2
2023-12-22 10:34:56,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=540946.6666666666, ans=0.125
2023-12-22 10:35:01,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=541013.3333333334, ans=0.2
2023-12-22 10:35:02,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=541013.3333333334, ans=0.125
2023-12-22 10:35:20,929 INFO [train.py:886] (3/4) Epoch 18, batch 150, loss[loss=0.01304, audio_tagging_loss=0.01304, over 25000.00 frames. ], tot_loss[loss=0.0173, audio_tagging_loss=0.0173, over 2639873.16 frames. ], batch size: 100, lr: 6.13e-03, grad_scale: 32.0
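[Editor's note] At each epoch boundary (see the "Computing validation loss" / "Epoch 18, validation: loss=0.03336 ... over 3737520.00 frames" pair above), training pauses to average the loss over the whole dev set. A minimal sketch of that loop; the batch key names and the loss_fn signature are assumptions, not the recipe's exact code:

```python
import torch

def compute_validation_loss(model, valid_loader, loss_fn, device):
    """Frame-weighted average loss over the dev set."""
    model.eval()
    total_loss, total_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            feats = batch["features"].to(device)   # assumed key name
            labels = batch["labels"].to(device)    # assumed key name
            loss, num_frames = loss_fn(model(feats), labels)
            total_loss += loss.item()
            total_frames += num_frames
    model.train()
    return total_loss / max(total_frames, 1.0)
```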
2023-12-22 10:35:25,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=541146.6666666666, ans=0.125
2023-12-22 10:35:36,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=541213.3333333334, ans=0.1
2023-12-22 10:35:39,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=541213.3333333334, ans=0.0
2023-12-22 10:35:41,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=541280.0, ans=0.125
2023-12-22 10:35:50,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=541346.6666666666, ans=0.125
2023-12-22 10:35:54,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=541346.6666666666, ans=0.0
2023-12-22 10:36:04,107 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 10:36:08,409 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=12.0
2023-12-22 10:36:11,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541480.0, ans=0.1
2023-12-22 10:36:12,542 INFO [train.py:886] (3/4) Epoch 18, batch 200, loss[loss=0.0116, audio_tagging_loss=0.0116, over 25000.00 frames. ], tot_loss[loss=0.01623, audio_tagging_loss=0.01623, over 3152089.78 frames. ], batch size: 100, lr: 6.13e-03, grad_scale: 32.0
2023-12-22 10:36:29,159 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.557e+01 2.889e+01 3.014e+01 3.175e+01 3.778e+01, threshold=6.028e+01, percent-clipped=0.0
2023-12-22 10:36:57,399 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=15.0
2023-12-22 10:36:58,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=541746.6666666666, ans=0.1
2023-12-22 10:36:59,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=541746.6666666666, ans=0.125
2023-12-22 10:37:04,251 INFO [train.py:886] (3/4) Epoch 18, batch 250, loss[loss=0.01553, audio_tagging_loss=0.01553, over 25000.00 frames. ], tot_loss[loss=0.01576, audio_tagging_loss=0.01576, over 3552584.94 frames. ], batch size: 100, lr: 6.13e-03, grad_scale: 32.0
2023-12-22 10:37:25,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=541946.6666666666, ans=0.0
2023-12-22 10:37:36,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=542013.3333333334, ans=0.1
2023-12-22 10:37:52,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=542080.0, ans=0.125
2023-12-22 10:37:54,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=542080.0, ans=0.0
2023-12-22 10:37:56,125 INFO [train.py:886] (3/4) Epoch 18, batch 300, loss[loss=0.01431, audio_tagging_loss=0.01431, over 24750.00 frames. ], tot_loss[loss=0.01535, audio_tagging_loss=0.01535, over 3860683.21 frames. ], batch size: 99, lr: 6.12e-03, grad_scale: 32.0
2023-12-22 10:38:14,086 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.479e+01 2.870e+01 3.034e+01 3.182e+01 3.757e+01, threshold=6.068e+01, percent-clipped=0.0
2023-12-22 10:38:20,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=542280.0, ans=0.125
2023-12-22 10:38:21,013 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 10:38:21,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=542280.0, ans=0.95
2023-12-22 10:38:46,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=542413.3333333334, ans=0.2
2023-12-22 10:38:48,506 INFO [train.py:886] (3/4) Epoch 18, batch 350, loss[loss=0.01316, audio_tagging_loss=0.01316, over 24750.00 frames. ], tot_loss[loss=0.01514, audio_tagging_loss=0.01514, over 4099659.67 frames. ], batch size: 99, lr: 6.12e-03, grad_scale: 32.0
2023-12-22 10:38:55,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=542480.0, ans=0.125
2023-12-22 10:38:59,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=542546.6666666666, ans=0.125
2023-12-22 10:39:12,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=542613.3333333334, ans=0.125
2023-12-22 10:39:21,763 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=22.5
2023-12-22 10:39:31,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=542746.6666666666, ans=0.125
2023-12-22 10:39:36,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=542746.6666666666, ans=0.1
2023-12-22 10:39:39,166 INFO [train.py:886] (3/4) Epoch 18, batch 400, loss[loss=0.01344, audio_tagging_loss=0.01344, over 24750.00 frames. ], tot_loss[loss=0.01475, audio_tagging_loss=0.01475, over 4283468.69 frames. ], batch size: 99, lr: 6.12e-03, grad_scale: 32.0
2023-12-22 10:39:39,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=542813.3333333334, ans=0.125
2023-12-22 10:39:49,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=542880.0, ans=0.1
2023-12-22 10:39:55,840 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.510e+01 2.811e+01 2.915e+01 3.061e+01 3.426e+01, threshold=5.831e+01, percent-clipped=0.0
2023-12-22 10:40:31,252 INFO [train.py:886] (3/4) Epoch 18, batch 450, loss[loss=0.01518, audio_tagging_loss=0.01518, over 21966.00 frames. ], tot_loss[loss=0.01449, audio_tagging_loss=0.01449, over 4424841.28 frames. ], batch size: 107, lr: 6.12e-03, grad_scale: 32.0
2023-12-22 10:40:47,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=543213.3333333334, ans=0.125
2023-12-22 10:40:47,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=543213.3333333334, ans=0.1
2023-12-22 10:40:50,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.91 vs. limit=10.0
2023-12-22 10:41:07,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=543346.6666666666, ans=0.125
2023-12-22 10:41:23,710 INFO [train.py:886] (3/4) Epoch 18, batch 500, loss[loss=0.01241, audio_tagging_loss=0.01241, over 21420.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 4535207.69 frames. ], batch size: 107, lr: 6.12e-03, grad_scale: 32.0
2023-12-22 10:41:23,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=543480.0, ans=0.0
2023-12-22 10:41:39,746 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.440e+01 2.753e+01 2.847e+01 2.993e+01 3.573e+01, threshold=5.694e+01, percent-clipped=0.0
2023-12-22 10:41:42,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=543613.3333333334, ans=0.125
2023-12-22 10:41:49,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=543613.3333333334, ans=0.125
2023-12-22 10:41:51,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=543613.3333333334, ans=0.125
2023-12-22 10:41:56,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=543680.0, ans=0.09899494936611666
2023-12-22 10:42:01,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=543680.0, ans=0.2
2023-12-22 10:42:01,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=543680.0, ans=0.0
2023-12-22 10:42:12,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=543746.6666666666, ans=0.035
2023-12-22 10:42:15,105 INFO [train.py:886] (3/4) Epoch 18, batch 550, loss[loss=0.01572, audio_tagging_loss=0.01572, over 25000.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4625147.23 frames. ], batch size: 100, lr: 6.11e-03, grad_scale: 32.0
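[Editor's note] In the train.py lines, loss[...] is the current batch and tot_loss[...] is a frame-weighted running average that restarts each epoch: at Epoch 18, batch 0 the two coincide (0.02854 over 23944 frames), and the frame total then grows (1118307.82 at batch 50, 1969688.23 at batch 100) while tot_loss settles back toward ~0.014. The fractional frame totals suggest the real tracker also decays old batches; the sketch below shows only the plain weighted average:

```python
class RunningLoss:
    """Frame-weighted running average of the per-frame loss."""
    def __init__(self):
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss, batch_frames):
        # batch_loss: per-frame loss of the current batch
        self.loss_sum += batch_loss * batch_frames
        self.frames += batch_frames
        return self.loss_sum / self.frames, self.frames

tracker = RunningLoss()
print(tracker.update(0.02854, 23944))  # batch 0: tot_loss equals the batch loss
```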
], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4625147.23 frames. ], batch size: 100, lr: 6.11e-03, grad_scale: 32.0 2023-12-22 10:42:30,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=543880.0, ans=0.125 2023-12-22 10:42:32,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=543880.0, ans=0.125 2023-12-22 10:42:44,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=543946.6666666666, ans=0.05 2023-12-22 10:42:56,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=544080.0, ans=0.2 2023-12-22 10:43:01,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=544080.0, ans=0.2 2023-12-22 10:43:02,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=544080.0, ans=0.125 2023-12-22 10:43:06,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=544146.6666666666, ans=0.1 2023-12-22 10:43:07,225 INFO [train.py:886] (3/4) Epoch 18, batch 600, loss[loss=0.01395, audio_tagging_loss=0.01395, over 24750.00 frames. ], tot_loss[loss=0.01421, audio_tagging_loss=0.01421, over 4699953.26 frames. ], batch size: 99, lr: 6.11e-03, grad_scale: 32.0 2023-12-22 10:43:14,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2023-12-22 10:43:23,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=544213.3333333334, ans=0.125 2023-12-22 10:43:24,355 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.505e+01 2.858e+01 2.981e+01 3.093e+01 3.735e+01, threshold=5.962e+01, percent-clipped=0.0 2023-12-22 10:43:28,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=544280.0, ans=0.125 2023-12-22 10:43:44,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=544346.6666666666, ans=0.5 2023-12-22 10:43:56,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=544413.3333333334, ans=0.04949747468305833 2023-12-22 10:43:59,129 INFO [train.py:886] (3/4) Epoch 18, batch 650, loss[loss=0.01273, audio_tagging_loss=0.01273, over 24750.00 frames. ], tot_loss[loss=0.01425, audio_tagging_loss=0.01425, over 4749819.47 frames. 
], batch size: 99, lr: 6.11e-03, grad_scale: 32.0 2023-12-22 10:44:13,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=544546.6666666666, ans=0.5 2023-12-22 10:44:26,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=544613.3333333334, ans=0.1 2023-12-22 10:44:28,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=544613.3333333334, ans=0.0 2023-12-22 10:44:31,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=544680.0, ans=0.0 2023-12-22 10:44:33,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=544680.0, ans=0.0 2023-12-22 10:44:42,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=544746.6666666666, ans=0.125 2023-12-22 10:44:51,154 INFO [train.py:886] (3/4) Epoch 18, batch 700, loss[loss=0.01287, audio_tagging_loss=0.01287, over 25000.00 frames. ], tot_loss[loss=0.01416, audio_tagging_loss=0.01416, over 4793963.84 frames. ], batch size: 100, lr: 6.11e-03, grad_scale: 32.0 2023-12-22 10:45:03,344 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.93 vs. limit=15.0 2023-12-22 10:45:08,784 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.527e+01 2.825e+01 2.944e+01 3.061e+01 3.906e+01, threshold=5.887e+01, percent-clipped=0.0 2023-12-22 10:45:11,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=544880.0, ans=0.125 2023-12-22 10:45:23,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=545013.3333333334, ans=0.0 2023-12-22 10:45:27,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=545013.3333333334, ans=0.125 2023-12-22 10:45:44,166 INFO [train.py:886] (3/4) Epoch 18, batch 750, loss[loss=0.01228, audio_tagging_loss=0.01228, over 25000.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4823324.18 frames. ], batch size: 100, lr: 6.11e-03, grad_scale: 32.0 2023-12-22 10:45:46,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=545146.6666666666, ans=0.1 2023-12-22 10:45:58,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.20 vs. limit=15.0 2023-12-22 10:46:17,871 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 10:46:17,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=545346.6666666666, ans=0.2 2023-12-22 10:46:36,895 INFO [train.py:886] (3/4) Epoch 18, batch 800, loss[loss=0.01524, audio_tagging_loss=0.01524, over 25000.00 frames. ], tot_loss[loss=0.014, audio_tagging_loss=0.014, over 4849435.51 frames. 
], batch size: 100, lr: 6.11e-03, grad_scale: 32.0 2023-12-22 10:46:50,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=545546.6666666666, ans=0.07 2023-12-22 10:46:52,749 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.515e+01 2.820e+01 2.937e+01 3.093e+01 3.443e+01, threshold=5.874e+01, percent-clipped=0.0 2023-12-22 10:47:02,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=545613.3333333334, ans=0.2 2023-12-22 10:47:03,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=545613.3333333334, ans=0.0 2023-12-22 10:47:16,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.35 vs. limit=15.0 2023-12-22 10:47:19,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=545746.6666666666, ans=0.125 2023-12-22 10:47:25,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=545746.6666666666, ans=0.0 2023-12-22 10:47:27,805 INFO [train.py:886] (3/4) Epoch 18, batch 850, loss[loss=0.01388, audio_tagging_loss=0.01388, over 25000.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 4878108.41 frames. ], batch size: 100, lr: 6.10e-03, grad_scale: 32.0 2023-12-22 10:47:47,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=545880.0, ans=0.125 2023-12-22 10:47:56,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=545946.6666666666, ans=0.0 2023-12-22 10:48:05,490 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.73 vs. limit=15.0 2023-12-22 10:48:19,669 INFO [train.py:886] (3/4) Epoch 18, batch 900, loss[loss=0.01464, audio_tagging_loss=0.01464, over 24750.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4896855.98 frames. 
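[Editor's note] The trailing "grad_scale" field (64.0 throughout epoch 17, 32.0 from Epoch 18, batch 0 onward) is the dynamic fp16 loss scale: it is halved when a scaled step overflows and only affects numerics, not the effective learning rate. A minimal step using PyTorch's standard AMP utilities, as a generic illustration rather than the recipe's exact training loop:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=64.0)

def training_step(model, optimizer, features, targets, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model(features), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # the step is skipped if inf/nan gradients appear
    scaler.update()          # halves the scale on overflow, e.g. 64.0 -> 32.0
    return loss.detach()
```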
2023-12-22 10:48:23,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=546146.6666666666, ans=0.1
2023-12-22 10:48:31,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=546213.3333333334, ans=0.125
2023-12-22 10:48:31,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=546213.3333333334, ans=0.2
2023-12-22 10:48:35,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=546213.3333333334, ans=0.2
2023-12-22 10:48:35,718 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.627e+01 2.855e+01 2.963e+01 3.115e+01 3.581e+01, threshold=5.925e+01, percent-clipped=0.0
2023-12-22 10:48:36,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=546213.3333333334, ans=0.125
2023-12-22 10:48:52,085 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.55 vs. limit=15.0
2023-12-22 10:48:58,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=546346.6666666666, ans=0.0
2023-12-22 10:49:07,886 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.58 vs. limit=15.0
2023-12-22 10:49:10,081 INFO [train.py:886] (3/4) Epoch 18, batch 950, loss[loss=0.01398, audio_tagging_loss=0.01398, over 24750.00 frames. ], tot_loss[loss=0.01416, audio_tagging_loss=0.01416, over 4907522.43 frames. ], batch size: 99, lr: 6.10e-03, grad_scale: 32.0
2023-12-22 10:49:13,389 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.62 vs. limit=15.0
2023-12-22 10:49:18,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=546480.0, ans=10.0
2023-12-22 10:49:27,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=546546.6666666666, ans=15.0
2023-12-22 10:49:34,669 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.37 vs. limit=15.0
2023-12-22 10:49:37,372 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.13 vs. limit=10.0
2023-12-22 10:49:45,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=546680.0, ans=0.125
2023-12-22 10:50:02,898 INFO [train.py:886] (3/4) Epoch 18, batch 1000, loss[loss=0.01313, audio_tagging_loss=0.01313, over 24750.00 frames. ], tot_loss[loss=0.01409, audio_tagging_loss=0.01409, over 4909801.95 frames. ], batch size: 99, lr: 6.10e-03, grad_scale: 32.0
2023-12-22 10:50:14,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=546880.0, ans=0.125
2023-12-22 10:50:16,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=546880.0, ans=0.125
2023-12-22 10:50:17,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=546880.0, ans=0.125
2023-12-22 10:50:19,622 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.455e+01 2.848e+01 2.998e+01 3.205e+01 5.243e+01, threshold=5.996e+01, percent-clipped=0.0
2023-12-22 10:50:22,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=546946.6666666666, ans=10.0
2023-12-22 10:50:30,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=546946.6666666666, ans=0.125
2023-12-22 10:50:53,959 INFO [train.py:886] (3/4) Epoch 18, batch 1050, loss[loss=0.01678, audio_tagging_loss=0.01678, over 25000.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4918990.65 frames. ], batch size: 100, lr: 6.10e-03, grad_scale: 32.0
2023-12-22 10:51:02,745 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.37 vs. limit=12.0
2023-12-22 10:51:03,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=547213.3333333334, ans=0.0
2023-12-22 10:51:18,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=547280.0, ans=0.0
2023-12-22 10:51:23,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=547346.6666666666, ans=0.125
2023-12-22 10:51:30,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=547346.6666666666, ans=0.2
2023-12-22 10:51:37,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=547413.3333333334, ans=0.0
2023-12-22 10:51:38,286 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.50 vs. limit=15.0
2023-12-22 10:51:44,388 INFO [train.py:886] (3/4) Epoch 18, batch 1100, loss[loss=0.01192, audio_tagging_loss=0.01192, over 25000.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4927159.21 frames. ], batch size: 100, lr: 6.09e-03, grad_scale: 32.0
2023-12-22 10:51:57,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=547546.6666666666, ans=0.125
2023-12-22 10:52:02,014 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.486e+01 2.837e+01 2.977e+01 3.116e+01 4.656e+01, threshold=5.954e+01, percent-clipped=0.0
2023-12-22 10:52:19,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.56 vs. limit=15.0
2023-12-22 10:52:20,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=547680.0, ans=0.2
2023-12-22 10:52:28,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=547746.6666666666, ans=0.125
2023-12-22 10:52:36,576 INFO [train.py:886] (3/4) Epoch 18, batch 1150, loss[loss=0.01359, audio_tagging_loss=0.01359, over 25000.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4927205.68 frames. ], batch size: 100, lr: 6.09e-03, grad_scale: 32.0
2023-12-22 10:52:36,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=547813.3333333334, ans=0.125
2023-12-22 10:52:38,039 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.01 vs. limit=15.0
2023-12-22 10:52:47,525 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.95 vs. limit=22.5
2023-12-22 10:52:51,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=547880.0, ans=0.125
2023-12-22 10:53:07,902 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.55 vs. limit=15.0
2023-12-22 10:53:12,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.71 vs. limit=15.0
2023-12-22 10:53:26,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=548146.6666666666, ans=0.125
2023-12-22 10:53:27,159 INFO [train.py:886] (3/4) Epoch 18, batch 1200, loss[loss=0.01558, audio_tagging_loss=0.01558, over 24750.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4931360.03 frames. ], batch size: 99, lr: 6.09e-03, grad_scale: 32.0
2023-12-22 10:53:27,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=548146.6666666666, ans=0.125
2023-12-22 10:53:34,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=548146.6666666666, ans=0.125
2023-12-22 10:53:42,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=548213.3333333334, ans=0.04949747468305833
2023-12-22 10:53:45,239 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.582e+01 2.823e+01 2.963e+01 3.147e+01 3.546e+01, threshold=5.926e+01, percent-clipped=0.0
2023-12-22 10:54:18,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=548413.3333333334, ans=0.0
2023-12-22 10:54:20,253 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.02 vs. limit=15.0
2023-12-22 10:54:20,542 INFO [train.py:886] (3/4) Epoch 18, batch 1250, loss[loss=0.01396, audio_tagging_loss=0.01396, over 24750.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4929784.97 frames. ], batch size: 99, lr: 6.09e-03, grad_scale: 32.0
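[Editor's note] The scaling.py:1118 "WithLoss" lines report the summed auxiliary penalty currently attached to a module's attention weights (loss-sum=0.000e+00 when nothing is being penalized; a small positive value such as 2.798e-03 appears further below). One way to graft such a penalty onto an intermediate tensor without changing what the layer returns is an identity autograd function that injects the penalty's gradient during backprop. This is a sketch of the mechanism with a made-up quadratic penalty, not icefall's exact code:

```python
import torch

class WithAuxLoss(torch.autograd.Function):
    """Identity in forward; in backward, adds the gradient of a penalty on x
    to x's incoming gradient, so the penalty trains the layer silently."""
    @staticmethod
    def forward(ctx, x, scale):
        ctx.save_for_backward(x)
        ctx.scale = scale
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        with torch.enable_grad():
            xd = x.detach().requires_grad_(True)
            aux = ctx.scale * (xd ** 2).mean()   # illustrative penalty only
            (aux_grad,) = torch.autograd.grad(aux, xd)
        return grad_out + aux_grad, None

y = WithAuxLoss.apply(torch.randn(4, 4, requires_grad=True), 1.0e-2)
```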
2023-12-22 10:54:23,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=548480.0, ans=0.95
2023-12-22 10:54:30,561 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0
2023-12-22 10:54:40,955 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 10:54:48,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=548613.3333333334, ans=0.125
2023-12-22 10:54:51,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=548680.0, ans=0.1
2023-12-22 10:54:51,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=548680.0, ans=0.0
2023-12-22 10:55:02,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=548746.6666666666, ans=0.125
2023-12-22 10:55:13,227 INFO [train.py:886] (3/4) Epoch 18, batch 1300, loss[loss=0.01402, audio_tagging_loss=0.01402, over 24750.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4935723.18 frames. ], batch size: 99, lr: 6.09e-03, grad_scale: 32.0
2023-12-22 10:55:15,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=548813.3333333334, ans=0.0
2023-12-22 10:55:26,492 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 10:55:29,145 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.688e+01 2.885e+01 3.017e+01 3.228e+01 3.822e+01, threshold=6.035e+01, percent-clipped=0.0
2023-12-22 10:55:36,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=548946.6666666666, ans=0.125
2023-12-22 10:55:50,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=549013.3333333334, ans=0.1
2023-12-22 10:55:51,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=549013.3333333334, ans=0.125
2023-12-22 10:55:54,879 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.03 vs. limit=22.5
2023-12-22 10:56:03,804 INFO [train.py:886] (3/4) Epoch 18, batch 1350, loss[loss=0.01429, audio_tagging_loss=0.01429, over 24750.00 frames. ], tot_loss[loss=0.01413, audio_tagging_loss=0.01413, over 4936833.13 frames. ], batch size: 99, lr: 6.08e-03, grad_scale: 32.0
2023-12-22 10:56:24,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=549213.3333333334, ans=0.0
2023-12-22 10:56:27,811 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 10:56:31,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=549280.0, ans=0.1
2023-12-22 10:56:44,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=549346.6666666666, ans=0.125
2023-12-22 10:56:57,184 INFO [train.py:886] (3/4) Epoch 18, batch 1400, loss[loss=0.01267, audio_tagging_loss=0.01267, over 25000.00 frames. ], tot_loss[loss=0.01403, audio_tagging_loss=0.01403, over 4944996.77 frames. ], batch size: 100, lr: 6.08e-03, grad_scale: 32.0
2023-12-22 10:57:09,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.46 vs. limit=12.0
2023-12-22 10:57:12,420 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.382e+01 2.790e+01 2.898e+01 3.128e+01 3.665e+01, threshold=5.796e+01, percent-clipped=0.0
2023-12-22 10:57:40,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=549746.6666666666, ans=0.1
2023-12-22 10:57:42,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=549746.6666666666, ans=0.0
2023-12-22 10:57:48,209 INFO [train.py:886] (3/4) Epoch 18, batch 1450, loss[loss=0.01307, audio_tagging_loss=0.01307, over 24750.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 4945588.89 frames. ], batch size: 99, lr: 6.08e-03, grad_scale: 32.0
2023-12-22 10:58:31,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=550080.0, ans=0.1
2023-12-22 10:58:40,453 INFO [train.py:886] (3/4) Epoch 18, batch 1500, loss[loss=0.01283, audio_tagging_loss=0.01283, over 25000.00 frames. ], tot_loss[loss=0.01401, audio_tagging_loss=0.01401, over 4948781.47 frames. ], batch size: 100, lr: 6.08e-03, grad_scale: 32.0
2023-12-22 10:58:46,802 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0
2023-12-22 10:58:48,629 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.50 vs. limit=12.0
2023-12-22 10:58:49,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=550213.3333333334, ans=0.125
2023-12-22 10:58:53,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=550213.3333333334, ans=0.0
2023-12-22 10:58:56,298 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.501e+01 2.873e+01 2.986e+01 3.160e+01 3.608e+01, threshold=5.972e+01, percent-clipped=0.0
2023-12-22 10:59:12,520 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.89 vs. limit=15.0
limit=15.0 2023-12-22 10:59:19,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=550346.6666666666, ans=0.2 2023-12-22 10:59:20,034 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 10:59:20,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=550413.3333333334, ans=0.125 2023-12-22 10:59:31,564 INFO [train.py:886] (3/4) Epoch 18, batch 1550, loss[loss=0.01591, audio_tagging_loss=0.01591, over 24002.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4951927.23 frames. ], batch size: 100, lr: 6.08e-03, grad_scale: 32.0 2023-12-22 10:59:35,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=550480.0, ans=0.125 2023-12-22 10:59:36,421 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.798e-03 2023-12-22 10:59:36,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=550480.0, ans=0.0 2023-12-22 10:59:37,849 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.41 vs. limit=10.0 2023-12-22 10:59:41,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=550546.6666666666, ans=0.125 2023-12-22 10:59:43,143 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.23 vs. limit=22.5 2023-12-22 10:59:43,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=550546.6666666666, ans=0.125 2023-12-22 10:59:57,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=550613.3333333334, ans=0.0 2023-12-22 11:00:23,826 INFO [train.py:886] (3/4) Epoch 18, batch 1600, loss[loss=0.01067, audio_tagging_loss=0.01067, over 24750.00 frames. ], tot_loss[loss=0.01414, audio_tagging_loss=0.01414, over 4948921.86 frames. ], batch size: 99, lr: 6.08e-03, grad_scale: 32.0 2023-12-22 11:00:25,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2023-12-22 11:00:27,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. 
limit=6.0 2023-12-22 11:00:32,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=550813.3333333334, ans=0.125 2023-12-22 11:00:33,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=550880.0, ans=0.1 2023-12-22 11:00:38,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=550880.0, ans=22.5 2023-12-22 11:00:39,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=550880.0, ans=0.09899494936611666 2023-12-22 11:00:40,562 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.582e+01 2.899e+01 3.029e+01 3.152e+01 3.450e+01, threshold=6.059e+01, percent-clipped=0.0 2023-12-22 11:00:52,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=550946.6666666666, ans=0.0 2023-12-22 11:01:14,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=551080.0, ans=0.1 2023-12-22 11:01:15,673 INFO [train.py:886] (3/4) Epoch 18, batch 1650, loss[loss=0.01304, audio_tagging_loss=0.01304, over 21730.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4939360.01 frames. ], batch size: 107, lr: 6.07e-03, grad_scale: 32.0 2023-12-22 11:01:20,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=551146.6666666666, ans=0.125 2023-12-22 11:01:22,556 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=15.0 2023-12-22 11:01:25,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=551213.3333333334, ans=0.125 2023-12-22 11:01:31,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=551213.3333333334, ans=0.0 2023-12-22 11:01:40,190 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.52 vs. limit=22.5 2023-12-22 11:01:42,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=551280.0, ans=0.125 2023-12-22 11:01:52,143 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.03 vs. limit=15.0 2023-12-22 11:02:00,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0 2023-12-22 11:02:08,341 INFO [train.py:886] (3/4) Epoch 18, batch 1700, loss[loss=0.01354, audio_tagging_loss=0.01354, over 25000.00 frames. ], tot_loss[loss=0.01391, audio_tagging_loss=0.01391, over 4941946.45 frames. 
], batch size: 100, lr: 6.07e-03, grad_scale: 32.0 2023-12-22 11:02:16,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=551480.0, ans=0.2 2023-12-22 11:02:18,048 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.25 vs. limit=6.0 2023-12-22 11:02:24,925 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.540e+01 2.844e+01 2.991e+01 3.158e+01 3.709e+01, threshold=5.982e+01, percent-clipped=0.0 2023-12-22 11:02:35,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=551613.3333333334, ans=0.125 2023-12-22 11:02:58,722 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.90 vs. limit=15.0 2023-12-22 11:02:59,962 INFO [train.py:886] (3/4) Epoch 18, batch 1750, loss[loss=0.01158, audio_tagging_loss=0.01158, over 25000.00 frames. ], tot_loss[loss=0.01377, audio_tagging_loss=0.01377, over 4950958.29 frames. ], batch size: 100, lr: 6.07e-03, grad_scale: 32.0 2023-12-22 11:03:17,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.12 vs. limit=15.0 2023-12-22 11:03:20,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=551946.6666666666, ans=0.2 2023-12-22 11:03:29,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=551946.6666666666, ans=0.125 2023-12-22 11:03:52,487 INFO [train.py:886] (3/4) Epoch 18, batch 1800, loss[loss=0.01431, audio_tagging_loss=0.01431, over 25000.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4956725.00 frames. ], batch size: 100, lr: 6.07e-03, grad_scale: 32.0 2023-12-22 11:03:56,995 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.99 vs. limit=15.0 2023-12-22 11:04:03,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=552213.3333333334, ans=0.1 2023-12-22 11:04:08,386 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.424e+01 2.840e+01 2.988e+01 3.166e+01 4.187e+01, threshold=5.976e+01, percent-clipped=0.0 2023-12-22 11:04:12,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=552280.0, ans=0.0 2023-12-22 11:04:17,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=552280.0, ans=0.5 2023-12-22 11:04:24,086 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=15.0 2023-12-22 11:04:30,814 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=12.0 2023-12-22 11:04:44,164 INFO [train.py:886] (3/4) Epoch 18, batch 1850, loss[loss=0.01656, audio_tagging_loss=0.01656, over 25000.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 4953696.64 frames. 
], batch size: 100, lr: 6.07e-03, grad_scale: 32.0 2023-12-22 11:04:44,579 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.95 vs. limit=6.0 2023-12-22 11:04:49,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=552480.0, ans=0.0 2023-12-22 11:04:58,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=552546.6666666666, ans=0.125 2023-12-22 11:05:02,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=552546.6666666666, ans=0.0 2023-12-22 11:05:08,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=552613.3333333334, ans=0.0 2023-12-22 11:05:15,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=552680.0, ans=0.0 2023-12-22 11:05:17,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=552680.0, ans=0.0 2023-12-22 11:05:23,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=552680.0, ans=0.125 2023-12-22 11:05:27,379 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.58 vs. limit=15.0 2023-12-22 11:05:35,452 INFO [train.py:886] (3/4) Epoch 18, batch 1900, loss[loss=0.01591, audio_tagging_loss=0.01591, over 24750.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4950172.82 frames. ], batch size: 99, lr: 6.06e-03, grad_scale: 32.0 2023-12-22 11:05:39,721 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.07 vs. limit=15.0 2023-12-22 11:05:49,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=552880.0, ans=0.125 2023-12-22 11:05:52,654 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.383e+01 2.849e+01 3.028e+01 3.160e+01 3.565e+01, threshold=6.056e+01, percent-clipped=0.0 2023-12-22 11:06:00,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=552946.6666666666, ans=0.09899494936611666 2023-12-22 11:06:07,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.30 vs. 
limit=15.0 2023-12-22 11:06:10,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=553013.3333333334, ans=0.125 2023-12-22 11:06:14,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=553013.3333333334, ans=0.0 2023-12-22 11:06:21,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=553080.0, ans=0.0 2023-12-22 11:06:24,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=553080.0, ans=0.125 2023-12-22 11:06:27,546 INFO [train.py:886] (3/4) Epoch 18, batch 1950, loss[loss=0.01676, audio_tagging_loss=0.01676, over 24750.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4949686.19 frames. ], batch size: 99, lr: 6.06e-03, grad_scale: 32.0 2023-12-22 11:06:29,864 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.32 vs. limit=22.5 2023-12-22 11:06:35,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=553146.6666666666, ans=0.125 2023-12-22 11:06:41,236 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.75 vs. limit=12.0 2023-12-22 11:06:43,402 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.75 vs. limit=15.0 2023-12-22 11:06:44,928 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.04 vs. limit=12.0 2023-12-22 11:07:06,855 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.28 vs. limit=15.0 2023-12-22 11:07:12,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=553413.3333333334, ans=0.125 2023-12-22 11:07:18,702 INFO [train.py:886] (3/4) Epoch 18, batch 2000, loss[loss=0.01554, audio_tagging_loss=0.01554, over 25000.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4946481.15 frames. ], batch size: 100, lr: 6.06e-03, grad_scale: 64.0 2023-12-22 11:07:19,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=553480.0, ans=0.125 2023-12-22 11:07:26,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=553480.0, ans=0.0 2023-12-22 11:07:35,962 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.475e+01 2.820e+01 2.955e+01 3.133e+01 3.622e+01, threshold=5.910e+01, percent-clipped=0.0 2023-12-22 11:07:43,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=553613.3333333334, ans=0.09899494936611666 2023-12-22 11:07:58,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=553680.0, ans=0.1 2023-12-22 11:08:11,137 INFO [train.py:886] (3/4) Epoch 18, batch 2050, loss[loss=0.01335, audio_tagging_loss=0.01335, over 22661.00 frames. 
], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 4947644.08 frames. ], batch size: 107, lr: 6.06e-03, grad_scale: 64.0 2023-12-22 11:08:12,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=553813.3333333334, ans=0.125 2023-12-22 11:08:14,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=553813.3333333334, ans=0.2 2023-12-22 11:08:35,183 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0 2023-12-22 11:08:42,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=554013.3333333334, ans=0.0 2023-12-22 11:08:45,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=554013.3333333334, ans=0.025 2023-12-22 11:08:56,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=554080.0, ans=0.125 2023-12-22 11:09:02,568 INFO [train.py:886] (3/4) Epoch 18, batch 2100, loss[loss=0.01391, audio_tagging_loss=0.01391, over 25000.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4955017.23 frames. ], batch size: 100, lr: 6.06e-03, grad_scale: 64.0 2023-12-22 11:09:07,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=554146.6666666666, ans=0.125 2023-12-22 11:09:12,428 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.61 vs. limit=15.0 2023-12-22 11:09:18,380 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.520e+01 2.806e+01 2.962e+01 3.157e+01 3.752e+01, threshold=5.925e+01, percent-clipped=0.0 2023-12-22 11:09:23,872 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.80 vs. limit=8.0 2023-12-22 11:09:26,596 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.35 vs. limit=15.0 2023-12-22 11:09:48,629 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.72 vs. limit=15.0 2023-12-22 11:09:53,448 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=15.0 2023-12-22 11:09:53,740 INFO [train.py:886] (3/4) Epoch 18, batch 2150, loss[loss=0.01431, audio_tagging_loss=0.01431, over 25000.00 frames. ], tot_loss[loss=0.01385, audio_tagging_loss=0.01385, over 4957493.71 frames. ], batch size: 100, lr: 6.06e-03, grad_scale: 64.0 2023-12-22 11:10:01,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=554480.0, ans=0.125 2023-12-22 11:10:06,355 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.03 vs. 
limit=15.0 2023-12-22 11:10:10,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=554546.6666666666, ans=0.125 2023-12-22 11:10:14,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=554613.3333333334, ans=0.1 2023-12-22 11:10:43,156 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0 2023-12-22 11:10:47,194 INFO [train.py:886] (3/4) Epoch 18, batch 2200, loss[loss=0.01241, audio_tagging_loss=0.01241, over 24750.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 4958094.11 frames. ], batch size: 99, lr: 6.05e-03, grad_scale: 64.0 2023-12-22 11:10:47,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=554813.3333333334, ans=0.95 2023-12-22 11:10:50,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=554813.3333333334, ans=0.2 2023-12-22 11:11:00,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=554880.0, ans=0.125 2023-12-22 11:11:02,289 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.571e+01 2.880e+01 3.019e+01 3.160e+01 3.670e+01, threshold=6.038e+01, percent-clipped=0.0 2023-12-22 11:11:08,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=554946.6666666666, ans=0.125 2023-12-22 11:11:09,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=554946.6666666666, ans=0.0 2023-12-22 11:11:14,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=554946.6666666666, ans=0.1 2023-12-22 11:11:18,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=555013.3333333334, ans=0.0 2023-12-22 11:11:18,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=555013.3333333334, ans=0.05 2023-12-22 11:11:21,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=555013.3333333334, ans=0.0 2023-12-22 11:11:23,578 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=15.0 2023-12-22 11:11:28,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=555080.0, ans=0.2 2023-12-22 11:11:31,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=555080.0, ans=0.0 2023-12-22 11:11:38,051 INFO [train.py:886] (3/4) Epoch 18, batch 2250, loss[loss=0.01433, audio_tagging_loss=0.01433, over 24077.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4955225.51 frames. 
], batch size: 100, lr: 6.05e-03, grad_scale: 64.0 2023-12-22 11:11:52,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=555213.3333333334, ans=0.2 2023-12-22 11:12:30,208 INFO [train.py:886] (3/4) Epoch 18, batch 2300, loss[loss=0.01192, audio_tagging_loss=0.01192, over 24750.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4957936.72 frames. ], batch size: 99, lr: 6.05e-03, grad_scale: 64.0 2023-12-22 11:12:46,747 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.443e+01 2.767e+01 2.927e+01 3.038e+01 3.631e+01, threshold=5.853e+01, percent-clipped=0.0 2023-12-22 11:12:53,674 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=15.0 2023-12-22 11:13:21,707 INFO [train.py:886] (3/4) Epoch 18, batch 2350, loss[loss=0.01722, audio_tagging_loss=0.01722, over 24750.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4958986.94 frames. ], batch size: 99, lr: 6.05e-03, grad_scale: 64.0 2023-12-22 11:13:45,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=555946.6666666666, ans=0.1 2023-12-22 11:14:12,062 INFO [train.py:886] (3/4) Epoch 18, batch 2400, loss[loss=0.01168, audio_tagging_loss=0.01168, over 25000.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4963109.98 frames. ], batch size: 100, lr: 6.05e-03, grad_scale: 64.0 2023-12-22 11:14:12,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=556146.6666666666, ans=0.125 2023-12-22 11:14:19,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=556146.6666666666, ans=0.125 2023-12-22 11:14:29,875 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.546e+01 2.855e+01 2.962e+01 3.074e+01 3.618e+01, threshold=5.924e+01, percent-clipped=0.0 2023-12-22 11:14:42,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=556346.6666666666, ans=0.2 2023-12-22 11:14:45,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=556346.6666666666, ans=0.125 2023-12-22 11:14:57,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=556413.3333333334, ans=0.125 2023-12-22 11:14:58,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=556413.3333333334, ans=0.0 2023-12-22 11:15:05,121 INFO [train.py:886] (3/4) Epoch 18, batch 2450, loss[loss=0.01465, audio_tagging_loss=0.01465, over 25000.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4961162.83 frames. ], batch size: 100, lr: 6.05e-03, grad_scale: 64.0 2023-12-22 11:15:08,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.96 vs. limit=15.0 2023-12-22 11:15:11,429 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. 
limit=15.0 2023-12-22 11:15:20,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=556546.6666666666, ans=0.125 2023-12-22 11:15:25,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=556613.3333333334, ans=0.125 2023-12-22 11:15:28,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.90 vs. limit=10.0 2023-12-22 11:15:56,167 INFO [train.py:886] (3/4) Epoch 18, batch 2500, loss[loss=0.01331, audio_tagging_loss=0.01331, over 22496.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4947903.66 frames. ], batch size: 107, lr: 6.04e-03, grad_scale: 64.0 2023-12-22 11:15:57,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=556813.3333333334, ans=0.025 2023-12-22 11:16:06,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=556880.0, ans=0.1 2023-12-22 11:16:13,365 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.591e+01 2.911e+01 3.040e+01 3.167e+01 3.837e+01, threshold=6.080e+01, percent-clipped=0.0 2023-12-22 11:16:48,298 INFO [train.py:886] (3/4) Epoch 18, batch 2550, loss[loss=0.01202, audio_tagging_loss=0.01202, over 24750.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4947899.74 frames. ], batch size: 99, lr: 6.04e-03, grad_scale: 64.0 2023-12-22 11:16:52,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.41 vs. limit=22.5 2023-12-22 11:16:53,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=557146.6666666666, ans=0.2 2023-12-22 11:17:06,475 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.14 vs. limit=22.5 2023-12-22 11:17:06,507 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=5.83 vs. limit=15.0 2023-12-22 11:17:15,412 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.57 vs. limit=22.5 2023-12-22 11:17:25,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=557346.6666666666, ans=0.1 2023-12-22 11:17:29,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=557413.3333333334, ans=0.1 2023-12-22 11:17:40,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=557480.0, ans=0.125 2023-12-22 11:17:40,763 INFO [train.py:886] (3/4) Epoch 18, batch 2600, loss[loss=0.01471, audio_tagging_loss=0.01471, over 25000.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4949756.22 frames. ], batch size: 100, lr: 6.04e-03, grad_scale: 64.0 2023-12-22 11:17:48,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.28 vs. 
limit=22.5 2023-12-22 11:17:56,617 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.461e+01 2.828e+01 2.988e+01 3.124e+01 3.523e+01, threshold=5.975e+01, percent-clipped=0.0 2023-12-22 11:17:56,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=557546.6666666666, ans=0.125 2023-12-22 11:17:57,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=557546.6666666666, ans=0.1 2023-12-22 11:18:03,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=557613.3333333334, ans=0.0 2023-12-22 11:18:06,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=557613.3333333334, ans=0.125 2023-12-22 11:18:26,598 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2023-12-22 11:18:32,520 INFO [train.py:886] (3/4) Epoch 18, batch 2650, loss[loss=0.01128, audio_tagging_loss=0.01128, over 25000.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4948179.47 frames. ], batch size: 100, lr: 6.04e-03, grad_scale: 64.0 2023-12-22 11:18:32,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=557813.3333333334, ans=0.125 2023-12-22 11:18:55,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.92 vs. limit=12.0 2023-12-22 11:18:57,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=557946.6666666666, ans=0.125 2023-12-22 11:19:01,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=557946.6666666666, ans=0.0 2023-12-22 11:19:03,853 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.40 vs. limit=12.0 2023-12-22 11:19:07,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=558013.3333333334, ans=0.125 2023-12-22 11:19:18,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=558080.0, ans=0.2 2023-12-22 11:19:24,356 INFO [train.py:886] (3/4) Epoch 18, batch 2700, loss[loss=0.0132, audio_tagging_loss=0.0132, over 25000.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4950230.93 frames. ], batch size: 100, lr: 6.04e-03, grad_scale: 64.0 2023-12-22 11:19:25,454 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 11:19:31,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=558146.6666666666, ans=0.125 2023-12-22 11:19:40,798 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.546e+01 2.868e+01 2.973e+01 3.116e+01 3.614e+01, threshold=5.947e+01, percent-clipped=0.0 2023-12-22 11:19:51,354 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.80 vs. 
limit=15.0 2023-12-22 11:19:56,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=558346.6666666666, ans=0.09899494936611666 2023-12-22 11:19:57,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=558346.6666666666, ans=0.1 2023-12-22 11:20:08,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=558413.3333333334, ans=0.0 2023-12-22 11:20:10,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=558413.3333333334, ans=0.0 2023-12-22 11:20:16,850 INFO [train.py:886] (3/4) Epoch 18, batch 2750, loss[loss=0.01251, audio_tagging_loss=0.01251, over 25000.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4958010.68 frames. ], batch size: 100, lr: 6.03e-03, grad_scale: 64.0 2023-12-22 11:20:30,705 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.30 vs. limit=22.5 2023-12-22 11:20:50,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=558680.0, ans=0.2 2023-12-22 11:21:07,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=558813.3333333334, ans=0.125 2023-12-22 11:21:08,767 INFO [train.py:886] (3/4) Epoch 18, batch 2800, loss[loss=0.01317, audio_tagging_loss=0.01317, over 24750.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4961182.26 frames. ], batch size: 99, lr: 6.03e-03, grad_scale: 64.0 2023-12-22 11:21:12,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=558813.3333333334, ans=0.0 2023-12-22 11:21:26,344 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.484e+01 2.878e+01 3.022e+01 3.166e+01 3.737e+01, threshold=6.045e+01, percent-clipped=0.0 2023-12-22 11:21:47,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=559013.3333333334, ans=0.125 2023-12-22 11:21:49,745 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0 2023-12-22 11:21:51,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=559080.0, ans=0.125 2023-12-22 11:21:55,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=559080.0, ans=0.0 2023-12-22 11:22:00,828 INFO [train.py:886] (3/4) Epoch 18, batch 2850, loss[loss=0.01466, audio_tagging_loss=0.01466, over 24750.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4951720.74 frames. 
], batch size: 99, lr: 6.03e-03, grad_scale: 32.0 2023-12-22 11:22:01,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=559146.6666666666, ans=0.125 2023-12-22 11:22:01,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=559146.6666666666, ans=0.5 2023-12-22 11:22:52,672 INFO [train.py:886] (3/4) Epoch 18, batch 2900, loss[loss=0.01359, audio_tagging_loss=0.01359, over 25000.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4951117.79 frames. ], batch size: 100, lr: 6.03e-03, grad_scale: 32.0 2023-12-22 11:22:52,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=559480.0, ans=0.125 2023-12-22 11:23:09,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=559546.6666666666, ans=0.09899494936611666 2023-12-22 11:23:10,143 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.547e+01 2.901e+01 3.031e+01 3.155e+01 3.612e+01, threshold=6.062e+01, percent-clipped=0.0 2023-12-22 11:23:17,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2023-12-22 11:23:21,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=559613.3333333334, ans=0.125 2023-12-22 11:23:32,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=559680.0, ans=0.125 2023-12-22 11:23:44,248 INFO [train.py:886] (3/4) Epoch 18, batch 2950, loss[loss=0.01133, audio_tagging_loss=0.01133, over 24750.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4952803.38 frames. ], batch size: 99, lr: 6.03e-03, grad_scale: 32.0 2023-12-22 11:23:54,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=559880.0, ans=0.125 2023-12-22 11:24:11,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=559946.6666666666, ans=0.125 2023-12-22 11:24:22,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0 2023-12-22 11:24:38,498 INFO [train.py:886] (3/4) Epoch 18, batch 3000, loss[loss=0.0128, audio_tagging_loss=0.0128, over 25000.00 frames. ], tot_loss[loss=0.01371, audio_tagging_loss=0.01371, over 4951994.57 frames. ], batch size: 100, lr: 6.03e-03, grad_scale: 32.0 2023-12-22 11:24:38,499 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 11:24:54,324 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.5923, 3.6470, 3.2410, 2.9554], device='cuda:3') 2023-12-22 11:24:59,981 INFO [train.py:917] (3/4) Epoch 18, validation: loss=0.03414, audio_tagging_loss=0.03414, over 3737520.00 frames. 
2023-12-22 11:24:59,982 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 11:25:02,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=560146.6666666666, ans=0.2 2023-12-22 11:25:09,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=560213.3333333334, ans=0.0 2023-12-22 11:25:16,889 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.569e+01 2.845e+01 2.970e+01 3.099e+01 3.862e+01, threshold=5.940e+01, percent-clipped=0.0 2023-12-22 11:25:23,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=560280.0, ans=0.1 2023-12-22 11:25:25,671 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 11:25:32,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=560346.6666666666, ans=0.0 2023-12-22 11:25:47,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=560413.3333333334, ans=0.125 2023-12-22 11:25:50,273 INFO [train.py:886] (3/4) Epoch 18, batch 3050, loss[loss=0.01464, audio_tagging_loss=0.01464, over 25000.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4949090.21 frames. ], batch size: 100, lr: 6.02e-03, grad_scale: 32.0 2023-12-22 11:25:50,666 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.42 vs. limit=15.0 2023-12-22 11:25:58,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=560480.0, ans=0.1 2023-12-22 11:26:08,987 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.66 vs. limit=22.5 2023-12-22 11:26:11,897 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.73 vs. limit=22.5 2023-12-22 11:26:15,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=560613.3333333334, ans=0.125 2023-12-22 11:26:42,422 INFO [train.py:886] (3/4) Epoch 18, batch 3100, loss[loss=0.01546, audio_tagging_loss=0.01546, over 25000.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4953107.90 frames. ], batch size: 100, lr: 6.02e-03, grad_scale: 32.0 2023-12-22 11:26:48,220 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.45 vs. limit=10.0 2023-12-22 11:26:59,216 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.547e+01 2.848e+01 3.018e+01 3.166e+01 3.503e+01, threshold=6.036e+01, percent-clipped=0.0 2023-12-22 11:27:28,281 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.93 vs. limit=10.0 2023-12-22 11:27:32,484 INFO [train.py:886] (3/4) Epoch 18, batch 3150, loss[loss=0.01594, audio_tagging_loss=0.01594, over 24750.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 4949868.61 frames. 
], batch size: 99, lr: 6.02e-03, grad_scale: 32.0 2023-12-22 11:27:42,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=561146.6666666666, ans=0.125 2023-12-22 11:27:46,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=561213.3333333334, ans=15.0 2023-12-22 11:27:56,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=561280.0, ans=0.125 2023-12-22 11:28:09,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=561346.6666666666, ans=0.125 2023-12-22 11:28:12,298 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=22.5 2023-12-22 11:28:17,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.56 vs. limit=22.5 2023-12-22 11:28:18,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=561413.3333333334, ans=0.0 2023-12-22 11:28:25,740 INFO [train.py:886] (3/4) Epoch 18, batch 3200, loss[loss=0.01253, audio_tagging_loss=0.01253, over 25000.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4939350.11 frames. ], batch size: 100, lr: 6.02e-03, grad_scale: 32.0 2023-12-22 11:28:42,518 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.381e+01 2.819e+01 2.957e+01 3.105e+01 3.510e+01, threshold=5.913e+01, percent-clipped=0.0 2023-12-22 11:28:45,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=561613.3333333334, ans=0.1 2023-12-22 11:28:50,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=561613.3333333334, ans=0.05 2023-12-22 11:28:50,816 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.32 vs. limit=15.0 2023-12-22 11:28:56,219 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.19 vs. limit=10.0 2023-12-22 11:29:02,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=561680.0, ans=0.125 2023-12-22 11:29:06,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=561746.6666666666, ans=0.0 2023-12-22 11:29:07,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=561746.6666666666, ans=0.125 2023-12-22 11:29:16,275 INFO [train.py:886] (3/4) Epoch 18, batch 3250, loss[loss=0.01512, audio_tagging_loss=0.01512, over 24750.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 4933521.26 frames. 
], batch size: 99, lr: 6.02e-03, grad_scale: 32.0 2023-12-22 11:29:42,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=561946.6666666666, ans=0.125 2023-12-22 11:29:57,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=562080.0, ans=0.0 2023-12-22 11:30:03,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.15 vs. limit=6.0 2023-12-22 11:30:04,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=562080.0, ans=0.125 2023-12-22 11:30:07,130 INFO [train.py:886] (3/4) Epoch 18, batch 3300, loss[loss=0.01384, audio_tagging_loss=0.01384, over 25000.00 frames. ], tot_loss[loss=0.0138, audio_tagging_loss=0.0138, over 4942906.92 frames. ], batch size: 100, lr: 6.01e-03, grad_scale: 32.0 2023-12-22 11:30:09,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=562146.6666666666, ans=0.125 2023-12-22 11:30:14,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=562146.6666666666, ans=0.125 2023-12-22 11:30:24,548 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.551e+01 2.841e+01 2.981e+01 3.120e+01 3.559e+01, threshold=5.962e+01, percent-clipped=0.0 2023-12-22 11:30:36,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=562346.6666666666, ans=0.125 2023-12-22 11:30:37,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=562346.6666666666, ans=0.07 2023-12-22 11:30:40,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=562346.6666666666, ans=0.125 2023-12-22 11:30:58,510 INFO [train.py:886] (3/4) Epoch 18, batch 3350, loss[loss=0.0162, audio_tagging_loss=0.0162, over 25000.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4945542.31 frames. ], batch size: 100, lr: 6.01e-03, grad_scale: 32.0 2023-12-22 11:30:58,914 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.12 vs. limit=15.0 2023-12-22 11:31:08,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=562546.6666666666, ans=0.125 2023-12-22 11:31:10,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=562546.6666666666, ans=0.1 2023-12-22 11:31:47,318 INFO [train.py:886] (3/4) Epoch 18, batch 3400, loss[loss=0.01259, audio_tagging_loss=0.01259, over 25000.00 frames. ], tot_loss[loss=0.01391, audio_tagging_loss=0.01391, over 4953054.32 frames. 
], batch size: 100, lr: 6.01e-03, grad_scale: 32.0 2023-12-22 11:31:53,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=562813.3333333334, ans=0.125 2023-12-22 11:31:53,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=562813.3333333334, ans=0.1 2023-12-22 11:32:04,796 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.564e+01 2.896e+01 3.047e+01 3.185e+01 3.707e+01, threshold=6.093e+01, percent-clipped=0.0 2023-12-22 11:32:08,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=562946.6666666666, ans=0.0 2023-12-22 11:32:17,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=563013.3333333334, ans=0.07 2023-12-22 11:32:28,480 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.63 vs. limit=6.0 2023-12-22 11:32:34,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=563080.0, ans=0.125 2023-12-22 11:32:38,988 INFO [train.py:886] (3/4) Epoch 18, batch 3450, loss[loss=0.01269, audio_tagging_loss=0.01269, over 24750.00 frames. ], tot_loss[loss=0.01397, audio_tagging_loss=0.01397, over 4954721.98 frames. ], batch size: 99, lr: 6.01e-03, grad_scale: 32.0 2023-12-22 11:32:50,732 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.96 vs. limit=6.0 2023-12-22 11:32:54,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=563213.3333333334, ans=0.125 2023-12-22 11:32:55,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=563213.3333333334, ans=0.125 2023-12-22 11:33:19,400 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.57 vs. limit=22.5 2023-12-22 11:33:29,157 INFO [train.py:886] (3/4) Epoch 18, batch 3500, loss[loss=0.01408, audio_tagging_loss=0.01408, over 24750.00 frames. ], tot_loss[loss=0.01401, audio_tagging_loss=0.01401, over 4946031.03 frames. ], batch size: 99, lr: 6.01e-03, grad_scale: 32.0 2023-12-22 11:33:34,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=563480.0, ans=0.025 2023-12-22 11:33:43,189 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.97 vs. 
limit=10.0 2023-12-22 11:33:45,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=563546.6666666666, ans=0.125 2023-12-22 11:33:46,476 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.620e+01 2.893e+01 3.039e+01 3.166e+01 3.781e+01, threshold=6.078e+01, percent-clipped=0.0 2023-12-22 11:33:59,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=563680.0, ans=0.2 2023-12-22 11:34:04,672 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.70 vs. limit=10.0 2023-12-22 11:34:20,059 INFO [train.py:886] (3/4) Epoch 18, batch 3550, loss[loss=0.01401, audio_tagging_loss=0.01401, over 25000.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4946863.54 frames. ], batch size: 100, lr: 6.01e-03, grad_scale: 32.0 2023-12-22 11:34:21,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=563813.3333333334, ans=0.0 2023-12-22 11:34:32,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=563880.0, ans=0.2 2023-12-22 11:34:55,349 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.38 vs. limit=15.0 2023-12-22 11:35:02,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=564080.0, ans=0.125 2023-12-22 11:35:11,882 INFO [train.py:886] (3/4) Epoch 18, batch 3600, loss[loss=0.01629, audio_tagging_loss=0.01629, over 24750.00 frames. ], tot_loss[loss=0.0138, audio_tagging_loss=0.0138, over 4951187.06 frames. ], batch size: 99, lr: 6.00e-03, grad_scale: 32.0 2023-12-22 11:35:12,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=564146.6666666666, ans=0.125 2023-12-22 11:35:14,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=564146.6666666666, ans=0.125 2023-12-22 11:35:18,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=564146.6666666666, ans=0.125 2023-12-22 11:35:24,639 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0 2023-12-22 11:35:28,823 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+01 2.848e+01 2.990e+01 3.158e+01 3.756e+01, threshold=5.981e+01, percent-clipped=0.0 2023-12-22 11:36:03,814 INFO [train.py:886] (3/4) Epoch 18, batch 3650, loss[loss=0.01358, audio_tagging_loss=0.01358, over 25000.00 frames. ], tot_loss[loss=0.01371, audio_tagging_loss=0.01371, over 4956726.13 frames. ], batch size: 100, lr: 6.00e-03, grad_scale: 32.0 2023-12-22 11:36:05,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=564480.0, ans=0.07 2023-12-22 11:36:20,054 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.47 vs. 
limit=15.0 2023-12-22 11:36:25,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=564613.3333333334, ans=0.125 2023-12-22 11:36:38,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=564680.0, ans=0.125 2023-12-22 11:36:40,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=564680.0, ans=0.1 2023-12-22 11:36:43,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=564680.0, ans=0.0 2023-12-22 11:36:43,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=564680.0, ans=0.5 2023-12-22 11:36:56,207 INFO [train.py:886] (3/4) Epoch 18, batch 3700, loss[loss=0.0143, audio_tagging_loss=0.0143, over 25000.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4954435.24 frames. ], batch size: 100, lr: 6.00e-03, grad_scale: 32.0 2023-12-22 11:36:59,277 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 11:37:05,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=564880.0, ans=0.125 2023-12-22 11:37:13,790 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.550e+01 2.838e+01 2.944e+01 3.115e+01 3.574e+01, threshold=5.888e+01, percent-clipped=0.0 2023-12-22 11:37:31,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=565013.3333333334, ans=0.125 2023-12-22 11:37:32,927 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.76 vs. limit=6.0 2023-12-22 11:37:34,946 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.09 vs. limit=15.0 2023-12-22 11:37:37,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=565080.0, ans=0.2 2023-12-22 11:37:46,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=565080.0, ans=0.0 2023-12-22 11:37:48,052 INFO [train.py:886] (3/4) Epoch 18, batch 3750, loss[loss=0.01075, audio_tagging_loss=0.01075, over 23974.00 frames. ], tot_loss[loss=0.0138, audio_tagging_loss=0.0138, over 4952787.17 frames. ], batch size: 100, lr: 6.00e-03, grad_scale: 32.0 2023-12-22 11:38:03,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=565213.3333333334, ans=0.125 2023-12-22 11:38:06,865 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 11:38:07,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.02 vs. limit=12.0 2023-12-22 11:38:11,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.36 vs. 
limit=15.0 2023-12-22 11:38:13,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=565280.0, ans=0.1 2023-12-22 11:38:24,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=565346.6666666666, ans=0.125 2023-12-22 11:38:27,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=565346.6666666666, ans=0.0 2023-12-22 11:38:39,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=565480.0, ans=0.0 2023-12-22 11:38:39,876 INFO [train.py:886] (3/4) Epoch 18, batch 3800, loss[loss=0.0117, audio_tagging_loss=0.0117, over 24750.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4943441.54 frames. ], batch size: 99, lr: 6.00e-03, grad_scale: 32.0 2023-12-22 11:38:44,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=565480.0, ans=0.0 2023-12-22 11:38:49,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=565480.0, ans=0.125 2023-12-22 11:38:51,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=565546.6666666666, ans=0.1 2023-12-22 11:38:57,912 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.529e+01 2.848e+01 2.976e+01 3.119e+01 3.728e+01, threshold=5.951e+01, percent-clipped=0.0 2023-12-22 11:39:03,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=565613.3333333334, ans=0.125 2023-12-22 11:39:03,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=565613.3333333334, ans=0.1 2023-12-22 11:39:07,502 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0 2023-12-22 11:39:23,075 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.93 vs. limit=10.0 2023-12-22 11:39:25,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=565746.6666666666, ans=0.125 2023-12-22 11:39:31,785 INFO [train.py:886] (3/4) Epoch 18, batch 3850, loss[loss=0.01428, audio_tagging_loss=0.01428, over 25000.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4944480.40 frames. ], batch size: 100, lr: 6.00e-03, grad_scale: 32.0 2023-12-22 11:39:32,535 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.31 vs. 
limit=15.0 2023-12-22 11:39:45,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=565880.0, ans=0.2 2023-12-22 11:39:55,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=565946.6666666666, ans=0.125 2023-12-22 11:39:58,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=565946.6666666666, ans=0.125 2023-12-22 11:39:59,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=565946.6666666666, ans=0.125 2023-12-22 11:40:03,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=566013.3333333334, ans=0.025 2023-12-22 11:40:07,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0 2023-12-22 11:40:10,938 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.83 vs. limit=10.0 2023-12-22 11:40:12,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=566013.3333333334, ans=0.0 2023-12-22 11:40:13,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.14 vs. limit=22.5 2023-12-22 11:40:23,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=566146.6666666666, ans=0.125 2023-12-22 11:40:24,659 INFO [train.py:886] (3/4) Epoch 18, batch 3900, loss[loss=0.01366, audio_tagging_loss=0.01366, over 22836.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 4946549.90 frames. ], batch size: 107, lr: 5.99e-03, grad_scale: 32.0 2023-12-22 11:40:33,788 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.84 vs. limit=15.0 2023-12-22 11:40:37,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=566213.3333333334, ans=0.0 2023-12-22 11:40:41,628 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.561e+01 2.851e+01 2.964e+01 3.170e+01 3.515e+01, threshold=5.928e+01, percent-clipped=0.0 2023-12-22 11:40:51,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=566280.0, ans=0.0 2023-12-22 11:40:58,506 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.14 vs. limit=15.0 2023-12-22 11:41:15,967 INFO [train.py:886] (3/4) Epoch 18, batch 3950, loss[loss=0.01175, audio_tagging_loss=0.01175, over 25000.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4953497.33 frames. 
], batch size: 100, lr: 5.99e-03, grad_scale: 32.0 2023-12-22 11:41:16,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=566480.0, ans=0.1 2023-12-22 11:41:26,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=566546.6666666666, ans=0.04949747468305833 2023-12-22 11:41:47,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=566680.0, ans=0.125 2023-12-22 11:41:50,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=566680.0, ans=0.0 2023-12-22 11:42:09,133 INFO [train.py:886] (3/4) Epoch 18, batch 4000, loss[loss=0.0126, audio_tagging_loss=0.0126, over 25000.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4954349.31 frames. ], batch size: 100, lr: 5.99e-03, grad_scale: 32.0 2023-12-22 11:42:25,330 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+01 2.905e+01 3.054e+01 3.173e+01 3.628e+01, threshold=6.109e+01, percent-clipped=0.0 2023-12-22 11:42:25,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=566880.0, ans=0.0 2023-12-22 11:42:28,290 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.47 vs. limit=12.0 2023-12-22 11:42:35,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=566946.6666666666, ans=0.1 2023-12-22 11:42:39,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=567013.3333333334, ans=0.125 2023-12-22 11:42:51,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=567080.0, ans=0.2 2023-12-22 11:42:56,545 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=7.854e-03 2023-12-22 11:42:59,846 INFO [train.py:886] (3/4) Epoch 18, batch 4050, loss[loss=0.01381, audio_tagging_loss=0.01381, over 24750.00 frames. ], tot_loss[loss=0.01385, audio_tagging_loss=0.01385, over 4956539.66 frames. ], batch size: 99, lr: 5.99e-03, grad_scale: 32.0 2023-12-22 11:43:08,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.84 vs. limit=6.0 2023-12-22 11:43:14,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=567213.3333333334, ans=0.0 2023-12-22 11:43:47,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=567413.3333333334, ans=0.2 2023-12-22 11:43:52,213 INFO [train.py:886] (3/4) Epoch 18, batch 4100, loss[loss=0.01454, audio_tagging_loss=0.01454, over 24750.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4951215.64 frames. 
], batch size: 99, lr: 5.99e-03, grad_scale: 32.0 2023-12-22 11:44:09,655 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.648e+01 2.912e+01 3.022e+01 3.164e+01 3.761e+01, threshold=6.044e+01, percent-clipped=0.0 2023-12-22 11:44:38,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=567746.6666666666, ans=0.5 2023-12-22 11:44:43,776 INFO [train.py:886] (3/4) Epoch 18, batch 4150, loss[loss=0.0149, audio_tagging_loss=0.0149, over 25000.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4950325.84 frames. ], batch size: 100, lr: 5.98e-03, grad_scale: 32.0 2023-12-22 11:44:43,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=567813.3333333334, ans=0.125 2023-12-22 11:44:49,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=567813.3333333334, ans=0.125 2023-12-22 11:44:54,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=567880.0, ans=0.125 2023-12-22 11:45:02,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=567946.6666666666, ans=0.0 2023-12-22 11:45:05,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=567946.6666666666, ans=0.125 2023-12-22 11:45:09,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=567946.6666666666, ans=0.0 2023-12-22 11:45:16,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=568013.3333333334, ans=0.05 2023-12-22 11:45:21,975 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.94 vs. limit=15.0 2023-12-22 11:45:30,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=568080.0, ans=0.125 2023-12-22 11:45:33,869 INFO [train.py:886] (3/4) Epoch 18, batch 4200, loss[loss=0.01422, audio_tagging_loss=0.01422, over 24750.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4946609.09 frames. ], batch size: 99, lr: 5.98e-03, grad_scale: 32.0 2023-12-22 11:45:52,748 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 2.822e+01 2.922e+01 3.153e+01 3.772e+01, threshold=5.845e+01, percent-clipped=0.0 2023-12-22 11:45:55,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=568280.0, ans=0.125 2023-12-22 11:45:58,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.58 vs. limit=22.5 2023-12-22 11:46:05,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=568346.6666666666, ans=0.2 2023-12-22 11:46:27,019 INFO [train.py:886] (3/4) Epoch 18, batch 4250, loss[loss=0.01282, audio_tagging_loss=0.01282, over 25000.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4947756.89 frames. 
], batch size: 100, lr: 5.98e-03, grad_scale: 32.0 2023-12-22 11:46:31,262 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.87 vs. limit=22.5 2023-12-22 11:46:39,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.41 vs. limit=22.5 2023-12-22 11:46:43,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=568546.6666666666, ans=0.1 2023-12-22 11:46:48,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=568613.3333333334, ans=0.0 2023-12-22 11:46:52,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=568613.3333333334, ans=0.125 2023-12-22 11:47:00,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=568680.0, ans=0.1 2023-12-22 11:47:00,594 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.92 vs. limit=15.0 2023-12-22 11:47:15,743 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.89 vs. limit=15.0 2023-12-22 11:47:17,859 INFO [train.py:886] (3/4) Epoch 18, batch 4300, loss[loss=0.01238, audio_tagging_loss=0.01238, over 25000.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4950910.50 frames. ], batch size: 100, lr: 5.98e-03, grad_scale: 32.0 2023-12-22 11:47:18,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=568813.3333333334, ans=0.05 2023-12-22 11:47:33,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=568880.0, ans=0.0 2023-12-22 11:47:36,151 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.510e+01 2.847e+01 2.955e+01 3.077e+01 3.608e+01, threshold=5.910e+01, percent-clipped=0.0 2023-12-22 11:48:06,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=569080.0, ans=0.0 2023-12-22 11:48:10,192 INFO [train.py:886] (3/4) Epoch 18, batch 4350, loss[loss=0.01532, audio_tagging_loss=0.01532, over 25000.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4951905.42 frames. 
], batch size: 100, lr: 5.98e-03, grad_scale: 32.0 2023-12-22 11:48:10,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=569146.6666666666, ans=0.125 2023-12-22 11:48:11,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=569146.6666666666, ans=0.035 2023-12-22 11:48:41,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=569346.6666666666, ans=0.0 2023-12-22 11:48:47,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=569346.6666666666, ans=0.0 2023-12-22 11:48:52,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=569413.3333333334, ans=0.125 2023-12-22 11:49:01,972 INFO [train.py:886] (3/4) Epoch 18, batch 4400, loss[loss=0.0169, audio_tagging_loss=0.0169, over 24750.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 4943399.37 frames. ], batch size: 99, lr: 5.98e-03, grad_scale: 32.0 2023-12-22 11:49:09,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=569480.0, ans=0.0 2023-12-22 11:49:19,484 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.540e+01 2.944e+01 3.084e+01 3.234e+01 4.247e+01, threshold=6.167e+01, percent-clipped=0.0 2023-12-22 11:49:19,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=569546.6666666666, ans=0.125 2023-12-22 11:49:27,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=569613.3333333334, ans=0.5 2023-12-22 11:49:53,661 INFO [train.py:886] (3/4) Epoch 18, batch 4450, loss[loss=0.01286, audio_tagging_loss=0.01286, over 24750.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4945233.48 frames. ], batch size: 99, lr: 5.97e-03, grad_scale: 32.0 2023-12-22 11:50:01,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=569813.3333333334, ans=0.0 2023-12-22 11:50:42,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.12 vs. limit=10.0 2023-12-22 11:50:45,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=570146.6666666666, ans=0.125 2023-12-22 11:50:45,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=570146.6666666666, ans=0.1 2023-12-22 11:50:45,779 INFO [train.py:886] (3/4) Epoch 18, batch 4500, loss[loss=0.01498, audio_tagging_loss=0.01498, over 24750.00 frames. ], tot_loss[loss=0.01383, audio_tagging_loss=0.01383, over 4946122.69 frames. 
], batch size: 99, lr: 5.97e-03, grad_scale: 32.0 2023-12-22 11:50:52,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=570146.6666666666, ans=0.1 2023-12-22 11:51:03,229 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.526e+01 2.863e+01 2.984e+01 3.180e+01 3.715e+01, threshold=5.969e+01, percent-clipped=0.0 2023-12-22 11:51:07,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=570280.0, ans=0.125 2023-12-22 11:51:21,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=570346.6666666666, ans=0.0 2023-12-22 11:51:35,732 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 11:51:37,373 INFO [train.py:886] (3/4) Epoch 18, batch 4550, loss[loss=0.01485, audio_tagging_loss=0.01485, over 25000.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4948012.61 frames. ], batch size: 100, lr: 5.97e-03, grad_scale: 32.0 2023-12-22 11:51:57,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=570613.3333333334, ans=0.2 2023-12-22 11:52:13,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=570680.0, ans=0.125 2023-12-22 11:52:18,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=570746.6666666666, ans=0.1 2023-12-22 11:52:29,117 INFO [train.py:886] (3/4) Epoch 18, batch 4600, loss[loss=0.01397, audio_tagging_loss=0.01397, over 24750.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 4955818.69 frames. ], batch size: 99, lr: 5.97e-03, grad_scale: 32.0 2023-12-22 11:52:29,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=570813.3333333334, ans=0.125 2023-12-22 11:52:35,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=570813.3333333334, ans=0.2 2023-12-22 11:52:37,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=570813.3333333334, ans=0.0 2023-12-22 11:52:46,539 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.514e+01 2.915e+01 3.083e+01 3.222e+01 3.663e+01, threshold=6.165e+01, percent-clipped=0.0 2023-12-22 11:52:52,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=570946.6666666666, ans=0.125 2023-12-22 11:53:06,239 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0 2023-12-22 11:53:06,447 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=22.5 2023-12-22 11:53:16,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.63 vs. limit=22.5 2023-12-22 11:53:20,101 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.13 vs. 
limit=10.0 2023-12-22 11:53:20,554 INFO [train.py:886] (3/4) Epoch 18, batch 4650, loss[loss=0.01223, audio_tagging_loss=0.01223, over 25000.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4959721.87 frames. ], batch size: 100, lr: 5.97e-03, grad_scale: 32.0 2023-12-22 11:53:37,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=571213.3333333334, ans=0.125 2023-12-22 11:53:53,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=571346.6666666666, ans=0.0 2023-12-22 11:54:02,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=571413.3333333334, ans=0.125 2023-12-22 11:54:06,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=571413.3333333334, ans=0.1 2023-12-22 11:54:10,987 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.84 vs. limit=22.5 2023-12-22 11:54:11,472 INFO [train.py:886] (3/4) Epoch 18, batch 4700, loss[loss=0.01258, audio_tagging_loss=0.01258, over 24750.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4950873.26 frames. ], batch size: 99, lr: 5.97e-03, grad_scale: 32.0 2023-12-22 11:54:26,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.92 vs. limit=15.0 2023-12-22 11:54:26,975 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.542e+01 2.832e+01 2.995e+01 3.118e+01 3.636e+01, threshold=5.991e+01, percent-clipped=0.0 2023-12-22 11:54:34,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=571613.3333333334, ans=0.125 2023-12-22 11:54:52,551 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.92 vs. limit=6.0 2023-12-22 11:54:56,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=571746.6666666666, ans=0.0 2023-12-22 11:54:58,481 INFO [train.py:886] (3/4) Epoch 18, batch 4750, loss[loss=0.01327, audio_tagging_loss=0.01327, over 24750.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4947400.12 frames. ], batch size: 99, lr: 5.96e-03, grad_scale: 32.0 2023-12-22 11:55:00,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=571813.3333333334, ans=0.0 2023-12-22 11:55:08,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=571880.0, ans=0.125 2023-12-22 11:55:34,686 INFO [train.py:886] (3/4) Epoch 19, batch 0, loss[loss=0.03278, audio_tagging_loss=0.03278, over 24098.00 frames. ], tot_loss[loss=0.03278, audio_tagging_loss=0.03278, over 24098.00 frames. ], batch size: 100, lr: 5.80e-03, grad_scale: 32.0 2023-12-22 11:55:34,686 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 11:55:56,077 INFO [train.py:917] (3/4) Epoch 19, validation: loss=0.03209, audio_tagging_loss=0.03209, over 3737520.00 frames. 
2023-12-22 11:55:56,078 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 11:55:56,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=571920.0, ans=0.2 2023-12-22 11:55:59,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=571920.0, ans=0.125 2023-12-22 11:56:11,669 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.70 vs. limit=22.5 2023-12-22 11:56:31,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=572120.0, ans=0.125 2023-12-22 11:56:42,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=572186.6666666666, ans=0.125 2023-12-22 11:56:46,850 INFO [train.py:886] (3/4) Epoch 19, batch 50, loss[loss=0.02092, audio_tagging_loss=0.02092, over 25000.00 frames. ], tot_loss[loss=0.02133, audio_tagging_loss=0.02133, over 1117748.93 frames. ], batch size: 100, lr: 5.80e-03, grad_scale: 64.0 2023-12-22 11:56:47,742 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.638e+01 3.056e+01 3.497e+01 4.227e+01 9.985e+01, threshold=6.993e+01, percent-clipped=7.0 2023-12-22 11:56:52,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=572253.3333333334, ans=0.125 2023-12-22 11:56:54,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=572253.3333333334, ans=0.125 2023-12-22 11:56:54,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=572253.3333333334, ans=0.2 2023-12-22 11:57:05,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=572320.0, ans=0.0 2023-12-22 11:57:15,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=572386.6666666666, ans=0.125 2023-12-22 11:57:15,660 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 11:57:16,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=572453.3333333334, ans=0.0 2023-12-22 11:57:19,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=572453.3333333334, ans=0.05 2023-12-22 11:57:38,770 INFO [train.py:886] (3/4) Epoch 19, batch 100, loss[loss=0.01336, audio_tagging_loss=0.01336, over 25000.00 frames. ], tot_loss[loss=0.01859, audio_tagging_loss=0.01859, over 1974694.17 frames. ], batch size: 100, lr: 5.80e-03, grad_scale: 64.0 2023-12-22 11:57:39,232 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.64 vs. 
limit=22.5 2023-12-22 11:57:40,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=572586.6666666666, ans=0.05 2023-12-22 11:57:57,773 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.69 vs. limit=22.5 2023-12-22 11:58:00,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=572720.0, ans=0.125 2023-12-22 11:58:12,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=572786.6666666666, ans=0.125 2023-12-22 11:58:14,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=572786.6666666666, ans=0.09899494936611666 2023-12-22 11:58:18,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=572853.3333333334, ans=0.0 2023-12-22 11:58:30,338 INFO [train.py:886] (3/4) Epoch 19, batch 150, loss[loss=0.0154, audio_tagging_loss=0.0154, over 25000.00 frames. ], tot_loss[loss=0.01711, audio_tagging_loss=0.01711, over 2637072.30 frames. ], batch size: 100, lr: 5.80e-03, grad_scale: 64.0 2023-12-22 11:58:31,284 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.740e+01 3.036e+01 3.224e+01 3.385e+01 4.253e+01, threshold=6.449e+01, percent-clipped=0.0 2023-12-22 11:58:40,295 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.28 vs. limit=15.0 2023-12-22 11:58:42,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=572986.6666666666, ans=0.2 2023-12-22 11:59:01,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=573120.0, ans=0.1 2023-12-22 11:59:02,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.81 vs. limit=15.0 2023-12-22 11:59:06,856 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.47 vs. limit=15.0 2023-12-22 11:59:09,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=15.0 2023-12-22 11:59:16,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=573186.6666666666, ans=0.0 2023-12-22 11:59:17,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=573186.6666666666, ans=0.125 2023-12-22 11:59:21,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=573253.3333333334, ans=0.125 2023-12-22 11:59:21,832 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.33 vs. limit=12.0 2023-12-22 11:59:22,112 INFO [train.py:886] (3/4) Epoch 19, batch 200, loss[loss=0.01509, audio_tagging_loss=0.01509, over 25000.00 frames. ], tot_loss[loss=0.01605, audio_tagging_loss=0.01605, over 3151831.86 frames. 
], batch size: 100, lr: 5.80e-03, grad_scale: 64.0 2023-12-22 11:59:27,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=573253.3333333334, ans=0.0 2023-12-22 11:59:32,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=573320.0, ans=0.125 2023-12-22 11:59:32,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=573320.0, ans=0.04949747468305833 2023-12-22 11:59:48,384 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.33 vs. limit=22.5 2023-12-22 11:59:49,212 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.31 vs. limit=15.0 2023-12-22 11:59:58,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=573453.3333333334, ans=10.0 2023-12-22 12:00:08,169 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.627e-02 2023-12-22 12:00:13,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=573586.6666666666, ans=0.125 2023-12-22 12:00:14,374 INFO [train.py:886] (3/4) Epoch 19, batch 250, loss[loss=0.01599, audio_tagging_loss=0.01599, over 25000.00 frames. ], tot_loss[loss=0.01544, audio_tagging_loss=0.01544, over 3554192.88 frames. ], batch size: 100, lr: 5.79e-03, grad_scale: 64.0 2023-12-22 12:00:15,316 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.661e+01 2.866e+01 3.015e+01 3.184e+01 3.683e+01, threshold=6.030e+01, percent-clipped=0.0 2023-12-22 12:00:22,452 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.55 vs. limit=15.0 2023-12-22 12:00:52,693 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.71 vs. limit=6.0 2023-12-22 12:01:00,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=573853.3333333334, ans=0.0 2023-12-22 12:01:01,337 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2023-12-22 12:01:04,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=573920.0, ans=0.125 2023-12-22 12:01:06,091 INFO [train.py:886] (3/4) Epoch 19, batch 300, loss[loss=0.01322, audio_tagging_loss=0.01322, over 24750.00 frames. ], tot_loss[loss=0.01515, audio_tagging_loss=0.01515, over 3864974.27 frames. ], batch size: 99, lr: 5.79e-03, grad_scale: 64.0 2023-12-22 12:01:14,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=573920.0, ans=0.1 2023-12-22 12:01:27,183 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.43 vs. 
limit=15.0 2023-12-22 12:01:38,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=574120.0, ans=0.125 2023-12-22 12:01:51,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=574186.6666666666, ans=0.125 2023-12-22 12:01:57,898 INFO [train.py:886] (3/4) Epoch 19, batch 350, loss[loss=0.01497, audio_tagging_loss=0.01497, over 24750.00 frames. ], tot_loss[loss=0.015, audio_tagging_loss=0.015, over 4104521.28 frames. ], batch size: 99, lr: 5.79e-03, grad_scale: 64.0 2023-12-22 12:01:58,802 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.631e+01 2.896e+01 3.022e+01 3.147e+01 3.983e+01, threshold=6.044e+01, percent-clipped=0.0 2023-12-22 12:02:04,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=574253.3333333334, ans=0.0 2023-12-22 12:02:26,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=574386.6666666666, ans=0.1 2023-12-22 12:02:27,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=574386.6666666666, ans=0.0 2023-12-22 12:02:43,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=574520.0, ans=0.125 2023-12-22 12:02:50,300 INFO [train.py:886] (3/4) Epoch 19, batch 400, loss[loss=0.01516, audio_tagging_loss=0.01516, over 24750.00 frames. ], tot_loss[loss=0.01474, audio_tagging_loss=0.01474, over 4292931.11 frames. ], batch size: 99, lr: 5.79e-03, grad_scale: 64.0 2023-12-22 12:02:54,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=574586.6666666666, ans=0.0 2023-12-22 12:03:00,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=574653.3333333334, ans=0.1 2023-12-22 12:03:04,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.21 vs. limit=10.0 2023-12-22 12:03:23,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=574786.6666666666, ans=0.1 2023-12-22 12:03:25,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=574786.6666666666, ans=0.125 2023-12-22 12:03:32,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.92 vs. limit=15.0 2023-12-22 12:03:39,157 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.66 vs. limit=6.0 2023-12-22 12:03:42,116 INFO [train.py:886] (3/4) Epoch 19, batch 450, loss[loss=0.01258, audio_tagging_loss=0.01258, over 22422.00 frames. ], tot_loss[loss=0.01447, audio_tagging_loss=0.01447, over 4439276.83 frames. 
], batch size: 107, lr: 5.79e-03, grad_scale: 64.0 2023-12-22 12:03:43,022 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.542e+01 2.821e+01 2.989e+01 3.118e+01 4.084e+01, threshold=5.979e+01, percent-clipped=0.0 2023-12-22 12:03:50,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=574920.0, ans=0.0 2023-12-22 12:04:04,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=575053.3333333334, ans=0.125 2023-12-22 12:04:12,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=575120.0, ans=15.0 2023-12-22 12:04:12,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=575120.0, ans=0.0 2023-12-22 12:04:12,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=575120.0, ans=0.125 2023-12-22 12:04:20,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=575120.0, ans=0.0 2023-12-22 12:04:33,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=575253.3333333334, ans=0.09899494936611666 2023-12-22 12:04:34,407 INFO [train.py:886] (3/4) Epoch 19, batch 500, loss[loss=0.01521, audio_tagging_loss=0.01521, over 25000.00 frames. ], tot_loss[loss=0.01424, audio_tagging_loss=0.01424, over 4553576.98 frames. ], batch size: 100, lr: 5.79e-03, grad_scale: 64.0 2023-12-22 12:04:40,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=575253.3333333334, ans=0.1 2023-12-22 12:04:49,803 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.40 vs. limit=15.0 2023-12-22 12:05:10,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=575453.3333333334, ans=0.125 2023-12-22 12:05:20,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=575520.0, ans=0.1 2023-12-22 12:05:25,809 INFO [train.py:886] (3/4) Epoch 19, batch 550, loss[loss=0.01607, audio_tagging_loss=0.01607, over 25000.00 frames. ], tot_loss[loss=0.01423, audio_tagging_loss=0.01423, over 4643040.09 frames. 
], batch size: 100, lr: 5.78e-03, grad_scale: 64.0 2023-12-22 12:05:26,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=575586.6666666666, ans=0.1 2023-12-22 12:05:27,457 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.644e+01 2.851e+01 2.999e+01 3.143e+01 3.549e+01, threshold=5.998e+01, percent-clipped=0.0 2023-12-22 12:05:30,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=575586.6666666666, ans=0.0 2023-12-22 12:05:33,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=575586.6666666666, ans=0.0 2023-12-22 12:05:37,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=575653.3333333334, ans=0.0 2023-12-22 12:05:48,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=575720.0, ans=0.0 2023-12-22 12:05:52,408 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0 2023-12-22 12:06:04,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=575786.6666666666, ans=0.2 2023-12-22 12:06:05,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.83 vs. limit=22.5 2023-12-22 12:06:17,524 INFO [train.py:886] (3/4) Epoch 19, batch 600, loss[loss=0.01729, audio_tagging_loss=0.01729, over 24750.00 frames. ], tot_loss[loss=0.01425, audio_tagging_loss=0.01425, over 4704159.31 frames. ], batch size: 99, lr: 5.78e-03, grad_scale: 64.0 2023-12-22 12:06:20,027 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.31 vs. limit=15.0 2023-12-22 12:06:24,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=575920.0, ans=0.0 2023-12-22 12:06:35,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=575986.6666666666, ans=0.125 2023-12-22 12:06:41,827 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:07:10,147 INFO [train.py:886] (3/4) Epoch 19, batch 650, loss[loss=0.009815, audio_tagging_loss=0.009815, over 24750.00 frames. ], tot_loss[loss=0.0142, audio_tagging_loss=0.0142, over 4751924.28 frames. ], batch size: 99, lr: 5.78e-03, grad_scale: 64.0 2023-12-22 12:07:11,727 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.622e+01 2.907e+01 3.049e+01 3.128e+01 3.545e+01, threshold=6.097e+01, percent-clipped=0.0 2023-12-22 12:07:18,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=576253.3333333334, ans=0.125 2023-12-22 12:07:42,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.44 vs. limit=22.5 2023-12-22 12:08:01,297 INFO [train.py:886] (3/4) Epoch 19, batch 700, loss[loss=0.01291, audio_tagging_loss=0.01291, over 24750.00 frames. 
], tot_loss[loss=0.01408, audio_tagging_loss=0.01408, over 4790280.98 frames. ], batch size: 99, lr: 5.78e-03, grad_scale: 64.0 2023-12-22 12:08:04,572 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.29 vs. limit=15.0 2023-12-22 12:08:10,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=576586.6666666666, ans=0.2 2023-12-22 12:08:11,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=576653.3333333334, ans=0.09899494936611666 2023-12-22 12:08:25,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=576720.0, ans=0.125 2023-12-22 12:08:26,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=576720.0, ans=0.125 2023-12-22 12:08:51,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=576853.3333333334, ans=0.2 2023-12-22 12:08:53,210 INFO [train.py:886] (3/4) Epoch 19, batch 750, loss[loss=0.01382, audio_tagging_loss=0.01382, over 25000.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4825601.75 frames. ], batch size: 100, lr: 5.78e-03, grad_scale: 64.0 2023-12-22 12:08:54,163 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.545e+01 2.842e+01 3.000e+01 3.145e+01 3.640e+01, threshold=6.000e+01, percent-clipped=0.0 2023-12-22 12:08:58,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=576920.0, ans=0.0 2023-12-22 12:09:01,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=576920.0, ans=0.09899494936611666 2023-12-22 12:09:19,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=577053.3333333334, ans=0.5 2023-12-22 12:09:25,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=577120.0, ans=0.125 2023-12-22 12:09:44,886 INFO [train.py:886] (3/4) Epoch 19, batch 800, loss[loss=0.0134, audio_tagging_loss=0.0134, over 25000.00 frames. ], tot_loss[loss=0.014, audio_tagging_loss=0.014, over 4857530.39 frames. ], batch size: 100, lr: 5.78e-03, grad_scale: 64.0 2023-12-22 12:09:51,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=577253.3333333334, ans=0.125 2023-12-22 12:10:03,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=577320.0, ans=0.0 2023-12-22 12:10:05,840 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.07 vs. limit=6.0 2023-12-22 12:10:13,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.95 vs. 
limit=15.0 2023-12-22 12:10:25,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=577453.3333333334, ans=0.07 2023-12-22 12:10:36,759 INFO [train.py:886] (3/4) Epoch 19, batch 850, loss[loss=0.01596, audio_tagging_loss=0.01596, over 25000.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4881002.91 frames. ], batch size: 100, lr: 5.77e-03, grad_scale: 64.0 2023-12-22 12:10:37,688 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.538e+01 2.855e+01 2.981e+01 3.141e+01 3.623e+01, threshold=5.962e+01, percent-clipped=0.0 2023-12-22 12:10:40,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=577586.6666666666, ans=0.0 2023-12-22 12:10:45,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2023-12-22 12:10:46,418 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.58 vs. limit=22.5 2023-12-22 12:10:50,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=577653.3333333334, ans=0.125 2023-12-22 12:11:06,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=577720.0, ans=0.025 2023-12-22 12:11:14,347 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.90 vs. limit=15.0 2023-12-22 12:11:16,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=577786.6666666666, ans=0.125 2023-12-22 12:11:29,931 INFO [train.py:886] (3/4) Epoch 19, batch 900, loss[loss=0.01609, audio_tagging_loss=0.01609, over 24750.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4895411.44 frames. 
], batch size: 99, lr: 5.77e-03, grad_scale: 64.0 2023-12-22 12:11:34,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=577920.0, ans=0.125 2023-12-22 12:11:35,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=577920.0, ans=0.125 2023-12-22 12:11:39,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=577986.6666666666, ans=0.125 2023-12-22 12:11:45,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=577986.6666666666, ans=0.0 2023-12-22 12:11:49,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=578053.3333333334, ans=0.0 2023-12-22 12:11:49,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=578053.3333333334, ans=0.2 2023-12-22 12:11:50,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=578053.3333333334, ans=0.0 2023-12-22 12:11:50,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=578053.3333333334, ans=0.125 2023-12-22 12:12:02,625 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.94 vs. limit=15.0 2023-12-22 12:12:17,072 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.74 vs. limit=10.0 2023-12-22 12:12:19,887 INFO [train.py:886] (3/4) Epoch 19, batch 950, loss[loss=0.01619, audio_tagging_loss=0.01619, over 24750.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4907224.88 frames. ], batch size: 99, lr: 5.77e-03, grad_scale: 64.0 2023-12-22 12:12:20,790 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.573e+01 2.852e+01 2.989e+01 3.124e+01 4.073e+01, threshold=5.979e+01, percent-clipped=0.0 2023-12-22 12:12:29,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=578320.0, ans=0.125 2023-12-22 12:12:32,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=578320.0, ans=0.125 2023-12-22 12:12:38,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=578320.0, ans=0.2 2023-12-22 12:12:49,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=578453.3333333334, ans=0.2 2023-12-22 12:12:54,287 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0 2023-12-22 12:13:11,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=578586.6666666666, ans=0.015 2023-12-22 12:13:11,945 INFO [train.py:886] (3/4) Epoch 19, batch 1000, loss[loss=0.01177, audio_tagging_loss=0.01177, over 25000.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4910424.48 frames. 
2023-12-22 12:13:18,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=578586.6666666666, ans=0.125
2023-12-22 12:13:33,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=578720.0, ans=0.1
2023-12-22 12:13:53,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=578853.3333333334, ans=0.125
2023-12-22 12:13:54,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=578853.3333333334, ans=0.125
2023-12-22 12:14:04,560 INFO [train.py:886] (3/4) Epoch 19, batch 1050, loss[loss=0.01361, audio_tagging_loss=0.01361, over 24186.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4916491.65 frames. ], batch size: 101, lr: 5.77e-03, grad_scale: 64.0
2023-12-22 12:14:05,502 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.555e+01 2.856e+01 2.980e+01 3.114e+01 3.633e+01, threshold=5.960e+01, percent-clipped=0.0
2023-12-22 12:14:23,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=579053.3333333334, ans=0.5
2023-12-22 12:14:37,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=579120.0, ans=0.015
2023-12-22 12:14:46,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=579186.6666666666, ans=0.125
2023-12-22 12:14:47,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=579186.6666666666, ans=0.0
2023-12-22 12:14:55,539 INFO [train.py:886] (3/4) Epoch 19, batch 1100, loss[loss=0.01317, audio_tagging_loss=0.01317, over 25000.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4926100.29 frames. ], batch size: 100, lr: 5.77e-03, grad_scale: 64.0
2023-12-22 12:14:55,880 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.41 vs. limit=15.0
2023-12-22 12:15:11,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=579320.0, ans=0.0
2023-12-22 12:15:27,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0
2023-12-22 12:15:41,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.74 vs. limit=12.0
2023-12-22 12:15:47,811 INFO [train.py:886] (3/4) Epoch 19, batch 1150, loss[loss=0.01252, audio_tagging_loss=0.01252, over 25000.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4936502.17 frames. ], batch size: 100, lr: 5.76e-03, grad_scale: 64.0
2023-12-22 12:15:49,407 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.419e+01 2.855e+01 2.981e+01 3.079e+01 3.493e+01, threshold=5.963e+01, percent-clipped=0.0
2023-12-22 12:16:02,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=579653.3333333334, ans=0.015
2023-12-22 12:16:18,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=579786.6666666666, ans=0.125
2023-12-22 12:16:21,537 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.04 vs. limit=22.5
2023-12-22 12:16:34,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=579853.3333333334, ans=0.1
2023-12-22 12:16:34,261 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.90 vs. limit=15.0
2023-12-22 12:16:39,353 INFO [train.py:886] (3/4) Epoch 19, batch 1200, loss[loss=0.01503, audio_tagging_loss=0.01503, over 25000.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4942316.71 frames. ], batch size: 100, lr: 5.76e-03, grad_scale: 64.0
2023-12-22 12:17:23,871 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.306e-02
2023-12-22 12:17:31,940 INFO [train.py:886] (3/4) Epoch 19, batch 1250, loss[loss=0.01327, audio_tagging_loss=0.01327, over 24750.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 4939730.88 frames. ], batch size: 99, lr: 5.76e-03, grad_scale: 64.0
2023-12-22 12:17:32,877 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.574e+01 2.968e+01 3.088e+01 3.270e+01 3.810e+01, threshold=6.176e+01, percent-clipped=0.0
2023-12-22 12:17:46,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=580320.0, ans=0.025
2023-12-22 12:17:57,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=580386.6666666666, ans=0.1
2023-12-22 12:17:59,034 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.02 vs. limit=22.5
2023-12-22 12:18:02,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=580453.3333333334, ans=0.0
2023-12-22 12:18:24,375 INFO [train.py:886] (3/4) Epoch 19, batch 1300, loss[loss=0.01611, audio_tagging_loss=0.01611, over 24750.00 frames. ], tot_loss[loss=0.01402, audio_tagging_loss=0.01402, over 4941054.70 frames. ], batch size: 99, lr: 5.76e-03, grad_scale: 64.0
2023-12-22 12:18:39,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=580653.3333333334, ans=0.04949747468305833
2023-12-22 12:18:42,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=580653.3333333334, ans=0.0
2023-12-22 12:18:47,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=580720.0, ans=0.125
2023-12-22 12:19:15,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=580920.0, ans=0.0
2023-12-22 12:19:16,595 INFO [train.py:886] (3/4) Epoch 19, batch 1350, loss[loss=0.0131, audio_tagging_loss=0.0131, over 24750.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4937487.45 frames. ], batch size: 99, lr: 5.76e-03, grad_scale: 64.0
2023-12-22 12:19:17,517 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.439e+01 2.933e+01 3.068e+01 3.210e+01 4.169e+01, threshold=6.137e+01, percent-clipped=0.0
2023-12-22 12:19:24,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=580920.0, ans=0.2
2023-12-22 12:19:24,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=580920.0, ans=0.1
2023-12-22 12:19:56,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=581120.0, ans=0.125
2023-12-22 12:19:59,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=581186.6666666666, ans=0.2
2023-12-22 12:20:08,143 INFO [train.py:886] (3/4) Epoch 19, batch 1400, loss[loss=0.01339, audio_tagging_loss=0.01339, over 25000.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4940352.58 frames. ], batch size: 100, lr: 5.76e-03, grad_scale: 64.0
2023-12-22 12:20:14,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=581253.3333333334, ans=0.125
2023-12-22 12:20:17,922 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.42 vs. limit=15.0
2023-12-22 12:20:46,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=581453.3333333334, ans=0.125
2023-12-22 12:21:00,428 INFO [train.py:886] (3/4) Epoch 19, batch 1450, loss[loss=0.01193, audio_tagging_loss=0.01193, over 24750.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4941730.13 frames. ], batch size: 99, lr: 5.75e-03, grad_scale: 64.0
2023-12-22 12:21:01,343 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.498e+01 2.831e+01 2.949e+01 3.105e+01 4.075e+01, threshold=5.898e+01, percent-clipped=0.0
2023-12-22 12:21:02,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=581586.6666666666, ans=0.125
2023-12-22 12:21:08,114 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 12:21:12,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=581653.3333333334, ans=0.125
2023-12-22 12:21:24,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=581720.0, ans=0.0
2023-12-22 12:21:32,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.88 vs. limit=15.0
2023-12-22 12:21:32,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=581786.6666666666, ans=0.0
2023-12-22 12:21:35,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=581786.6666666666, ans=0.125
2023-12-22 12:21:36,890 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.12 vs. limit=22.5
2023-12-22 12:21:38,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=581786.6666666666, ans=0.1
2023-12-22 12:21:40,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=581786.6666666666, ans=0.0
2023-12-22 12:21:44,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=581853.3333333334, ans=0.125
2023-12-22 12:21:51,572 INFO [train.py:886] (3/4) Epoch 19, batch 1500, loss[loss=0.01434, audio_tagging_loss=0.01434, over 25000.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4945683.38 frames. ], batch size: 100, lr: 5.75e-03, grad_scale: 64.0
2023-12-22 12:22:12,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=582053.3333333334, ans=0.0
2023-12-22 12:22:22,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=582120.0, ans=0.125
2023-12-22 12:22:28,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=582120.0, ans=0.125
2023-12-22 12:22:32,808 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.29 vs. limit=22.5
2023-12-22 12:22:40,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=582186.6666666666, ans=0.2
2023-12-22 12:22:44,300 INFO [train.py:886] (3/4) Epoch 19, batch 1550, loss[loss=0.01368, audio_tagging_loss=0.01368, over 24750.00 frames. ], tot_loss[loss=0.01385, audio_tagging_loss=0.01385, over 4945590.00 frames. ], batch size: 99, lr: 5.75e-03, grad_scale: 64.0
2023-12-22 12:22:45,195 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.569e+01 2.901e+01 3.016e+01 3.205e+01 3.562e+01, threshold=6.032e+01, percent-clipped=0.0
2023-12-22 12:22:48,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=582253.3333333334, ans=0.125
2023-12-22 12:22:49,685 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.17 vs. limit=10.0
2023-12-22 12:22:52,251 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=15.0
2023-12-22 12:23:01,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=582320.0, ans=0.125
2023-12-22 12:23:28,231 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.70 vs. limit=22.5
2023-12-22 12:23:29,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=582520.0, ans=0.0
2023-12-22 12:23:35,102 INFO [train.py:886] (3/4) Epoch 19, batch 1600, loss[loss=0.01388, audio_tagging_loss=0.01388, over 24750.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4939512.57 frames. ], batch size: 99, lr: 5.75e-03, grad_scale: 64.0
2023-12-22 12:23:35,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=582586.6666666666, ans=0.1
2023-12-22 12:23:36,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=582586.6666666666, ans=0.125
2023-12-22 12:23:49,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=582653.3333333334, ans=0.125
2023-12-22 12:23:50,719 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.51 vs. limit=15.0
2023-12-22 12:23:55,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=582720.0, ans=0.0
2023-12-22 12:24:04,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=582720.0, ans=0.1
2023-12-22 12:24:25,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.94 vs. limit=12.0
2023-12-22 12:24:26,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=582920.0, ans=0.0
2023-12-22 12:24:26,979 INFO [train.py:886] (3/4) Epoch 19, batch 1650, loss[loss=0.01561, audio_tagging_loss=0.01561, over 25000.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4944125.64 frames. ], batch size: 100, lr: 5.75e-03, grad_scale: 64.0
2023-12-22 12:24:27,921 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.591e+01 2.832e+01 3.010e+01 3.174e+01 4.519e+01, threshold=6.020e+01, percent-clipped=0.0
2023-12-22 12:24:42,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=582986.6666666666, ans=0.1
2023-12-22 12:24:42,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=582986.6666666666, ans=0.1
2023-12-22 12:24:43,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=582986.6666666666, ans=0.125
2023-12-22 12:24:43,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=582986.6666666666, ans=0.125
2023-12-22 12:25:00,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=583120.0, ans=0.1
2023-12-22 12:25:02,469 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=15.0
2023-12-22 12:25:03,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=583120.0, ans=0.125
2023-12-22 12:25:14,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=583186.6666666666, ans=0.125
2023-12-22 12:25:20,025 INFO [train.py:886] (3/4) Epoch 19, batch 1700, loss[loss=0.015, audio_tagging_loss=0.015, over 24014.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 4945161.85 frames. ], batch size: 100, lr: 5.75e-03, grad_scale: 64.0
2023-12-22 12:25:23,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=583253.3333333334, ans=0.125
2023-12-22 12:25:25,283 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. limit=15.0
2023-12-22 12:25:26,318 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.13 vs. limit=10.0
2023-12-22 12:25:32,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=583320.0, ans=0.0
2023-12-22 12:26:10,341 INFO [train.py:886] (3/4) Epoch 19, batch 1750, loss[loss=0.01281, audio_tagging_loss=0.01281, over 25000.00 frames. ], tot_loss[loss=0.01383, audio_tagging_loss=0.01383, over 4945808.87 frames. ], batch size: 100, lr: 5.74e-03, grad_scale: 64.0
2023-12-22 12:26:11,957 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.516e+01 2.867e+01 2.991e+01 3.170e+01 3.971e+01, threshold=5.982e+01, percent-clipped=0.0
2023-12-22 12:26:28,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=583653.3333333334, ans=0.125
2023-12-22 12:26:33,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=583720.0, ans=0.0
2023-12-22 12:26:34,595 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.08 vs. limit=12.0
2023-12-22 12:26:44,697 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.97 vs. limit=15.0
2023-12-22 12:26:50,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=583786.6666666666, ans=0.2
2023-12-22 12:27:01,303 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0
2023-12-22 12:27:01,889 INFO [train.py:886] (3/4) Epoch 19, batch 1800, loss[loss=0.01623, audio_tagging_loss=0.01623, over 25000.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4955232.17 frames. ], batch size: 100, lr: 5.74e-03, grad_scale: 64.0
2023-12-22 12:27:07,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=583920.0, ans=0.0
2023-12-22 12:27:35,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=584120.0, ans=0.09899494936611666
2023-12-22 12:27:42,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=584186.6666666666, ans=0.125
2023-12-22 12:27:54,621 INFO [train.py:886] (3/4) Epoch 19, batch 1850, loss[loss=0.01356, audio_tagging_loss=0.01356, over 24750.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 4959944.45 frames. ], batch size: 99, lr: 5.74e-03, grad_scale: 64.0
2023-12-22 12:27:55,542 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+01 2.889e+01 3.007e+01 3.145e+01 3.702e+01, threshold=6.015e+01, percent-clipped=0.0
2023-12-22 12:27:58,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=584253.3333333334, ans=0.07
2023-12-22 12:28:00,408 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.35 vs. limit=22.5
2023-12-22 12:28:38,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=584520.0, ans=0.125
2023-12-22 12:28:46,486 INFO [train.py:886] (3/4) Epoch 19, batch 1900, loss[loss=0.01434, audio_tagging_loss=0.01434, over 24750.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4954747.86 frames. ], batch size: 99, lr: 5.74e-03, grad_scale: 64.0
2023-12-22 12:28:55,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=584653.3333333334, ans=0.125
2023-12-22 12:29:08,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.05 vs. limit=15.0
2023-12-22 12:29:25,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=584786.6666666666, ans=0.1
2023-12-22 12:29:30,041 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0
2023-12-22 12:29:32,025 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.03 vs. limit=10.0
2023-12-22 12:29:39,772 INFO [train.py:886] (3/4) Epoch 19, batch 1950, loss[loss=0.0116, audio_tagging_loss=0.0116, over 24750.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4948088.40 frames. ], batch size: 99, lr: 5.74e-03, grad_scale: 64.0
2023-12-22 12:29:40,703 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.571e+01 2.861e+01 2.979e+01 3.122e+01 3.684e+01, threshold=5.958e+01, percent-clipped=0.0
2023-12-22 12:29:55,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=584986.6666666666, ans=0.0
2023-12-22 12:29:59,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=585053.3333333334, ans=0.125
2023-12-22 12:30:09,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=585053.3333333334, ans=0.0
2023-12-22 12:30:19,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=585186.6666666666, ans=0.125
2023-12-22 12:30:30,751 INFO [train.py:886] (3/4) Epoch 19, batch 2000, loss[loss=0.01455, audio_tagging_loss=0.01455, over 25000.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 4953008.29 frames. ], batch size: 100, lr: 5.74e-03, grad_scale: 64.0
2023-12-22 12:30:34,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=585253.3333333334, ans=0.125
2023-12-22 12:30:46,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=585320.0, ans=0.125
2023-12-22 12:31:03,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=585453.3333333334, ans=0.125
2023-12-22 12:31:23,488 INFO [train.py:886] (3/4) Epoch 19, batch 2050, loss[loss=0.01201, audio_tagging_loss=0.01201, over 21434.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4954712.52 frames. ], batch size: 107, lr: 5.73e-03, grad_scale: 128.0
2023-12-22 12:31:24,384 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.636e+01 2.887e+01 3.027e+01 3.183e+01 3.649e+01, threshold=6.055e+01, percent-clipped=0.0
2023-12-22 12:31:41,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=585653.3333333334, ans=0.125
2023-12-22 12:31:44,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=585720.0, ans=0.125
2023-12-22 12:31:51,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=585720.0, ans=0.2
2023-12-22 12:31:57,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=585786.6666666666, ans=0.2
2023-12-22 12:32:06,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=585853.3333333334, ans=0.1
2023-12-22 12:32:12,600 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 12:32:15,257 INFO [train.py:886] (3/4) Epoch 19, batch 2100, loss[loss=0.01164, audio_tagging_loss=0.01164, over 25000.00 frames. ], tot_loss[loss=0.0138, audio_tagging_loss=0.0138, over 4959319.95 frames. ], batch size: 100, lr: 5.73e-03, grad_scale: 128.0
2023-12-22 12:32:18,059 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.466e-03
2023-12-22 12:32:23,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=585920.0, ans=0.0
2023-12-22 12:33:06,962 INFO [train.py:886] (3/4) Epoch 19, batch 2150, loss[loss=0.01785, audio_tagging_loss=0.01785, over 24944.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4959005.39 frames. ], batch size: 100, lr: 5.73e-03, grad_scale: 128.0
2023-12-22 12:33:07,900 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.379e+01 2.855e+01 2.979e+01 3.117e+01 3.544e+01, threshold=5.958e+01, percent-clipped=0.0
2023-12-22 12:33:11,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=586253.3333333334, ans=0.0
2023-12-22 12:33:22,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=586320.0, ans=0.1
2023-12-22 12:33:24,842 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.34 vs. limit=22.5
2023-12-22 12:33:36,719 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0
2023-12-22 12:33:41,044 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 12:33:41,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=586453.3333333334, ans=0.125
2023-12-22 12:33:53,876 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.67 vs. limit=22.5
2023-12-22 12:33:59,117 INFO [train.py:886] (3/4) Epoch 19, batch 2200, loss[loss=0.01591, audio_tagging_loss=0.01591, over 24750.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 4955571.33 frames. ], batch size: 99, lr: 5.73e-03, grad_scale: 128.0
2023-12-22 12:34:07,867 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0
2023-12-22 12:34:21,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=586720.0, ans=0.125
2023-12-22 12:34:32,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0
2023-12-22 12:34:35,271 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.76 vs. limit=15.0
2023-12-22 12:34:42,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=586853.3333333334, ans=10.0
2023-12-22 12:34:50,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.57 vs. limit=22.5
2023-12-22 12:34:51,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=586853.3333333334, ans=0.0
2023-12-22 12:34:53,662 INFO [train.py:886] (3/4) Epoch 19, batch 2250, loss[loss=0.01399, audio_tagging_loss=0.01399, over 25000.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 4952065.11 frames. ], batch size: 100, lr: 5.73e-03, grad_scale: 64.0
2023-12-22 12:34:55,533 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.594e+01 2.899e+01 3.031e+01 3.221e+01 3.742e+01, threshold=6.061e+01, percent-clipped=0.0
2023-12-22 12:35:01,399 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.71 vs. limit=15.0
2023-12-22 12:35:09,790 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.02 vs. limit=15.0
2023-12-22 12:35:37,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=587186.6666666666, ans=0.0
2023-12-22 12:35:37,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=587186.6666666666, ans=0.125
2023-12-22 12:35:45,207 INFO [train.py:886] (3/4) Epoch 19, batch 2300, loss[loss=0.01555, audio_tagging_loss=0.01555, over 22705.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4950106.67 frames. ], batch size: 107, lr: 5.73e-03, grad_scale: 64.0
2023-12-22 12:35:52,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=587253.3333333334, ans=0.0
2023-12-22 12:36:11,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=587386.6666666666, ans=0.125
2023-12-22 12:36:11,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=587386.6666666666, ans=0.125
2023-12-22 12:36:17,151 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0
2023-12-22 12:36:37,597 INFO [train.py:886] (3/4) Epoch 19, batch 2350, loss[loss=0.01249, audio_tagging_loss=0.01249, over 25000.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 4953528.04 frames. ], batch size: 100, lr: 5.72e-03, grad_scale: 64.0
2023-12-22 12:36:39,497 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.629e+01 2.873e+01 2.999e+01 3.144e+01 3.595e+01, threshold=5.997e+01, percent-clipped=0.0
2023-12-22 12:36:44,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=587586.6666666666, ans=0.0
2023-12-22 12:37:20,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=587853.3333333334, ans=0.2
2023-12-22 12:37:29,345 INFO [train.py:886] (3/4) Epoch 19, batch 2400, loss[loss=0.01261, audio_tagging_loss=0.01261, over 25000.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4955638.74 frames. ], batch size: 100, lr: 5.72e-03, grad_scale: 64.0
2023-12-22 12:37:35,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=587920.0, ans=0.1
2023-12-22 12:38:00,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=588120.0, ans=0.0
2023-12-22 12:38:04,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=588120.0, ans=0.125
2023-12-22 12:38:20,928 INFO [train.py:886] (3/4) Epoch 19, batch 2450, loss[loss=0.01549, audio_tagging_loss=0.01549, over 25000.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4953580.23 frames. ], batch size: 100, lr: 5.72e-03, grad_scale: 64.0
2023-12-22 12:38:22,761 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.448e+01 2.869e+01 3.044e+01 3.154e+01 3.723e+01, threshold=6.089e+01, percent-clipped=0.0
2023-12-22 12:38:22,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=588253.3333333334, ans=0.125
2023-12-22 12:38:35,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=588320.0, ans=0.0
2023-12-22 12:38:52,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=588453.3333333334, ans=0.125
2023-12-22 12:39:08,909 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.86 vs. limit=15.0
2023-12-22 12:39:13,118 INFO [train.py:886] (3/4) Epoch 19, batch 2500, loss[loss=0.01355, audio_tagging_loss=0.01355, over 24750.00 frames. ], tot_loss[loss=0.0138, audio_tagging_loss=0.0138, over 4946268.69 frames. ], batch size: 99, lr: 5.72e-03, grad_scale: 64.0
2023-12-22 12:39:17,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0
2023-12-22 12:39:33,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.69 vs. limit=10.0
2023-12-22 12:39:36,783 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.25 vs. limit=22.5
2023-12-22 12:40:03,873 INFO [train.py:886] (3/4) Epoch 19, batch 2550, loss[loss=0.01386, audio_tagging_loss=0.01386, over 24750.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4945392.43 frames. ], batch size: 99, lr: 5.72e-03, grad_scale: 64.0
2023-12-22 12:40:04,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=588920.0, ans=0.1
2023-12-22 12:40:06,709 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.587e+01 2.976e+01 3.078e+01 3.235e+01 3.985e+01, threshold=6.155e+01, percent-clipped=0.0
2023-12-22 12:40:22,441 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.51 vs. limit=12.0
2023-12-22 12:40:32,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=589053.3333333334, ans=0.2
2023-12-22 12:40:52,661 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.13 vs. limit=6.0
2023-12-22 12:40:56,966 INFO [train.py:886] (3/4) Epoch 19, batch 2600, loss[loss=0.01092, audio_tagging_loss=0.01092, over 25000.00 frames. ], tot_loss[loss=0.01391, audio_tagging_loss=0.01391, over 4949913.01 frames. ], batch size: 100, lr: 5.72e-03, grad_scale: 64.0
2023-12-22 12:41:01,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=589253.3333333334, ans=0.125
2023-12-22 12:41:48,938 INFO [train.py:886] (3/4) Epoch 19, batch 2650, loss[loss=0.01335, audio_tagging_loss=0.01335, over 25000.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4950284.96 frames. ], batch size: 100, lr: 5.71e-03, grad_scale: 64.0
2023-12-22 12:41:51,499 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.297e+01 2.856e+01 2.983e+01 3.159e+01 3.716e+01, threshold=5.966e+01, percent-clipped=0.0
2023-12-22 12:41:59,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=589653.3333333334, ans=0.0
2023-12-22 12:42:11,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=589720.0, ans=0.125
2023-12-22 12:42:25,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=589786.6666666666, ans=0.0
2023-12-22 12:42:37,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=589853.3333333334, ans=0.125
2023-12-22 12:42:41,036 INFO [train.py:886] (3/4) Epoch 19, batch 2700, loss[loss=0.01472, audio_tagging_loss=0.01472, over 25000.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4947871.32 frames. ], batch size: 100, lr: 5.71e-03, grad_scale: 64.0
2023-12-22 12:43:01,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=589986.6666666666, ans=0.5
2023-12-22 12:43:11,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=590120.0, ans=0.1
2023-12-22 12:43:12,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=590120.0, ans=0.125
2023-12-22 12:43:20,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.07 vs. limit=10.0
2023-12-22 12:43:26,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=590186.6666666666, ans=0.125
2023-12-22 12:43:33,967 INFO [train.py:886] (3/4) Epoch 19, batch 2750, loss[loss=0.01244, audio_tagging_loss=0.01244, over 25000.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4957411.80 frames. ], batch size: 100, lr: 5.71e-03, grad_scale: 64.0
2023-12-22 12:43:34,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=590253.3333333334, ans=0.1
2023-12-22 12:43:35,870 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.593e+01 2.856e+01 3.011e+01 3.165e+01 3.589e+01, threshold=6.021e+01, percent-clipped=0.0
2023-12-22 12:43:38,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=590253.3333333334, ans=0.2
2023-12-22 12:43:41,298 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.32 vs. limit=22.5
2023-12-22 12:43:50,542 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0
2023-12-22 12:43:59,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0
2023-12-22 12:44:24,030 INFO [train.py:886] (3/4) Epoch 19, batch 2800, loss[loss=0.01203, audio_tagging_loss=0.01203, over 24750.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4955674.40 frames. ], batch size: 99, lr: 5.71e-03, grad_scale: 64.0
2023-12-22 12:45:07,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=590853.3333333334, ans=0.125
2023-12-22 12:45:16,444 INFO [train.py:886] (3/4) Epoch 19, batch 2850, loss[loss=0.01274, audio_tagging_loss=0.01274, over 25000.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 4945833.98 frames. ], batch size: 100, lr: 5.71e-03, grad_scale: 64.0
2023-12-22 12:45:18,395 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.657e+01 2.937e+01 3.059e+01 3.223e+01 3.901e+01, threshold=6.118e+01, percent-clipped=0.0
2023-12-22 12:45:19,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=590920.0, ans=0.1
2023-12-22 12:45:40,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=591053.3333333334, ans=0.0
2023-12-22 12:45:43,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=591053.3333333334, ans=0.1
2023-12-22 12:45:48,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=591120.0, ans=0.05
2023-12-22 12:45:54,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=591120.0, ans=0.1
2023-12-22 12:46:05,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0
2023-12-22 12:46:08,814 INFO [train.py:886] (3/4) Epoch 19, batch 2900, loss[loss=0.01319, audio_tagging_loss=0.01319, over 24750.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4941630.36 frames. ], batch size: 99, lr: 5.71e-03, grad_scale: 64.0
2023-12-22 12:46:18,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=591320.0, ans=0.0
2023-12-22 12:46:24,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=591320.0, ans=0.1
2023-12-22 12:46:29,825 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.65 vs. limit=15.0
2023-12-22 12:46:31,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=591386.6666666666, ans=0.0
2023-12-22 12:46:32,675 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.73 vs. limit=15.0
2023-12-22 12:46:49,916 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.175e-03
2023-12-22 12:47:00,325 INFO [train.py:886] (3/4) Epoch 19, batch 2950, loss[loss=0.01429, audio_tagging_loss=0.01429, over 25000.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4938184.20 frames. ], batch size: 100, lr: 5.71e-03, grad_scale: 64.0
2023-12-22 12:47:00,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=591586.6666666666, ans=10.0
2023-12-22 12:47:02,195 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.513e+01 2.834e+01 2.936e+01 3.112e+01 3.649e+01, threshold=5.872e+01, percent-clipped=0.0
2023-12-22 12:47:33,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=591786.6666666666, ans=0.125
2023-12-22 12:47:52,662 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.26 vs. limit=22.5
2023-12-22 12:47:54,103 INFO [train.py:886] (3/4) Epoch 19, batch 3000, loss[loss=0.01535, audio_tagging_loss=0.01535, over 25000.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4940114.43 frames. ], batch size: 100, lr: 5.70e-03, grad_scale: 64.0
2023-12-22 12:47:54,103 INFO [train.py:909] (3/4) Computing validation loss
2023-12-22 12:48:15,446 INFO [train.py:917] (3/4) Epoch 19, validation: loss=0.0333, audio_tagging_loss=0.0333, over 3737520.00 frames.
2023-12-22 12:48:15,447 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-22 12:48:15,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=591920.0, ans=0.125
2023-12-22 12:48:32,891 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.75 vs. limit=15.0
2023-12-22 12:49:02,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=592186.6666666666, ans=0.2
2023-12-22 12:49:04,234 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.13 vs. limit=6.0
2023-12-22 12:49:06,691 INFO [train.py:886] (3/4) Epoch 19, batch 3050, loss[loss=0.01513, audio_tagging_loss=0.01513, over 24750.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4944810.68 frames. ], batch size: 99, lr: 5.70e-03, grad_scale: 64.0
2023-12-22 12:49:07,243 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.96 vs. limit=10.0
2023-12-22 12:49:08,539 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.628e+01 2.886e+01 3.017e+01 3.136e+01 3.625e+01, threshold=6.033e+01, percent-clipped=0.0
2023-12-22 12:49:09,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=592253.3333333334, ans=0.2
2023-12-22 12:49:31,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=592386.6666666666, ans=0.0
2023-12-22 12:49:33,741 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.09 vs. limit=6.0
2023-12-22 12:49:35,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=592386.6666666666, ans=0.0
2023-12-22 12:49:37,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=592453.3333333334, ans=0.125
2023-12-22 12:49:37,894 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.89 vs. limit=22.5
2023-12-22 12:49:39,584 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0
2023-12-22 12:49:45,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=592453.3333333334, ans=0.0
2023-12-22 12:49:48,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=12.0
2023-12-22 12:49:59,758 INFO [train.py:886] (3/4) Epoch 19, batch 3100, loss[loss=0.01593, audio_tagging_loss=0.01593, over 24750.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4949606.55 frames. ], batch size: 99, lr: 5.70e-03, grad_scale: 64.0
2023-12-22 12:49:59,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=592586.6666666666, ans=0.125
2023-12-22 12:50:05,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=592586.6666666666, ans=0.1
2023-12-22 12:50:10,642 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.01 vs. limit=22.5
2023-12-22 12:50:12,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=592653.3333333334, ans=0.125
2023-12-22 12:50:15,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=592653.3333333334, ans=0.125
2023-12-22 12:50:19,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=592720.0, ans=0.125
2023-12-22 12:50:20,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=592720.0, ans=0.0
2023-12-22 12:50:41,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=592853.3333333334, ans=0.125
2023-12-22 12:50:50,365 INFO [train.py:886] (3/4) Epoch 19, batch 3150, loss[loss=0.01579, audio_tagging_loss=0.01579, over 24750.00 frames. ], tot_loss[loss=0.01371, audio_tagging_loss=0.01371, over 4947849.11 frames. ], batch size: 99, lr: 5.70e-03, grad_scale: 64.0
2023-12-22 12:50:50,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=592920.0, ans=0.0
2023-12-22 12:50:52,268 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.598e+01 2.871e+01 2.985e+01 3.126e+01 3.979e+01, threshold=5.970e+01, percent-clipped=0.0
2023-12-22 12:50:55,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=592920.0, ans=0.2
2023-12-22 12:50:59,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=592920.0, ans=0.2
2023-12-22 12:51:05,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=592986.6666666666, ans=0.125
2023-12-22 12:51:15,655 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.84 vs. limit=10.0
2023-12-22 12:51:23,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=593120.0, ans=0.125
2023-12-22 12:51:33,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=593186.6666666666, ans=0.125
2023-12-22 12:51:42,994 INFO [train.py:886] (3/4) Epoch 19, batch 3200, loss[loss=0.01299, audio_tagging_loss=0.01299, over 25000.00 frames. ], tot_loss[loss=0.0138, audio_tagging_loss=0.0138, over 4943705.48 frames. ], batch size: 100, lr: 5.70e-03, grad_scale: 64.0
2023-12-22 12:51:49,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=593253.3333333334, ans=0.125
2023-12-22 12:51:55,808 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.34 vs. limit=22.5
2023-12-22 12:51:55,913 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0
2023-12-22 12:52:35,662 INFO [train.py:886] (3/4) Epoch 19, batch 3250, loss[loss=0.01492, audio_tagging_loss=0.01492, over 25000.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4952128.16 frames. ], batch size: 100, lr: 5.70e-03, grad_scale: 64.0
2023-12-22 12:52:37,604 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.543e+01 2.874e+01 3.008e+01 3.221e+01 3.622e+01, threshold=6.016e+01, percent-clipped=0.0
2023-12-22 12:52:38,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=593586.6666666666, ans=0.2
2023-12-22 12:52:40,085 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.31 vs. limit=15.0
2023-12-22 12:52:45,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.22 vs. limit=15.0
2023-12-22 12:52:54,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=593720.0, ans=0.125
2023-12-22 12:52:58,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=593720.0, ans=0.125
2023-12-22 12:53:03,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=593720.0, ans=0.0
2023-12-22 12:53:11,488 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.99 vs. limit=12.0
2023-12-22 12:53:13,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=593786.6666666666, ans=0.1
2023-12-22 12:53:15,346 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.99 vs. limit=15.0
2023-12-22 12:53:27,372 INFO [train.py:886] (3/4) Epoch 19, batch 3300, loss[loss=0.01189, audio_tagging_loss=0.01189, over 25000.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4949163.03 frames. ], batch size: 100, lr: 5.69e-03, grad_scale: 64.0
2023-12-22 12:53:44,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=593986.6666666666, ans=0.2
2023-12-22 12:53:48,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=594053.3333333334, ans=0.125
2023-12-22 12:53:57,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=594120.0, ans=0.2
2023-12-22 12:53:57,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=594120.0, ans=0.2
2023-12-22 12:54:09,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=594186.6666666666, ans=0.125
2023-12-22 12:54:15,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=594186.6666666666, ans=0.125
2023-12-22 12:54:19,662 INFO [train.py:886] (3/4) Epoch 19, batch 3350, loss[loss=0.0122, audio_tagging_loss=0.0122, over 25000.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4951381.47 frames. ], batch size: 100, lr: 5.69e-03, grad_scale: 64.0
2023-12-22 12:54:20,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=594253.3333333334, ans=0.0
2023-12-22 12:54:21,564 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.652e+01 2.848e+01 3.000e+01 3.148e+01 3.687e+01, threshold=5.999e+01, percent-clipped=0.0
2023-12-22 12:54:21,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=594253.3333333334, ans=0.0
2023-12-22 12:54:30,286 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.50 vs. limit=6.0
2023-12-22 12:54:33,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=594320.0, ans=0.0
2023-12-22 12:54:42,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=594386.6666666666, ans=0.1
2023-12-22 12:55:10,457 INFO [train.py:886] (3/4) Epoch 19, batch 3400, loss[loss=0.01489, audio_tagging_loss=0.01489, over 25000.00 frames. ], tot_loss[loss=0.01369, audio_tagging_loss=0.01369, over 4953607.76 frames. ], batch size: 100, lr: 5.69e-03, grad_scale: 64.0
2023-12-22 12:55:18,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=594586.6666666666, ans=0.0
2023-12-22 12:55:21,941 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.44 vs. limit=22.5
2023-12-22 12:55:48,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=594786.6666666666, ans=0.125
2023-12-22 12:55:58,992 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.27 vs. limit=15.0
2023-12-22 12:56:03,787 INFO [train.py:886] (3/4) Epoch 19, batch 3450, loss[loss=0.01495, audio_tagging_loss=0.01495, over 24750.00 frames. ], tot_loss[loss=0.01381, audio_tagging_loss=0.01381, over 4945346.22 frames. ], batch size: 99, lr: 5.69e-03, grad_scale: 64.0
2023-12-22 12:56:05,663 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.655e+01 2.881e+01 2.999e+01 3.150e+01 3.664e+01, threshold=5.998e+01, percent-clipped=0.0
2023-12-22 12:56:29,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=595053.3333333334, ans=0.125
2023-12-22 12:56:31,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.07 vs. limit=6.0
2023-12-22 12:56:34,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=595120.0, ans=0.2
2023-12-22 12:56:42,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=595120.0, ans=0.0
2023-12-22 12:56:55,973 INFO [train.py:886] (3/4) Epoch 19, batch 3500, loss[loss=0.01257, audio_tagging_loss=0.01257, over 24750.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4942038.23 frames. ], batch size: 99, lr: 5.69e-03, grad_scale: 64.0
], batch size: 99, lr: 5.69e-03, grad_scale: 64.0 2023-12-22 12:56:56,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=595253.3333333334, ans=0.125 2023-12-22 12:57:02,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=595253.3333333334, ans=0.2 2023-12-22 12:57:02,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=595253.3333333334, ans=0.125 2023-12-22 12:57:05,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=595320.0, ans=0.0 2023-12-22 12:57:15,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=595386.6666666666, ans=0.2 2023-12-22 12:57:23,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=595386.6666666666, ans=0.125 2023-12-22 12:57:46,945 INFO [train.py:886] (3/4) Epoch 19, batch 3550, loss[loss=0.01511, audio_tagging_loss=0.01511, over 24750.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4941570.14 frames. ], batch size: 99, lr: 5.69e-03, grad_scale: 64.0 2023-12-22 12:57:49,600 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.664e+01 2.876e+01 3.031e+01 3.189e+01 3.844e+01, threshold=6.062e+01, percent-clipped=0.0 2023-12-22 12:57:53,630 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.82 vs. limit=22.5 2023-12-22 12:58:01,862 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.53 vs. limit=22.5 2023-12-22 12:58:19,188 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.23 vs. limit=15.0 2023-12-22 12:58:23,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=595786.6666666666, ans=0.125 2023-12-22 12:58:29,779 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.60 vs. limit=22.5 2023-12-22 12:58:39,947 INFO [train.py:886] (3/4) Epoch 19, batch 3600, loss[loss=0.01468, audio_tagging_loss=0.01468, over 24750.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4944878.26 frames. 
], batch size: 99, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 12:58:40,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=595920.0, ans=0.05 2023-12-22 12:58:43,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=595920.0, ans=0.1 2023-12-22 12:58:53,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=595986.6666666666, ans=0.1 2023-12-22 12:58:56,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=595986.6666666666, ans=0.0 2023-12-22 12:59:04,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=596053.3333333334, ans=0.1 2023-12-22 12:59:05,175 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.51 vs. limit=15.0 2023-12-22 12:59:18,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=596120.0, ans=0.125 2023-12-22 12:59:32,397 INFO [train.py:886] (3/4) Epoch 19, batch 3650, loss[loss=0.014, audio_tagging_loss=0.014, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4943182.50 frames. ], batch size: 100, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 12:59:35,057 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.592e+01 2.803e+01 2.946e+01 3.056e+01 3.583e+01, threshold=5.891e+01, percent-clipped=0.0 2023-12-22 12:59:43,750 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 12:59:58,960 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0 2023-12-22 13:00:04,083 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2023-12-22 13:00:10,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=596453.3333333334, ans=0.125 2023-12-22 13:00:15,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=596520.0, ans=0.125 2023-12-22 13:00:23,289 INFO [train.py:886] (3/4) Epoch 19, batch 3700, loss[loss=0.009958, audio_tagging_loss=0.009958, over 22197.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4945792.88 frames. ], batch size: 107, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 13:00:45,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=596720.0, ans=0.0 2023-12-22 13:01:15,892 INFO [train.py:886] (3/4) Epoch 19, batch 3750, loss[loss=0.0172, audio_tagging_loss=0.0172, over 24750.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4946086.86 frames. 
], batch size: 99, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 13:01:17,774 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.628e+01 2.917e+01 3.055e+01 3.190e+01 3.624e+01, threshold=6.111e+01, percent-clipped=0.0 2023-12-22 13:01:30,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=596986.6666666666, ans=0.125 2023-12-22 13:01:33,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=596986.6666666666, ans=0.0 2023-12-22 13:01:39,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=597053.3333333334, ans=0.2 2023-12-22 13:01:41,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=597053.3333333334, ans=0.0 2023-12-22 13:01:51,744 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.23 vs. limit=15.0 2023-12-22 13:02:01,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=597186.6666666666, ans=0.125 2023-12-22 13:02:06,085 INFO [train.py:886] (3/4) Epoch 19, batch 3800, loss[loss=0.01604, audio_tagging_loss=0.01604, over 24033.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4941421.14 frames. ], batch size: 100, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 13:02:07,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=597253.3333333334, ans=0.125 2023-12-22 13:02:14,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=597253.3333333334, ans=0.0 2023-12-22 13:02:25,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=597386.6666666666, ans=0.035 2023-12-22 13:02:57,556 INFO [train.py:886] (3/4) Epoch 19, batch 3850, loss[loss=0.01378, audio_tagging_loss=0.01378, over 25000.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4940574.82 frames. ], batch size: 100, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 13:02:57,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=597586.6666666666, ans=0.125 2023-12-22 13:02:57,750 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:02:59,400 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.693e+01 2.917e+01 3.058e+01 3.165e+01 3.564e+01, threshold=6.116e+01, percent-clipped=0.0 2023-12-22 13:03:00,982 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.12 vs. limit=10.0 2023-12-22 13:03:06,605 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.81 vs. limit=12.0 2023-12-22 13:03:08,597 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.03 vs. 
limit=15.0 2023-12-22 13:03:29,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=597786.6666666666, ans=0.125 2023-12-22 13:03:33,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=597786.6666666666, ans=0.1 2023-12-22 13:03:46,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=597853.3333333334, ans=0.125 2023-12-22 13:03:49,345 INFO [train.py:886] (3/4) Epoch 19, batch 3900, loss[loss=0.01384, audio_tagging_loss=0.01384, over 25000.00 frames. ], tot_loss[loss=0.01357, audio_tagging_loss=0.01357, over 4945805.20 frames. ], batch size: 100, lr: 5.68e-03, grad_scale: 64.0 2023-12-22 13:03:56,588 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.63 vs. limit=15.0 2023-12-22 13:04:28,456 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.64 vs. limit=15.0 2023-12-22 13:04:35,883 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.28 vs. limit=22.5 2023-12-22 13:04:36,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=598186.6666666666, ans=0.125 2023-12-22 13:04:39,098 INFO [train.py:886] (3/4) Epoch 19, batch 3950, loss[loss=0.01597, audio_tagging_loss=0.01597, over 25000.00 frames. ], tot_loss[loss=0.01363, audio_tagging_loss=0.01363, over 4951216.77 frames. ], batch size: 100, lr: 5.67e-03, grad_scale: 64.0 2023-12-22 13:04:40,997 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.494e+01 2.903e+01 2.997e+01 3.155e+01 3.492e+01, threshold=5.994e+01, percent-clipped=0.0 2023-12-22 13:04:47,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=598253.3333333334, ans=0.0 2023-12-22 13:04:50,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=598320.0, ans=0.125 2023-12-22 13:04:55,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=598320.0, ans=0.1 2023-12-22 13:05:01,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=598386.6666666666, ans=0.1 2023-12-22 13:05:06,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=598386.6666666666, ans=0.0 2023-12-22 13:05:31,246 INFO [train.py:886] (3/4) Epoch 19, batch 4000, loss[loss=0.01564, audio_tagging_loss=0.01564, over 25000.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4950775.05 frames. 
], batch size: 100, lr: 5.67e-03, grad_scale: 64.0 2023-12-22 13:05:41,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=598653.3333333334, ans=0.0 2023-12-22 13:05:41,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=598653.3333333334, ans=0.125 2023-12-22 13:05:56,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=598720.0, ans=0.125 2023-12-22 13:05:58,049 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.53 vs. limit=15.0 2023-12-22 13:05:58,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=598720.0, ans=0.0 2023-12-22 13:06:00,876 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0 2023-12-22 13:06:06,890 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.92 vs. limit=10.0 2023-12-22 13:06:07,762 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0 2023-12-22 13:06:18,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=598853.3333333334, ans=0.125 2023-12-22 13:06:21,982 INFO [train.py:886] (3/4) Epoch 19, batch 4050, loss[loss=0.01547, audio_tagging_loss=0.01547, over 25000.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4954954.01 frames. ], batch size: 100, lr: 5.67e-03, grad_scale: 64.0 2023-12-22 13:06:24,465 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.602e+01 2.925e+01 3.047e+01 3.151e+01 3.558e+01, threshold=6.093e+01, percent-clipped=0.0 2023-12-22 13:06:41,314 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.55 vs. limit=22.5 2023-12-22 13:06:49,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=599053.3333333334, ans=0.125 2023-12-22 13:06:51,317 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2023-12-22 13:07:01,610 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.89 vs. limit=15.0 2023-12-22 13:07:14,269 INFO [train.py:886] (3/4) Epoch 19, batch 4100, loss[loss=0.01238, audio_tagging_loss=0.01238, over 24750.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4954815.72 frames. ], batch size: 99, lr: 5.67e-03, grad_scale: 64.0 2023-12-22 13:07:25,023 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.37 vs. 
limit=10.0 2023-12-22 13:07:26,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=599320.0, ans=15.0 2023-12-22 13:07:41,267 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:07:45,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=599453.3333333334, ans=0.09899494936611666 2023-12-22 13:07:47,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=599453.3333333334, ans=0.1 2023-12-22 13:07:56,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=599520.0, ans=0.1 2023-12-22 13:07:58,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=599520.0, ans=0.0 2023-12-22 13:08:03,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=599520.0, ans=0.0 2023-12-22 13:08:06,031 INFO [train.py:886] (3/4) Epoch 19, batch 4150, loss[loss=0.01289, audio_tagging_loss=0.01289, over 24750.00 frames. ], tot_loss[loss=0.01383, audio_tagging_loss=0.01383, over 4951275.00 frames. ], batch size: 99, lr: 5.67e-03, grad_scale: 64.0 2023-12-22 13:08:07,968 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.722e+01 2.927e+01 3.054e+01 3.225e+01 3.796e+01, threshold=6.108e+01, percent-clipped=0.0 2023-12-22 13:08:14,100 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.17 vs. limit=6.0 2023-12-22 13:08:18,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=599653.3333333334, ans=0.0 2023-12-22 13:08:26,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=599720.0, ans=0.5 2023-12-22 13:08:36,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=599786.6666666666, ans=0.07 2023-12-22 13:08:43,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=599786.6666666666, ans=0.2 2023-12-22 13:08:47,436 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.80 vs. limit=15.0 2023-12-22 13:08:53,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=15.0 2023-12-22 13:08:54,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2023-12-22 13:08:55,594 INFO [train.py:886] (3/4) Epoch 19, batch 4200, loss[loss=0.01405, audio_tagging_loss=0.01405, over 24750.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4953257.51 frames. 
], batch size: 99, lr: 5.67e-03, grad_scale: 64.0 2023-12-22 13:08:57,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=599920.0, ans=0.1 2023-12-22 13:09:48,160 INFO [train.py:886] (3/4) Epoch 19, batch 4250, loss[loss=0.01434, audio_tagging_loss=0.01434, over 25000.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4953642.99 frames. ], batch size: 100, lr: 5.66e-03, grad_scale: 64.0 2023-12-22 13:09:50,968 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.636e+01 2.908e+01 3.025e+01 3.145e+01 3.602e+01, threshold=6.050e+01, percent-clipped=0.0 2023-12-22 13:09:51,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=600253.3333333334, ans=0.125 2023-12-22 13:09:53,403 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.96 vs. limit=10.0 2023-12-22 13:09:54,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=600253.3333333334, ans=0.125 2023-12-22 13:09:54,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=600253.3333333334, ans=0.125 2023-12-22 13:10:02,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=600320.0, ans=0.125 2023-12-22 13:10:07,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=600320.0, ans=0.0 2023-12-22 13:10:17,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=600386.6666666666, ans=0.0 2023-12-22 13:10:19,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=600453.3333333334, ans=0.1 2023-12-22 13:10:29,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=600520.0, ans=0.1 2023-12-22 13:10:39,405 INFO [train.py:886] (3/4) Epoch 19, batch 4300, loss[loss=0.01156, audio_tagging_loss=0.01156, over 24750.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4950198.58 frames. ], batch size: 99, lr: 5.66e-03, grad_scale: 64.0 2023-12-22 13:10:40,739 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.94 vs. limit=15.0 2023-12-22 13:10:44,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.32 vs. 
limit=15.0 2023-12-22 13:10:46,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=600586.6666666666, ans=0.125 2023-12-22 13:10:47,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=600586.6666666666, ans=0.0 2023-12-22 13:10:59,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=600720.0, ans=0.125 2023-12-22 13:11:12,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=600786.6666666666, ans=0.125 2023-12-22 13:11:16,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=600786.6666666666, ans=0.125 2023-12-22 13:11:20,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=600786.6666666666, ans=0.0 2023-12-22 13:11:32,038 INFO [train.py:886] (3/4) Epoch 19, batch 4350, loss[loss=0.01231, audio_tagging_loss=0.01231, over 25000.00 frames. ], tot_loss[loss=0.01369, audio_tagging_loss=0.01369, over 4955218.97 frames. ], batch size: 100, lr: 5.66e-03, grad_scale: 64.0 2023-12-22 13:11:34,859 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.519e+01 2.892e+01 3.027e+01 3.169e+01 3.833e+01, threshold=6.053e+01, percent-clipped=0.0 2023-12-22 13:12:06,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=601120.0, ans=0.125 2023-12-22 13:12:23,986 INFO [train.py:886] (3/4) Epoch 19, batch 4400, loss[loss=0.01534, audio_tagging_loss=0.01534, over 24750.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4953223.14 frames. ], batch size: 99, lr: 5.66e-03, grad_scale: 64.0 2023-12-22 13:12:28,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=601253.3333333334, ans=0.125 2023-12-22 13:12:28,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=601253.3333333334, ans=0.125 2023-12-22 13:12:33,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=601320.0, ans=0.0 2023-12-22 13:12:34,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=601320.0, ans=0.0 2023-12-22 13:13:15,228 INFO [train.py:886] (3/4) Epoch 19, batch 4450, loss[loss=0.0142, audio_tagging_loss=0.0142, over 25000.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4954257.04 frames. 
], batch size: 100, lr: 5.66e-03, grad_scale: 64.0 2023-12-22 13:13:18,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=601586.6666666666, ans=0.0 2023-12-22 13:13:19,442 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.583e+01 2.870e+01 3.001e+01 3.186e+01 3.529e+01, threshold=6.001e+01, percent-clipped=0.0 2023-12-22 13:13:19,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=601586.6666666666, ans=0.1 2023-12-22 13:13:23,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=601586.6666666666, ans=0.125 2023-12-22 13:13:24,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=601586.6666666666, ans=0.125 2023-12-22 13:13:33,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=601653.3333333334, ans=0.2 2023-12-22 13:13:34,906 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.06 vs. limit=15.0 2023-12-22 13:13:39,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=601720.0, ans=0.125 2023-12-22 13:13:42,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=601720.0, ans=0.125 2023-12-22 13:14:07,332 INFO [train.py:886] (3/4) Epoch 19, batch 4500, loss[loss=0.01538, audio_tagging_loss=0.01538, over 25000.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 4951072.30 frames. ], batch size: 100, lr: 5.66e-03, grad_scale: 64.0 2023-12-22 13:14:33,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=602053.3333333334, ans=0.125 2023-12-22 13:14:58,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=602253.3333333334, ans=0.2 2023-12-22 13:14:59,673 INFO [train.py:886] (3/4) Epoch 19, batch 4550, loss[loss=0.01436, audio_tagging_loss=0.01436, over 21378.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4950456.47 frames. ], batch size: 107, lr: 5.65e-03, grad_scale: 64.0 2023-12-22 13:15:02,434 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.607e+01 2.855e+01 2.993e+01 3.155e+01 3.667e+01, threshold=5.986e+01, percent-clipped=0.0 2023-12-22 13:15:07,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=602253.3333333334, ans=0.2 2023-12-22 13:15:34,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=602453.3333333334, ans=0.09899494936611666 2023-12-22 13:15:49,760 INFO [train.py:886] (3/4) Epoch 19, batch 4600, loss[loss=0.01402, audio_tagging_loss=0.01402, over 25000.00 frames. ], tot_loss[loss=0.01377, audio_tagging_loss=0.01377, over 4956846.25 frames. 
], batch size: 100, lr: 5.65e-03, grad_scale: 64.0 2023-12-22 13:15:52,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=602586.6666666666, ans=0.1 2023-12-22 13:15:59,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=602653.3333333334, ans=0.1 2023-12-22 13:16:24,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=602786.6666666666, ans=0.125 2023-12-22 13:16:28,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=602786.6666666666, ans=0.1 2023-12-22 13:16:38,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=602853.3333333334, ans=0.0 2023-12-22 13:16:39,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=602853.3333333334, ans=0.125 2023-12-22 13:16:41,037 INFO [train.py:886] (3/4) Epoch 19, batch 4650, loss[loss=0.01179, audio_tagging_loss=0.01179, over 25000.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4958033.15 frames. ], batch size: 100, lr: 5.65e-03, grad_scale: 64.0 2023-12-22 13:16:43,878 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.645e+01 2.933e+01 3.063e+01 3.173e+01 3.851e+01, threshold=6.126e+01, percent-clipped=0.0 2023-12-22 13:16:47,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=602920.0, ans=15.0 2023-12-22 13:16:53,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=602986.6666666666, ans=0.125 2023-12-22 13:17:01,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=603053.3333333334, ans=0.125 2023-12-22 13:17:02,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=603053.3333333334, ans=0.0 2023-12-22 13:17:11,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=603120.0, ans=0.0 2023-12-22 13:17:17,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=603120.0, ans=0.1 2023-12-22 13:17:26,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=603186.6666666666, ans=0.0 2023-12-22 13:17:30,323 INFO [train.py:886] (3/4) Epoch 19, batch 4700, loss[loss=0.01475, audio_tagging_loss=0.01475, over 24750.00 frames. ], tot_loss[loss=0.01399, audio_tagging_loss=0.01399, over 4957999.94 frames. ], batch size: 99, lr: 5.65e-03, grad_scale: 64.0 2023-12-22 13:17:32,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=603253.3333333334, ans=0.0 2023-12-22 13:17:58,937 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.85 vs. 
limit=15.0 2023-12-22 13:17:59,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=603453.3333333334, ans=0.0 2023-12-22 13:18:11,376 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.02 vs. limit=15.0 2023-12-22 13:18:13,784 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:18:18,152 INFO [train.py:886] (3/4) Epoch 19, batch 4750, loss[loss=0.01236, audio_tagging_loss=0.01236, over 24750.00 frames. ], tot_loss[loss=0.01402, audio_tagging_loss=0.01402, over 4955411.32 frames. ], batch size: 99, lr: 5.65e-03, grad_scale: 64.0 2023-12-22 13:18:20,765 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.666e+01 2.986e+01 3.070e+01 3.231e+01 3.664e+01, threshold=6.140e+01, percent-clipped=0.0 2023-12-22 13:18:51,858 INFO [train.py:886] (3/4) Epoch 20, batch 0, loss[loss=0.03243, audio_tagging_loss=0.03243, over 21299.00 frames. ], tot_loss[loss=0.03243, audio_tagging_loss=0.03243, over 21299.00 frames. ], batch size: 107, lr: 5.50e-03, grad_scale: 32.0 2023-12-22 13:18:51,859 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 13:19:12,477 INFO [train.py:917] (3/4) Epoch 20, validation: loss=0.03315, audio_tagging_loss=0.03315, over 3737520.00 frames. 2023-12-22 13:19:12,478 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 13:19:13,607 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:19:20,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.75 vs. limit=15.0 2023-12-22 13:19:22,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=603760.0, ans=0.0 2023-12-22 13:19:42,333 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=15.0 2023-12-22 13:19:43,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.29 vs. limit=15.0 2023-12-22 13:19:45,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=603893.3333333334, ans=0.125 2023-12-22 13:19:49,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=603893.3333333334, ans=0.125 2023-12-22 13:19:58,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=603960.0, ans=0.125 2023-12-22 13:20:04,374 INFO [train.py:886] (3/4) Epoch 20, batch 50, loss[loss=0.0167, audio_tagging_loss=0.0167, over 25000.00 frames. ], tot_loss[loss=0.02174, audio_tagging_loss=0.02174, over 1122494.46 frames. ], batch size: 100, lr: 5.50e-03, grad_scale: 32.0 2023-12-22 13:20:08,538 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.83 vs. 
limit=22.5 2023-12-22 13:20:33,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=604226.6666666666, ans=0.0 2023-12-22 13:20:42,979 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.780e+01 3.377e+01 3.713e+01 4.293e+01 9.552e+01, threshold=7.426e+01, percent-clipped=7.0 2023-12-22 13:20:48,180 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=15.0 2023-12-22 13:20:54,299 INFO [train.py:886] (3/4) Epoch 20, batch 100, loss[loss=0.01745, audio_tagging_loss=0.01745, over 25000.00 frames. ], tot_loss[loss=0.01882, audio_tagging_loss=0.01882, over 1980933.41 frames. ], batch size: 100, lr: 5.50e-03, grad_scale: 32.0 2023-12-22 13:21:03,536 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.47 vs. limit=22.5 2023-12-22 13:21:05,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.38 vs. limit=15.0 2023-12-22 13:21:13,647 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.30 vs. limit=22.5 2023-12-22 13:21:23,377 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.15 vs. limit=22.5 2023-12-22 13:21:33,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=604560.0, ans=0.0 2023-12-22 13:21:37,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=604626.6666666666, ans=0.1 2023-12-22 13:21:37,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=604626.6666666666, ans=0.125 2023-12-22 13:21:46,236 INFO [train.py:886] (3/4) Epoch 20, batch 150, loss[loss=0.01185, audio_tagging_loss=0.01185, over 25000.00 frames. ], tot_loss[loss=0.01712, audio_tagging_loss=0.01712, over 2644236.09 frames. ], batch size: 100, lr: 5.50e-03, grad_scale: 32.0 2023-12-22 13:21:49,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=604693.3333333334, ans=0.1 2023-12-22 13:21:54,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=604693.3333333334, ans=0.125 2023-12-22 13:21:54,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=604693.3333333334, ans=0.04949747468305833 2023-12-22 13:22:24,479 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.643e+01 2.963e+01 3.102e+01 3.243e+01 3.699e+01, threshold=6.204e+01, percent-clipped=0.0 2023-12-22 13:22:35,853 INFO [train.py:886] (3/4) Epoch 20, batch 200, loss[loss=0.01309, audio_tagging_loss=0.01309, over 25000.00 frames. ], tot_loss[loss=0.01603, audio_tagging_loss=0.01603, over 3162632.94 frames. ], batch size: 100, lr: 5.50e-03, grad_scale: 32.0 2023-12-22 13:22:41,745 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.50 vs. 
limit=15.0 2023-12-22 13:23:15,053 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.64 vs. limit=15.0 2023-12-22 13:23:18,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=605293.3333333334, ans=0.125 2023-12-22 13:23:18,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=605293.3333333334, ans=0.125 2023-12-22 13:23:22,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=605293.3333333334, ans=0.125 2023-12-22 13:23:27,217 INFO [train.py:886] (3/4) Epoch 20, batch 250, loss[loss=0.01373, audio_tagging_loss=0.01373, over 25000.00 frames. ], tot_loss[loss=0.01546, audio_tagging_loss=0.01546, over 3561726.18 frames. ], batch size: 100, lr: 5.50e-03, grad_scale: 32.0 2023-12-22 13:23:27,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=605360.0, ans=10.0 2023-12-22 13:24:05,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=605560.0, ans=0.125 2023-12-22 13:24:05,955 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.603e+01 2.907e+01 3.036e+01 3.184e+01 3.948e+01, threshold=6.071e+01, percent-clipped=0.0 2023-12-22 13:24:08,439 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.76 vs. limit=12.0 2023-12-22 13:24:09,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=605626.6666666666, ans=0.0 2023-12-22 13:24:11,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=605626.6666666666, ans=0.125 2023-12-22 13:24:12,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=605626.6666666666, ans=0.5 2023-12-22 13:24:17,995 INFO [train.py:886] (3/4) Epoch 20, batch 300, loss[loss=0.01314, audio_tagging_loss=0.01314, over 24750.00 frames. ], tot_loss[loss=0.01505, audio_tagging_loss=0.01505, over 3868828.34 frames. ], batch size: 99, lr: 5.49e-03, grad_scale: 32.0 2023-12-22 13:24:22,173 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.12 vs. limit=22.5 2023-12-22 13:24:43,255 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=15.0 2023-12-22 13:24:49,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=605893.3333333334, ans=0.125 2023-12-22 13:24:52,752 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.23 vs. limit=22.5 2023-12-22 13:24:55,404 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:25:08,343 INFO [train.py:886] (3/4) Epoch 20, batch 350, loss[loss=0.01321, audio_tagging_loss=0.01321, over 24750.00 frames. 
], tot_loss[loss=0.01481, audio_tagging_loss=0.01481, over 4105372.43 frames. ], batch size: 99, lr: 5.49e-03, grad_scale: 32.0 2023-12-22 13:25:29,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=606160.0, ans=0.125 2023-12-22 13:25:32,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=606160.0, ans=0.05 2023-12-22 13:25:47,579 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.469e+01 2.893e+01 2.999e+01 3.154e+01 3.798e+01, threshold=5.997e+01, percent-clipped=0.0 2023-12-22 13:25:50,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=606293.3333333334, ans=0.025 2023-12-22 13:25:56,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=606293.3333333334, ans=0.2 2023-12-22 13:25:59,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=606360.0, ans=0.0 2023-12-22 13:26:00,539 INFO [train.py:886] (3/4) Epoch 20, batch 400, loss[loss=0.01663, audio_tagging_loss=0.01663, over 25000.00 frames. ], tot_loss[loss=0.01453, audio_tagging_loss=0.01453, over 4288568.03 frames. ], batch size: 100, lr: 5.49e-03, grad_scale: 32.0 2023-12-22 13:26:07,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=606360.0, ans=0.025 2023-12-22 13:26:12,440 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.90 vs. limit=15.0 2023-12-22 13:26:15,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=606426.6666666666, ans=0.125 2023-12-22 13:26:35,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=606560.0, ans=0.0 2023-12-22 13:26:35,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=606560.0, ans=0.125 2023-12-22 13:26:41,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=606626.6666666666, ans=0.0 2023-12-22 13:26:50,005 INFO [train.py:886] (3/4) Epoch 20, batch 450, loss[loss=0.01521, audio_tagging_loss=0.01521, over 25000.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 4438303.84 frames. ], batch size: 100, lr: 5.49e-03, grad_scale: 32.0 2023-12-22 13:27:15,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=606826.6666666666, ans=0.0 2023-12-22 13:27:18,287 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2023-12-22 13:27:18,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.49 vs. 
limit=15.0 2023-12-22 13:27:20,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=606893.3333333334, ans=0.025 2023-12-22 13:27:29,401 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.545e+01 2.863e+01 2.962e+01 3.098e+01 3.891e+01, threshold=5.924e+01, percent-clipped=0.0 2023-12-22 13:27:38,396 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=12.0 2023-12-22 13:27:41,506 INFO [train.py:886] (3/4) Epoch 20, batch 500, loss[loss=0.01534, audio_tagging_loss=0.01534, over 25000.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 4559982.27 frames. ], batch size: 100, lr: 5.49e-03, grad_scale: 32.0 2023-12-22 13:27:44,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=607026.6666666666, ans=0.125 2023-12-22 13:28:29,333 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.48 vs. limit=22.5 2023-12-22 13:28:33,273 INFO [train.py:886] (3/4) Epoch 20, batch 550, loss[loss=0.01263, audio_tagging_loss=0.01263, over 25000.00 frames. ], tot_loss[loss=0.01395, audio_tagging_loss=0.01395, over 4648354.03 frames. ], batch size: 100, lr: 5.49e-03, grad_scale: 32.0 2023-12-22 13:28:39,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=607360.0, ans=0.125 2023-12-22 13:28:53,636 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.74 vs. limit=22.5 2023-12-22 13:29:12,562 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.565e+01 2.894e+01 3.022e+01 3.158e+01 3.590e+01, threshold=6.043e+01, percent-clipped=0.0 2023-12-22 13:29:21,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=15.0 2023-12-22 13:29:23,971 INFO [train.py:886] (3/4) Epoch 20, batch 600, loss[loss=0.01441, audio_tagging_loss=0.01441, over 24943.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 4707367.94 frames. ], batch size: 100, lr: 5.48e-03, grad_scale: 32.0 2023-12-22 13:29:39,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=607760.0, ans=0.0 2023-12-22 13:29:39,829 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:30:15,642 INFO [train.py:886] (3/4) Epoch 20, batch 650, loss[loss=0.01759, audio_tagging_loss=0.01759, over 24063.00 frames. ], tot_loss[loss=0.01406, audio_tagging_loss=0.01406, over 4755944.06 frames. 
], batch size: 100, lr: 5.48e-03, grad_scale: 32.0 2023-12-22 13:30:31,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=608093.3333333334, ans=0.1 2023-12-22 13:30:38,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=608160.0, ans=0.0 2023-12-22 13:30:54,377 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.574e+01 2.917e+01 3.051e+01 3.204e+01 3.727e+01, threshold=6.102e+01, percent-clipped=0.0 2023-12-22 13:30:56,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.11 vs. limit=22.5 2023-12-22 13:31:06,494 INFO [train.py:886] (3/4) Epoch 20, batch 700, loss[loss=0.01379, audio_tagging_loss=0.01379, over 25000.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4793266.92 frames. ], batch size: 100, lr: 5.48e-03, grad_scale: 32.0 2023-12-22 13:31:13,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=608360.0, ans=0.125 2023-12-22 13:31:16,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=608426.6666666666, ans=0.035 2023-12-22 13:31:19,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0 2023-12-22 13:31:20,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=608426.6666666666, ans=0.125 2023-12-22 13:31:22,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=608426.6666666666, ans=0.0 2023-12-22 13:31:24,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=608426.6666666666, ans=0.0 2023-12-22 13:31:33,213 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.95 vs. limit=15.0 2023-12-22 13:31:46,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=608626.6666666666, ans=0.2 2023-12-22 13:31:52,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.68 vs. limit=22.5 2023-12-22 13:31:57,676 INFO [train.py:886] (3/4) Epoch 20, batch 750, loss[loss=0.01434, audio_tagging_loss=0.01434, over 25000.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4824574.91 frames. ], batch size: 100, lr: 5.48e-03, grad_scale: 32.0 2023-12-22 13:31:58,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=608693.3333333334, ans=0.1 2023-12-22 13:31:58,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=608693.3333333334, ans=0.0 2023-12-22 13:32:00,306 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.71 vs. 
limit=12.0 2023-12-22 13:32:16,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=608760.0, ans=0.125 2023-12-22 13:32:26,172 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.05 vs. limit=10.0 2023-12-22 13:32:35,948 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.542e+01 2.906e+01 3.017e+01 3.126e+01 3.757e+01, threshold=6.033e+01, percent-clipped=0.0 2023-12-22 13:32:38,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=608960.0, ans=0.125 2023-12-22 13:32:41,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=608960.0, ans=0.0 2023-12-22 13:32:49,573 INFO [train.py:886] (3/4) Epoch 20, batch 800, loss[loss=0.0176, audio_tagging_loss=0.0176, over 25000.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4857947.76 frames. ], batch size: 100, lr: 5.48e-03, grad_scale: 32.0 2023-12-22 13:32:50,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=609026.6666666666, ans=0.125 2023-12-22 13:32:55,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=609026.6666666666, ans=0.125 2023-12-22 13:33:02,924 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:33:06,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=609093.3333333334, ans=0.2 2023-12-22 13:33:26,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.47 vs. limit=15.0 2023-12-22 13:33:31,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=609293.3333333334, ans=0.0 2023-12-22 13:33:40,119 INFO [train.py:886] (3/4) Epoch 20, batch 850, loss[loss=0.01475, audio_tagging_loss=0.01475, over 25000.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4879483.10 frames. ], batch size: 100, lr: 5.48e-03, grad_scale: 32.0 2023-12-22 13:33:45,189 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.49 vs. limit=6.0 2023-12-22 13:34:10,328 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 13:34:19,519 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.642e+01 2.973e+01 3.095e+01 3.258e+01 3.569e+01, threshold=6.190e+01, percent-clipped=0.0 2023-12-22 13:34:32,281 INFO [train.py:886] (3/4) Epoch 20, batch 900, loss[loss=0.01261, audio_tagging_loss=0.01261, over 25000.00 frames. ], tot_loss[loss=0.01381, audio_tagging_loss=0.01381, over 4896360.05 frames. 
], batch size: 100, lr: 5.48e-03, grad_scale: 32.0
2023-12-22 13:34:33,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=609693.3333333334, ans=0.125
2023-12-22 13:34:41,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=609760.0, ans=0.125
2023-12-22 13:34:49,300 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0
2023-12-22 13:35:04,573 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 13:35:10,394 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.69 vs. limit=12.0
2023-12-22 13:35:24,642 INFO [train.py:886] (3/4) Epoch 20, batch 950, loss[loss=0.01394, audio_tagging_loss=0.01394, over 24750.00 frames. ], tot_loss[loss=0.0139, audio_tagging_loss=0.0139, over 4900376.27 frames. ], batch size: 99, lr: 5.47e-03, grad_scale: 32.0
2023-12-22 13:35:32,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=610026.6666666666, ans=0.1
2023-12-22 13:36:03,994 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.635e+01 2.893e+01 3.045e+01 3.239e+01 3.538e+01, threshold=6.090e+01, percent-clipped=0.0
2023-12-22 13:36:09,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=610293.3333333334, ans=0.1
2023-12-22 13:36:16,083 INFO [train.py:886] (3/4) Epoch 20, batch 1000, loss[loss=0.0144, audio_tagging_loss=0.0144, over 22458.00 frames. ], tot_loss[loss=0.01393, audio_tagging_loss=0.01393, over 4902777.84 frames. ], batch size: 107, lr: 5.47e-03, grad_scale: 32.0
2023-12-22 13:36:24,104 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.06 vs. limit=22.5
2023-12-22 13:36:24,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=610360.0, ans=0.125
2023-12-22 13:36:32,836 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.86 vs. limit=10.0
2023-12-22 13:36:33,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=610426.6666666666, ans=0.2
2023-12-22 13:36:33,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=610426.6666666666, ans=0.1
2023-12-22 13:36:38,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=610493.3333333334, ans=0.0
2023-12-22 13:36:47,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=610560.0, ans=0.125
2023-12-22 13:36:52,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=610560.0, ans=0.04949747468305833
2023-12-22 13:36:57,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=610626.6666666666, ans=0.2
2023-12-22 13:36:57,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=610626.6666666666, ans=0.0
2023-12-22 13:37:08,806 INFO [train.py:886] (3/4) Epoch 20, batch 1050, loss[loss=0.01554, audio_tagging_loss=0.01554, over 24750.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4915015.86 frames. ], batch size: 99, lr: 5.47e-03, grad_scale: 32.0
2023-12-22 13:37:47,458 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.594e+01 2.869e+01 3.046e+01 3.181e+01 3.757e+01, threshold=6.091e+01, percent-clipped=0.0
2023-12-22 13:37:59,500 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 13:38:00,255 INFO [train.py:886] (3/4) Epoch 20, batch 1100, loss[loss=0.01452, audio_tagging_loss=0.01452, over 24750.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4924762.70 frames. ], batch size: 99, lr: 5.47e-03, grad_scale: 32.0
2023-12-22 13:38:02,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=611026.6666666666, ans=0.0
2023-12-22 13:38:28,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=611160.0, ans=0.125
2023-12-22 13:38:32,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.54 vs. limit=22.5
2023-12-22 13:38:47,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=611293.3333333334, ans=0.07
2023-12-22 13:38:48,844 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.28 vs. limit=22.5
2023-12-22 13:38:50,979 INFO [train.py:886] (3/4) Epoch 20, batch 1150, loss[loss=0.01304, audio_tagging_loss=0.01304, over 25000.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4933774.17 frames. ], batch size: 100, lr: 5.47e-03, grad_scale: 32.0
2023-12-22 13:39:01,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=611426.6666666666, ans=0.05
2023-12-22 13:39:26,821 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0
2023-12-22 13:39:29,145 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.553e+01 2.913e+01 3.020e+01 3.168e+01 3.585e+01, threshold=6.039e+01, percent-clipped=0.0
2023-12-22 13:39:34,826 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 13:39:38,835 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=15.0
2023-12-22 13:39:42,695 INFO [train.py:886] (3/4) Epoch 20, batch 1200, loss[loss=0.01633, audio_tagging_loss=0.01633, over 25000.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4938256.10 frames. ], batch size: 100, lr: 5.47e-03, grad_scale: 32.0
2023-12-22 13:39:46,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=611693.3333333334, ans=0.125
2023-12-22 13:40:21,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=611960.0, ans=0.125
2023-12-22 13:40:31,866 INFO [train.py:886] (3/4) Epoch 20, batch 1250, loss[loss=0.01369, audio_tagging_loss=0.01369, over 24750.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4936660.07 frames. ], batch size: 99, lr: 5.47e-03, grad_scale: 32.0
2023-12-22 13:40:34,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=612026.6666666666, ans=0.125
2023-12-22 13:40:39,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=612026.6666666666, ans=0.04949747468305833
2023-12-22 13:40:40,476 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 13:41:05,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=612226.6666666666, ans=0.0
2023-12-22 13:41:10,086 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.713e+01 2.956e+01 3.075e+01 3.204e+01 3.822e+01, threshold=6.150e+01, percent-clipped=0.0
2023-12-22 13:41:13,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=612293.3333333334, ans=0.1
2023-12-22 13:41:13,983 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.82 vs. limit=22.5
2023-12-22 13:41:22,080 INFO [train.py:886] (3/4) Epoch 20, batch 1300, loss[loss=0.01505, audio_tagging_loss=0.01505, over 25000.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4940784.87 frames. ], batch size: 100, lr: 5.46e-03, grad_scale: 32.0
2023-12-22 13:41:38,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=612426.6666666666, ans=0.0
2023-12-22 13:41:46,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.94 vs. limit=15.0
2023-12-22 13:41:51,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=612493.3333333334, ans=0.0
2023-12-22 13:41:57,182 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.24 vs. limit=15.0
2023-12-22 13:42:13,394 INFO [train.py:886] (3/4) Epoch 20, batch 1350, loss[loss=0.01549, audio_tagging_loss=0.01549, over 25000.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4939613.75 frames. ], batch size: 100, lr: 5.46e-03, grad_scale: 32.0
2023-12-22 13:42:18,232 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.51 vs. limit=22.5
2023-12-22 13:42:25,916 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.62 vs. limit=22.5
2023-12-22 13:42:26,808 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.00 vs. limit=6.0
2023-12-22 13:42:44,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=612893.3333333334, ans=0.125
2023-12-22 13:42:50,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=612893.3333333334, ans=0.1
2023-12-22 13:42:51,500 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.556e+01 2.862e+01 2.982e+01 3.143e+01 3.555e+01, threshold=5.964e+01, percent-clipped=0.0
2023-12-22 13:43:02,828 INFO [train.py:886] (3/4) Epoch 20, batch 1400, loss[loss=0.01208, audio_tagging_loss=0.01208, over 24028.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4946075.47 frames. ], batch size: 100, lr: 5.46e-03, grad_scale: 32.0
2023-12-22 13:43:17,094 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0
2023-12-22 13:43:27,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=613160.0, ans=0.125
2023-12-22 13:43:27,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=613160.0, ans=0.1
2023-12-22 13:43:29,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=613160.0, ans=0.0
2023-12-22 13:43:56,511 INFO [train.py:886] (3/4) Epoch 20, batch 1450, loss[loss=0.01276, audio_tagging_loss=0.01276, over 24078.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4949687.00 frames. ], batch size: 100, lr: 5.46e-03, grad_scale: 32.0
2023-12-22 13:44:13,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=613426.6666666666, ans=0.2
2023-12-22 13:44:14,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=613493.3333333334, ans=0.125
2023-12-22 13:44:17,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=613493.3333333334, ans=0.0
2023-12-22 13:44:20,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=613493.3333333334, ans=0.125
2023-12-22 13:44:34,560 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.572e+01 2.829e+01 3.018e+01 3.145e+01 3.579e+01, threshold=6.037e+01, percent-clipped=0.0
2023-12-22 13:44:35,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=613626.6666666666, ans=0.1
2023-12-22 13:44:36,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=613626.6666666666, ans=0.125
2023-12-22 13:44:45,939 INFO [train.py:886] (3/4) Epoch 20, batch 1500, loss[loss=0.01874, audio_tagging_loss=0.01874, over 24750.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4948788.93 frames. ], batch size: 99, lr: 5.46e-03, grad_scale: 32.0
2023-12-22 13:45:12,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=613826.6666666666, ans=0.2
2023-12-22 13:45:20,133 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.68 vs. limit=10.0
2023-12-22 13:45:31,293 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.67 vs. limit=15.0
2023-12-22 13:45:33,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=613960.0, ans=0.125
2023-12-22 13:45:34,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=613960.0, ans=0.0
2023-12-22 13:45:37,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=614026.6666666666, ans=0.0
2023-12-22 13:45:38,426 INFO [train.py:886] (3/4) Epoch 20, batch 1550, loss[loss=0.01468, audio_tagging_loss=0.01468, over 24956.00 frames. ], tot_loss[loss=0.0138, audio_tagging_loss=0.0138, over 4947801.80 frames. ], batch size: 100, lr: 5.46e-03, grad_scale: 32.0
2023-12-22 13:45:45,249 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 13:45:45,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=614026.6666666666, ans=0.125
2023-12-22 13:45:56,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=614093.3333333334, ans=0.0
2023-12-22 13:45:58,325 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.53 vs. limit=22.5
2023-12-22 13:46:16,252 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.562e+01 2.942e+01 3.051e+01 3.184e+01 5.064e+01, threshold=6.103e+01, percent-clipped=0.0
2023-12-22 13:46:24,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=614293.3333333334, ans=0.125
2023-12-22 13:46:29,799 INFO [train.py:886] (3/4) Epoch 20, batch 1600, loss[loss=0.01521, audio_tagging_loss=0.01521, over 25000.00 frames. ], tot_loss[loss=0.01389, audio_tagging_loss=0.01389, over 4938227.14 frames. ], batch size: 100, lr: 5.46e-03, grad_scale: 32.0
2023-12-22 13:46:41,654 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.72 vs. limit=22.5
2023-12-22 13:46:49,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=614493.3333333334, ans=0.2
2023-12-22 13:46:49,998 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0
2023-12-22 13:47:10,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.34 vs. limit=15.0
2023-12-22 13:47:13,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=614626.6666666666, ans=0.0
2023-12-22 13:47:13,379 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.97 vs. limit=15.0
2023-12-22 13:47:16,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=614626.6666666666, ans=0.0
2023-12-22 13:47:20,716 INFO [train.py:886] (3/4) Epoch 20, batch 1650, loss[loss=0.01476, audio_tagging_loss=0.01476, over 24750.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4942105.01 frames. ], batch size: 99, lr: 5.45e-03, grad_scale: 32.0
2023-12-22 13:47:22,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=614693.3333333334, ans=0.125
2023-12-22 13:47:51,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=614893.3333333334, ans=0.125
2023-12-22 13:47:59,614 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.531e+01 2.897e+01 3.020e+01 3.160e+01 3.964e+01, threshold=6.040e+01, percent-clipped=0.0
2023-12-22 13:47:59,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=614893.3333333334, ans=0.2
2023-12-22 13:48:08,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=614960.0, ans=0.125
2023-12-22 13:48:12,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=615026.6666666666, ans=0.125
2023-12-22 13:48:13,735 INFO [train.py:886] (3/4) Epoch 20, batch 1700, loss[loss=0.01223, audio_tagging_loss=0.01223, over 24750.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4945489.69 frames. ], batch size: 99, lr: 5.45e-03, grad_scale: 32.0
2023-12-22 13:48:30,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=615093.3333333334, ans=0.125
2023-12-22 13:48:42,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=615226.6666666666, ans=0.0
2023-12-22 13:48:48,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=615226.6666666666, ans=0.2
2023-12-22 13:48:49,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=615226.6666666666, ans=0.05
2023-12-22 13:48:54,882 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0
2023-12-22 13:49:02,945 INFO [train.py:886] (3/4) Epoch 20, batch 1750, loss[loss=0.01295, audio_tagging_loss=0.01295, over 25000.00 frames. ], tot_loss[loss=0.01357, audio_tagging_loss=0.01357, over 4946537.43 frames. ], batch size: 100, lr: 5.45e-03, grad_scale: 32.0
2023-12-22 13:49:08,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=615360.0, ans=0.125
2023-12-22 13:49:37,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=615560.0, ans=0.2
2023-12-22 13:49:42,938 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.567e+01 2.888e+01 2.963e+01 3.094e+01 5.510e+01, threshold=5.927e+01, percent-clipped=0.0
2023-12-22 13:49:45,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=615626.6666666666, ans=0.125
2023-12-22 13:49:45,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=615626.6666666666, ans=0.125
2023-12-22 13:49:48,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=615626.6666666666, ans=0.0
2023-12-22 13:49:54,996 INFO [train.py:886] (3/4) Epoch 20, batch 1800, loss[loss=0.01158, audio_tagging_loss=0.01158, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4945579.54 frames. ], batch size: 100, lr: 5.45e-03, grad_scale: 32.0
2023-12-22 13:50:01,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=615693.3333333334, ans=0.0
2023-12-22 13:50:12,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=615760.0, ans=0.125
2023-12-22 13:50:17,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=615826.6666666666, ans=0.125
2023-12-22 13:50:25,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=615893.3333333334, ans=0.0
2023-12-22 13:50:25,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=615893.3333333334, ans=0.125
2023-12-22 13:50:28,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=615893.3333333334, ans=0.04949747468305833
2023-12-22 13:50:47,407 INFO [train.py:886] (3/4) Epoch 20, batch 1850, loss[loss=0.01325, audio_tagging_loss=0.01325, over 24750.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4945895.38 frames. ], batch size: 99, lr: 5.45e-03, grad_scale: 32.0
2023-12-22 13:51:09,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=616160.0, ans=0.0
2023-12-22 13:51:20,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=616226.6666666666, ans=0.125
2023-12-22 13:51:22,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=616226.6666666666, ans=0.125
2023-12-22 13:51:25,860 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.662e+01 2.959e+01 3.073e+01 3.205e+01 3.995e+01, threshold=6.146e+01, percent-clipped=0.0
2023-12-22 13:51:38,097 INFO [train.py:886] (3/4) Epoch 20, batch 1900, loss[loss=0.01438, audio_tagging_loss=0.01438, over 24750.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4942425.24 frames. ], batch size: 99, lr: 5.45e-03, grad_scale: 32.0
2023-12-22 13:51:38,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=616360.0, ans=0.0
2023-12-22 13:51:41,207 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 13:52:10,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=616560.0, ans=0.125
2023-12-22 13:52:14,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=616560.0, ans=0.0
2023-12-22 13:52:30,204 INFO [train.py:886] (3/4) Epoch 20, batch 1950, loss[loss=0.0134, audio_tagging_loss=0.0134, over 25000.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4940410.53 frames. ], batch size: 100, lr: 5.44e-03, grad_scale: 32.0
2023-12-22 13:52:48,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=616760.0, ans=0.2
2023-12-22 13:53:02,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=616893.3333333334, ans=0.0
2023-12-22 13:53:09,468 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.571e+01 2.904e+01 3.050e+01 3.172e+01 4.406e+01, threshold=6.100e+01, percent-clipped=0.0
2023-12-22 13:53:12,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=616960.0, ans=0.0
2023-12-22 13:53:22,132 INFO [train.py:886] (3/4) Epoch 20, batch 2000, loss[loss=0.01159, audio_tagging_loss=0.01159, over 25000.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4938548.93 frames. ], batch size: 100, lr: 5.44e-03, grad_scale: 64.0
2023-12-22 13:53:24,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=617026.6666666666, ans=0.125
2023-12-22 13:53:26,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=617026.6666666666, ans=0.5
2023-12-22 13:53:31,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=617093.3333333334, ans=0.125
2023-12-22 13:53:37,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=617093.3333333334, ans=0.125
2023-12-22 13:54:05,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=617293.3333333334, ans=0.1
2023-12-22 13:54:05,601 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.83 vs. limit=22.5
2023-12-22 13:54:14,213 INFO [train.py:886] (3/4) Epoch 20, batch 2050, loss[loss=0.01156, audio_tagging_loss=0.01156, over 24750.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4943475.77 frames. ], batch size: 99, lr: 5.44e-03, grad_scale: 64.0
2023-12-22 13:54:22,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=617426.6666666666, ans=0.125
2023-12-22 13:54:34,783 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.55 vs. limit=15.0
2023-12-22 13:54:41,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=617493.3333333334, ans=0.125
2023-12-22 13:54:53,074 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.697e+01 2.903e+01 3.029e+01 3.164e+01 3.600e+01, threshold=6.057e+01, percent-clipped=0.0
2023-12-22 13:55:02,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=617626.6666666666, ans=0.2
2023-12-22 13:55:04,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=617626.6666666666, ans=0.1
2023-12-22 13:55:06,640 INFO [train.py:886] (3/4) Epoch 20, batch 2100, loss[loss=0.01038, audio_tagging_loss=0.01038, over 24750.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4949629.15 frames. ], batch size: 99, lr: 5.44e-03, grad_scale: 64.0
2023-12-22 13:55:07,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=617693.3333333334, ans=0.125
2023-12-22 13:55:55,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=617960.0, ans=0.125
2023-12-22 13:55:57,530 INFO [train.py:886] (3/4) Epoch 20, batch 2150, loss[loss=0.01391, audio_tagging_loss=0.01391, over 25000.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4950030.88 frames. ], batch size: 100, lr: 5.44e-03, grad_scale: 64.0
2023-12-22 13:56:00,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.82 vs. limit=10.0
2023-12-22 13:56:37,616 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.487e+01 2.945e+01 3.054e+01 3.212e+01 3.650e+01, threshold=6.109e+01, percent-clipped=0.0
2023-12-22 13:56:47,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=618293.3333333334, ans=0.125
2023-12-22 13:56:49,058 INFO [train.py:886] (3/4) Epoch 20, batch 2200, loss[loss=0.01416, audio_tagging_loss=0.01416, over 24750.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4948640.42 frames. ], batch size: 99, lr: 5.44e-03, grad_scale: 64.0
2023-12-22 13:57:24,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=618560.0, ans=0.125
2023-12-22 13:57:37,563 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.66 vs. limit=15.0
2023-12-22 13:57:40,635 INFO [train.py:886] (3/4) Epoch 20, batch 2250, loss[loss=0.01466, audio_tagging_loss=0.01466, over 25000.00 frames. ], tot_loss[loss=0.01383, audio_tagging_loss=0.01383, over 4944467.20 frames. ], batch size: 100, lr: 5.44e-03, grad_scale: 64.0
2023-12-22 13:57:54,214 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=15.0
2023-12-22 13:58:09,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=618893.3333333334, ans=0.0
2023-12-22 13:58:18,003 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.647e+01 2.946e+01 3.059e+01 3.211e+01 3.803e+01, threshold=6.118e+01, percent-clipped=0.0
2023-12-22 13:58:29,438 INFO [train.py:886] (3/4) Epoch 20, batch 2300, loss[loss=0.01218, audio_tagging_loss=0.01218, over 25000.00 frames. ], tot_loss[loss=0.01377, audio_tagging_loss=0.01377, over 4943748.92 frames. ], batch size: 100, lr: 5.43e-03, grad_scale: 64.0
2023-12-22 13:58:30,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=619026.6666666666, ans=0.2
2023-12-22 13:58:36,349 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.63 vs. limit=15.0
2023-12-22 13:58:48,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=619093.3333333334, ans=0.2
2023-12-22 13:58:50,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=619160.0, ans=0.0
2023-12-22 13:58:51,324 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.35 vs. limit=22.5
2023-12-22 13:58:51,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=619160.0, ans=0.2
2023-12-22 13:58:55,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=619160.0, ans=0.125
2023-12-22 13:58:59,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=619226.6666666666, ans=0.125
2023-12-22 13:59:12,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=619293.3333333334, ans=0.04949747468305833
2023-12-22 13:59:19,316 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.74 vs. limit=15.0
2023-12-22 13:59:21,861 INFO [train.py:886] (3/4) Epoch 20, batch 2350, loss[loss=0.01302, audio_tagging_loss=0.01302, over 25000.00 frames. ], tot_loss[loss=0.01369, audio_tagging_loss=0.01369, over 4948296.76 frames. ], batch size: 100, lr: 5.43e-03, grad_scale: 64.0
2023-12-22 13:59:31,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.45 vs. limit=15.0
2023-12-22 14:00:00,830 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.549e+01 2.868e+01 3.010e+01 3.128e+01 3.736e+01, threshold=6.020e+01, percent-clipped=0.0
2023-12-22 14:00:05,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=12.0
2023-12-22 14:00:11,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=619693.3333333334, ans=0.0
2023-12-22 14:00:12,311 INFO [train.py:886] (3/4) Epoch 20, batch 2400, loss[loss=0.01323, audio_tagging_loss=0.01323, over 25000.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4945771.27 frames. ], batch size: 100, lr: 5.43e-03, grad_scale: 64.0
2023-12-22 14:00:12,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=619693.3333333334, ans=0.0
2023-12-22 14:00:14,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=619693.3333333334, ans=0.2
2023-12-22 14:00:21,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=619693.3333333334, ans=0.0
2023-12-22 14:00:28,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=619760.0, ans=22.5
2023-12-22 14:00:32,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=619826.6666666666, ans=0.0
2023-12-22 14:00:59,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=619960.0, ans=0.2
2023-12-22 14:01:03,797 INFO [train.py:886] (3/4) Epoch 20, batch 2450, loss[loss=0.01587, audio_tagging_loss=0.01587, over 25000.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4956372.80 frames. ], batch size: 100, lr: 5.43e-03, grad_scale: 64.0
2023-12-22 14:01:08,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=620026.6666666666, ans=0.2
2023-12-22 14:01:10,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=620026.6666666666, ans=0.2
2023-12-22 14:01:22,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=620093.3333333334, ans=0.125
2023-12-22 14:01:41,189 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.574e+01 2.937e+01 3.082e+01 3.201e+01 3.649e+01, threshold=6.165e+01, percent-clipped=0.0
2023-12-22 14:01:54,864 INFO [train.py:886] (3/4) Epoch 20, batch 2500, loss[loss=0.01365, audio_tagging_loss=0.01365, over 25000.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4955197.39 frames. ], batch size: 100, lr: 5.43e-03, grad_scale: 64.0
2023-12-22 14:02:00,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620360.0, ans=0.1
2023-12-22 14:02:01,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=620360.0, ans=0.125
2023-12-22 14:02:21,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=620493.3333333334, ans=0.0
2023-12-22 14:02:22,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0
2023-12-22 14:02:26,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=620560.0, ans=0.125
2023-12-22 14:02:29,232 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0
2023-12-22 14:02:40,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=620626.6666666666, ans=0.0
2023-12-22 14:02:41,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=620626.6666666666, ans=0.125
2023-12-22 14:02:42,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=620626.6666666666, ans=0.125
2023-12-22 14:02:44,679 INFO [train.py:886] (3/4) Epoch 20, batch 2550, loss[loss=0.01439, audio_tagging_loss=0.01439, over 24750.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 4953369.33 frames. ], batch size: 99, lr: 5.43e-03, grad_scale: 64.0
2023-12-22 14:02:53,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=620693.3333333334, ans=0.0
2023-12-22 14:02:58,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620760.0, ans=0.1
2023-12-22 14:03:05,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=620826.6666666666, ans=0.2
2023-12-22 14:03:06,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=620826.6666666666, ans=0.125
2023-12-22 14:03:09,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=620826.6666666666, ans=0.125
2023-12-22 14:03:25,336 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.633e+01 2.957e+01 3.094e+01 3.249e+01 3.777e+01, threshold=6.188e+01, percent-clipped=0.0
2023-12-22 14:03:37,554 INFO [train.py:886] (3/4) Epoch 20, batch 2600, loss[loss=0.01585, audio_tagging_loss=0.01585, over 24103.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4952130.56 frames. ], batch size: 100, lr: 5.43e-03, grad_scale: 64.0
2023-12-22 14:03:38,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=621026.6666666666, ans=0.0
2023-12-22 14:03:41,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=621026.6666666666, ans=0.1
2023-12-22 14:03:44,800 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0
2023-12-22 14:04:10,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=621226.6666666666, ans=0.125
2023-12-22 14:04:30,330 INFO [train.py:886] (3/4) Epoch 20, batch 2650, loss[loss=0.01412, audio_tagging_loss=0.01412, over 25000.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4951656.68 frames. ], batch size: 100, lr: 5.42e-03, grad_scale: 64.0
2023-12-22 14:04:37,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=621360.0, ans=0.1
2023-12-22 14:04:43,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=621426.6666666666, ans=0.0
2023-12-22 14:04:46,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=621426.6666666666, ans=0.125
2023-12-22 14:04:48,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=621426.6666666666, ans=0.125
2023-12-22 14:04:51,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=621493.3333333334, ans=0.0
2023-12-22 14:04:54,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=621493.3333333334, ans=0.125
2023-12-22 14:05:05,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=621560.0, ans=0.1
2023-12-22 14:05:07,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=621560.0, ans=0.125
2023-12-22 14:05:09,977 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.585e+01 2.862e+01 3.001e+01 3.149e+01 4.006e+01, threshold=6.003e+01, percent-clipped=0.0
2023-12-22 14:05:12,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=621626.6666666666, ans=0.0
2023-12-22 14:05:12,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=621626.6666666666, ans=0.0
2023-12-22 14:05:16,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=621626.6666666666, ans=0.2
2023-12-22 14:05:21,433 INFO [train.py:886] (3/4) Epoch 20, batch 2700, loss[loss=0.0139, audio_tagging_loss=0.0139, over 25000.00 frames. ], tot_loss[loss=0.01371, audio_tagging_loss=0.01371, over 4954004.31 frames. ], batch size: 100, lr: 5.42e-03, grad_scale: 64.0
2023-12-22 14:05:47,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=621826.6666666666, ans=0.0
2023-12-22 14:05:49,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=621826.6666666666, ans=0.125
2023-12-22 14:05:51,642 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.05 vs. limit=15.0
2023-12-22 14:06:12,938 INFO [train.py:886] (3/4) Epoch 20, batch 2750, loss[loss=0.01398, audio_tagging_loss=0.01398, over 25000.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4948384.55 frames. ], batch size: 100, lr: 5.42e-03, grad_scale: 64.0
2023-12-22 14:06:17,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=622026.6666666666, ans=0.125
2023-12-22 14:06:29,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=622093.3333333334, ans=0.125
2023-12-22 14:06:52,648 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.580e+01 2.935e+01 3.098e+01 3.194e+01 3.617e+01, threshold=6.197e+01, percent-clipped=0.0
2023-12-22 14:06:53,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=622293.3333333334, ans=0.05
2023-12-22 14:06:57,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=622293.3333333334, ans=0.125
2023-12-22 14:06:57,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=622293.3333333334, ans=0.125
2023-12-22 14:07:01,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=622293.3333333334, ans=0.125
2023-12-22 14:07:01,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=622293.3333333334, ans=0.1
2023-12-22 14:07:04,120 INFO [train.py:886] (3/4) Epoch 20, batch 2800, loss[loss=0.01296, audio_tagging_loss=0.01296, over 24750.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4949319.48 frames. ], batch size: 99, lr: 5.42e-03, grad_scale: 64.0
2023-12-22 14:07:28,573 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0
2023-12-22 14:07:37,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=622560.0, ans=0.0
2023-12-22 14:07:44,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.95 vs. limit=15.0
2023-12-22 14:07:54,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=622626.6666666666, ans=0.125
2023-12-22 14:07:56,124 INFO [train.py:886] (3/4) Epoch 20, batch 2850, loss[loss=0.01603, audio_tagging_loss=0.01603, over 24750.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4945917.34 frames. ], batch size: 99, lr: 5.42e-03, grad_scale: 64.0
2023-12-22 14:07:59,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=622693.3333333334, ans=0.0
2023-12-22 14:07:59,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=622693.3333333334, ans=0.0
2023-12-22 14:08:00,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=622693.3333333334, ans=0.1
2023-12-22 14:08:15,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=622826.6666666666, ans=0.1
2023-12-22 14:08:15,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=622826.6666666666, ans=0.1
2023-12-22 14:08:16,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=622826.6666666666, ans=0.0
2023-12-22 14:08:16,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=622826.6666666666, ans=0.95
2023-12-22 14:08:26,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=622893.3333333334, ans=0.125
2023-12-22 14:08:27,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=622893.3333333334, ans=0.125
2023-12-22 14:08:34,141 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.595e+01 2.924e+01 3.080e+01 3.230e+01 3.669e+01, threshold=6.161e+01, percent-clipped=0.0
2023-12-22 14:08:46,014 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.01 vs. limit=10.0
2023-12-22 14:08:46,403 INFO [train.py:886] (3/4) Epoch 20, batch 2900, loss[loss=0.01273, audio_tagging_loss=0.01273, over 25000.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4942371.45 frames. ], batch size: 100, lr: 5.42e-03, grad_scale: 64.0
2023-12-22 14:09:08,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=623160.0, ans=0.09899494936611666
2023-12-22 14:09:14,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=623160.0, ans=0.2
2023-12-22 14:09:30,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=623293.3333333334, ans=0.1
2023-12-22 14:09:33,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=623293.3333333334, ans=0.125
2023-12-22 14:09:36,453 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.08 vs. limit=15.0
2023-12-22 14:09:36,748 INFO [train.py:886] (3/4) Epoch 20, batch 2950, loss[loss=0.01118, audio_tagging_loss=0.01118, over 24025.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4942936.19 frames. ], batch size: 100, lr: 5.42e-03, grad_scale: 64.0
2023-12-22 14:09:42,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=623360.0, ans=0.1
2023-12-22 14:09:43,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=623360.0, ans=0.0
2023-12-22 14:10:06,951 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.20 vs. limit=10.0
2023-12-22 14:10:07,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=623560.0, ans=0.125
2023-12-22 14:10:10,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=623560.0, ans=0.125
2023-12-22 14:10:14,853 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.630e+01 2.870e+01 3.046e+01 3.169e+01 3.607e+01, threshold=6.091e+01, percent-clipped=0.0
2023-12-22 14:10:17,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=623626.6666666666, ans=0.125
2023-12-22 14:10:18,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=623626.6666666666, ans=0.125
2023-12-22 14:10:27,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.51 vs. limit=22.5
2023-12-22 14:10:28,841 INFO [train.py:886] (3/4) Epoch 20, batch 3000, loss[loss=0.01293, audio_tagging_loss=0.01293, over 25000.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4946502.71 frames. ], batch size: 100, lr: 5.41e-03, grad_scale: 64.0
2023-12-22 14:10:28,841 INFO [train.py:909] (3/4) Computing validation loss
2023-12-22 14:10:50,335 INFO [train.py:917] (3/4) Epoch 20, validation: loss=0.03313, audio_tagging_loss=0.03313, over 3737520.00 frames.
2023-12-22 14:10:50,335 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-22 14:10:53,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.30 vs. limit=15.0
2023-12-22 14:10:54,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=623693.3333333334, ans=0.1
2023-12-22 14:11:03,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=623760.0, ans=0.125
2023-12-22 14:11:06,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=623760.0, ans=0.0
2023-12-22 14:11:06,586 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.97 vs. limit=15.0
2023-12-22 14:11:09,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=623826.6666666666, ans=0.125
2023-12-22 14:11:13,335 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0
2023-12-22 14:11:28,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=623893.3333333334, ans=0.125
2023-12-22 14:11:28,648 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.36 vs. limit=22.5
2023-12-22 14:11:36,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=623960.0, ans=0.2
2023-12-22 14:11:37,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=623960.0, ans=0.04949747468305833
2023-12-22 14:11:40,488 INFO [train.py:886] (3/4) Epoch 20, batch 3050, loss[loss=0.01258, audio_tagging_loss=0.01258, over 22834.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 4947538.48 frames. ], batch size: 107, lr: 5.41e-03, grad_scale: 64.0
2023-12-22 14:12:18,571 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.616e+01 2.950e+01 3.022e+01 3.122e+01 3.698e+01, threshold=6.045e+01, percent-clipped=0.0
2023-12-22 14:12:30,796 INFO [train.py:886] (3/4) Epoch 20, batch 3100, loss[loss=0.01285, audio_tagging_loss=0.01285, over 25000.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4956445.08 frames. ], batch size: 100, lr: 5.41e-03, grad_scale: 64.0
2023-12-22 14:12:31,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=624360.0, ans=0.0
2023-12-22 14:12:59,830 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.57 vs. limit=12.0
2023-12-22 14:13:07,050 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 14:13:07,356 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.78 vs. limit=15.0
2023-12-22 14:13:19,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=624693.3333333334, ans=0.1
2023-12-22 14:13:20,710 INFO [train.py:886] (3/4) Epoch 20, batch 3150, loss[loss=0.01321, audio_tagging_loss=0.01321, over 24750.00 frames. ], tot_loss[loss=0.01371, audio_tagging_loss=0.01371, over 4948485.99 frames. ], batch size: 99, lr: 5.41e-03, grad_scale: 64.0
2023-12-22 14:13:56,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=624893.3333333334, ans=0.125
2023-12-22 14:14:00,105 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.588e+01 2.956e+01 3.114e+01 3.316e+01 3.696e+01, threshold=6.228e+01, percent-clipped=0.0
2023-12-22 14:14:03,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=624960.0, ans=0.95
2023-12-22 14:14:11,505 INFO [train.py:886] (3/4) Epoch 20, batch 3200, loss[loss=0.01213, audio_tagging_loss=0.01213, over 24750.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4942176.34 frames. ], batch size: 99, lr: 5.41e-03, grad_scale: 64.0
2023-12-22 14:14:11,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=625026.6666666666, ans=0.0
2023-12-22 14:14:21,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=625093.3333333334, ans=0.0
2023-12-22 14:14:23,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=625093.3333333334, ans=0.2
2023-12-22 14:14:25,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=625093.3333333334, ans=0.07
2023-12-22 14:14:26,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=625093.3333333334, ans=0.125
2023-12-22 14:14:31,162 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.04 vs. limit=10.0
2023-12-22 14:14:49,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=625226.6666666666, ans=0.0
2023-12-22 14:14:50,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.30 vs. limit=10.0
2023-12-22 14:14:52,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=625293.3333333334, ans=0.0
2023-12-22 14:15:03,416 INFO [train.py:886] (3/4) Epoch 20, batch 3250, loss[loss=0.01197, audio_tagging_loss=0.01197, over 24750.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4945311.36 frames. ], batch size: 99, lr: 5.41e-03, grad_scale: 64.0
2023-12-22 14:15:04,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=625360.0, ans=0.125
2023-12-22 14:15:10,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=625360.0, ans=0.1
2023-12-22 14:15:16,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=625426.6666666666, ans=0.125
2023-12-22 14:15:24,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=625493.3333333334, ans=0.0
2023-12-22 14:15:40,985 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.609e+01 2.857e+01 2.981e+01 3.142e+01 3.421e+01, threshold=5.962e+01, percent-clipped=0.0
2023-12-22 14:15:47,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=625626.6666666666, ans=0.125
2023-12-22 14:15:48,965 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.11 vs. limit=15.0
2023-12-22 14:15:51,602 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0
2023-12-22 14:15:52,998 INFO [train.py:886] (3/4) Epoch 20, batch 3300, loss[loss=0.01225, audio_tagging_loss=0.01225, over 25000.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4943198.14 frames. ], batch size: 100, lr: 5.41e-03, grad_scale: 64.0
2023-12-22 14:15:57,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=625693.3333333334, ans=0.0
2023-12-22 14:16:04,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=625760.0, ans=0.1
2023-12-22 14:16:19,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.00 vs. limit=10.0
2023-12-22 14:16:23,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=625893.3333333334, ans=0.0
2023-12-22 14:16:30,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=625893.3333333334, ans=0.1
2023-12-22 14:16:42,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=625960.0, ans=0.0
2023-12-22 14:16:43,893 INFO [train.py:886] (3/4) Epoch 20, batch 3350, loss[loss=0.01273, audio_tagging_loss=0.01273, over 22275.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4942910.30 frames. ], batch size: 107, lr: 5.40e-03, grad_scale: 64.0
2023-12-22 14:17:10,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=626160.0, ans=0.2
2023-12-22 14:17:21,207 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.663e+01 2.948e+01 3.074e+01 3.157e+01 3.734e+01, threshold=6.147e+01, percent-clipped=0.0
2023-12-22 14:17:21,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=626226.6666666666, ans=0.1
2023-12-22 14:17:22,805 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.56 vs. limit=12.0
2023-12-22 14:17:25,663 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0
2023-12-22 14:17:29,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=626293.3333333334, ans=0.0
2023-12-22 14:17:33,303 INFO [train.py:886] (3/4) Epoch 20, batch 3400, loss[loss=0.01234, audio_tagging_loss=0.01234, over 24750.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4947207.07 frames. ], batch size: 99, lr: 5.40e-03, grad_scale: 64.0
2023-12-22 14:17:42,678 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.83 vs. limit=15.0
2023-12-22 14:17:54,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=626493.3333333334, ans=0.5
2023-12-22 14:17:56,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=626493.3333333334, ans=0.1
2023-12-22 14:18:00,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=626493.3333333334, ans=0.125
2023-12-22 14:18:06,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=626560.0, ans=0.125
2023-12-22 14:18:17,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=626626.6666666666, ans=0.0
2023-12-22 14:18:25,375 INFO [train.py:886] (3/4) Epoch 20, batch 3450, loss[loss=0.01424, audio_tagging_loss=0.01424, over 23976.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4942993.45 frames. ], batch size: 100, lr: 5.40e-03, grad_scale: 64.0
2023-12-22 14:18:26,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=626693.3333333334, ans=0.0
2023-12-22 14:18:35,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=626760.0, ans=0.1
2023-12-22 14:18:49,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=626826.6666666666, ans=0.125
2023-12-22 14:19:03,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=626893.3333333334, ans=0.0
2023-12-22 14:19:04,157 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.632e+01 2.964e+01 3.104e+01 3.232e+01 3.716e+01, threshold=6.208e+01, percent-clipped=0.0
2023-12-22 14:19:13,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=626960.0, ans=0.0
2023-12-22 14:19:17,578 INFO [train.py:886] (3/4) Epoch 20, batch 3500, loss[loss=0.01209, audio_tagging_loss=0.01209, over 25000.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 4932153.18 frames. ], batch size: 100, lr: 5.40e-03, grad_scale: 64.0
2023-12-22 14:19:18,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=627026.6666666666, ans=0.1
2023-12-22 14:19:20,733 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0
2023-12-22 14:19:28,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=627093.3333333334, ans=0.2
2023-12-22 14:19:36,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=627160.0, ans=0.09899494936611666
2023-12-22 14:19:45,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=627160.0, ans=0.1
2023-12-22 14:19:49,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=627226.6666666666, ans=0.125
2023-12-22 14:19:56,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=627226.6666666666, ans=0.125
2023-12-22 14:19:56,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=627226.6666666666, ans=0.2
2023-12-22 14:20:08,219 INFO [train.py:886] (3/4) Epoch 20, batch 3550, loss[loss=0.01122, audio_tagging_loss=0.01122, over 25000.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4937282.07 frames. ], batch size: 100, lr: 5.40e-03, grad_scale: 64.0
2023-12-22 14:20:08,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=627360.0, ans=0.125
2023-12-22 14:20:27,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=627426.6666666666, ans=0.125
2023-12-22 14:20:31,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=627493.3333333334, ans=0.0
2023-12-22 14:20:37,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=627493.3333333334, ans=0.125
2023-12-22 14:20:41,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=627560.0, ans=0.1
2023-12-22 14:20:42,255 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.19 vs. limit=22.5
2023-12-22 14:20:50,036 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.608e+01 2.881e+01 3.030e+01 3.174e+01 3.581e+01, threshold=6.061e+01, percent-clipped=0.0
2023-12-22 14:20:50,597 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.51 vs. limit=15.0
2023-12-22 14:20:56,221 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.49 vs. limit=10.0
2023-12-22 14:21:01,410 INFO [train.py:886] (3/4) Epoch 20, batch 3600, loss[loss=0.0145, audio_tagging_loss=0.0145, over 24750.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4941910.72 frames.
], batch size: 99, lr: 5.40e-03, grad_scale: 64.0 2023-12-22 14:21:11,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=627760.0, ans=0.125 2023-12-22 14:21:42,016 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.523e-03 2023-12-22 14:21:53,547 INFO [train.py:886] (3/4) Epoch 20, batch 3650, loss[loss=0.0164, audio_tagging_loss=0.0164, over 25000.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4952332.21 frames. ], batch size: 100, lr: 5.40e-03, grad_scale: 64.0 2023-12-22 14:21:56,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=628026.6666666666, ans=0.125 2023-12-22 14:22:04,219 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.55 vs. limit=22.5 2023-12-22 14:22:05,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=628093.3333333334, ans=0.2 2023-12-22 14:22:18,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=628160.0, ans=0.1 2023-12-22 14:22:30,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=628226.6666666666, ans=0.1 2023-12-22 14:22:32,324 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.583e+01 2.914e+01 3.056e+01 3.191e+01 3.710e+01, threshold=6.111e+01, percent-clipped=0.0 2023-12-22 14:22:38,679 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.08 vs. limit=15.0 2023-12-22 14:22:43,673 INFO [train.py:886] (3/4) Epoch 20, batch 3700, loss[loss=0.01553, audio_tagging_loss=0.01553, over 25000.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4956784.38 frames. ], batch size: 100, lr: 5.39e-03, grad_scale: 64.0 2023-12-22 14:22:44,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=628360.0, ans=0.125 2023-12-22 14:23:04,687 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.19 vs. 
limit=15.0 2023-12-22 14:23:05,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=628493.3333333334, ans=0.2 2023-12-22 14:23:06,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=628493.3333333334, ans=0.125 2023-12-22 14:23:20,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=628560.0, ans=0.0 2023-12-22 14:23:22,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=628626.6666666666, ans=0.0 2023-12-22 14:23:28,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=628626.6666666666, ans=0.125 2023-12-22 14:23:30,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=628626.6666666666, ans=0.0 2023-12-22 14:23:35,230 INFO [train.py:886] (3/4) Epoch 20, batch 3750, loss[loss=0.01197, audio_tagging_loss=0.01197, over 23959.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4951526.28 frames. ], batch size: 100, lr: 5.39e-03, grad_scale: 64.0 2023-12-22 14:24:14,148 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.638e+01 2.921e+01 3.146e+01 3.299e+01 3.816e+01, threshold=6.291e+01, percent-clipped=0.0 2023-12-22 14:24:17,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=628960.0, ans=0.2 2023-12-22 14:24:25,603 INFO [train.py:886] (3/4) Epoch 20, batch 3800, loss[loss=0.01591, audio_tagging_loss=0.01591, over 24750.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4946765.24 frames. ], batch size: 99, lr: 5.39e-03, grad_scale: 64.0 2023-12-22 14:24:48,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=629160.0, ans=0.125 2023-12-22 14:24:52,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=629160.0, ans=0.0 2023-12-22 14:25:07,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0 2023-12-22 14:25:18,298 INFO [train.py:886] (3/4) Epoch 20, batch 3850, loss[loss=0.01451, audio_tagging_loss=0.01451, over 24066.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4945282.66 frames. 
], batch size: 100, lr: 5.39e-03, grad_scale: 64.0 2023-12-22 14:25:22,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=629360.0, ans=0.125 2023-12-22 14:25:23,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=629360.0, ans=0.125 2023-12-22 14:25:45,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=629493.3333333334, ans=0.125 2023-12-22 14:25:47,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=629493.3333333334, ans=0.0 2023-12-22 14:25:49,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=629560.0, ans=0.0 2023-12-22 14:25:49,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=629560.0, ans=0.125 2023-12-22 14:25:57,055 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.593e+01 2.955e+01 3.052e+01 3.216e+01 3.711e+01, threshold=6.104e+01, percent-clipped=0.0 2023-12-22 14:26:09,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=629626.6666666666, ans=0.1 2023-12-22 14:26:10,656 INFO [train.py:886] (3/4) Epoch 20, batch 3900, loss[loss=0.01386, audio_tagging_loss=0.01386, over 25000.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4944209.63 frames. ], batch size: 100, lr: 5.39e-03, grad_scale: 64.0 2023-12-22 14:26:26,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=629760.0, ans=0.125 2023-12-22 14:26:40,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0 2023-12-22 14:27:01,636 INFO [train.py:886] (3/4) Epoch 20, batch 3950, loss[loss=0.01127, audio_tagging_loss=0.01127, over 25000.00 frames. ], tot_loss[loss=0.01353, audio_tagging_loss=0.01353, over 4947881.86 frames. ], batch size: 100, lr: 5.39e-03, grad_scale: 64.0 2023-12-22 14:27:15,052 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.66 vs. limit=22.5 2023-12-22 14:27:23,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=630160.0, ans=0.0 2023-12-22 14:27:40,561 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.540e+01 2.892e+01 3.043e+01 3.159e+01 3.802e+01, threshold=6.086e+01, percent-clipped=0.0 2023-12-22 14:27:51,904 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.73 vs. limit=22.5 2023-12-22 14:27:52,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=630360.0, ans=0.125 2023-12-22 14:27:53,425 INFO [train.py:886] (3/4) Epoch 20, batch 4000, loss[loss=0.01416, audio_tagging_loss=0.01416, over 25000.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4946871.31 frames. 
], batch size: 100, lr: 5.39e-03, grad_scale: 128.0 2023-12-22 14:27:53,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=630360.0, ans=0.125 2023-12-22 14:28:41,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=630626.6666666666, ans=0.125 2023-12-22 14:28:44,066 INFO [train.py:886] (3/4) Epoch 20, batch 4050, loss[loss=0.01369, audio_tagging_loss=0.01369, over 24750.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4944719.93 frames. ], batch size: 99, lr: 5.38e-03, grad_scale: 128.0 2023-12-22 14:29:02,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=630760.0, ans=0.0 2023-12-22 14:29:04,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=630826.6666666666, ans=0.1 2023-12-22 14:29:06,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2023-12-22 14:29:12,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=630826.6666666666, ans=0.125 2023-12-22 14:29:24,772 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.684e+01 2.997e+01 3.121e+01 3.224e+01 3.703e+01, threshold=6.243e+01, percent-clipped=0.0 2023-12-22 14:29:25,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=630893.3333333334, ans=0.125 2023-12-22 14:29:36,669 INFO [train.py:886] (3/4) Epoch 20, batch 4100, loss[loss=0.01703, audio_tagging_loss=0.01703, over 24943.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4938051.63 frames. ], batch size: 100, lr: 5.38e-03, grad_scale: 64.0 2023-12-22 14:29:42,810 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.24 vs. limit=15.0 2023-12-22 14:29:44,708 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.70 vs. limit=15.0 2023-12-22 14:29:58,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=631160.0, ans=0.0 2023-12-22 14:30:15,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=631293.3333333334, ans=0.125 2023-12-22 14:30:27,781 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.58 vs. limit=15.0 2023-12-22 14:30:28,240 INFO [train.py:886] (3/4) Epoch 20, batch 4150, loss[loss=0.01321, audio_tagging_loss=0.01321, over 24087.00 frames. ], tot_loss[loss=0.01369, audio_tagging_loss=0.01369, over 4935903.19 frames. 
], batch size: 100, lr: 5.38e-03, grad_scale: 64.0 2023-12-22 14:30:28,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=631360.0, ans=0.0 2023-12-22 14:30:31,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=631360.0, ans=0.0 2023-12-22 14:30:38,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=631426.6666666666, ans=0.1 2023-12-22 14:30:43,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=631426.6666666666, ans=0.0 2023-12-22 14:30:44,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=631426.6666666666, ans=0.2 2023-12-22 14:30:57,706 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.66 vs. limit=22.5 2023-12-22 14:31:07,720 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.483e+01 2.915e+01 3.076e+01 3.208e+01 3.689e+01, threshold=6.152e+01, percent-clipped=0.0 2023-12-22 14:31:18,914 INFO [train.py:886] (3/4) Epoch 20, batch 4200, loss[loss=0.01332, audio_tagging_loss=0.01332, over 25000.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4944900.18 frames. ], batch size: 100, lr: 5.38e-03, grad_scale: 64.0 2023-12-22 14:31:29,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=631760.0, ans=0.0 2023-12-22 14:31:52,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=631893.3333333334, ans=0.2 2023-12-22 14:32:02,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=631960.0, ans=0.0 2023-12-22 14:32:11,395 INFO [train.py:886] (3/4) Epoch 20, batch 4250, loss[loss=0.01422, audio_tagging_loss=0.01422, over 24750.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4947609.11 frames. ], batch size: 99, lr: 5.38e-03, grad_scale: 64.0 2023-12-22 14:32:22,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=632093.3333333334, ans=0.2 2023-12-22 14:32:24,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=632093.3333333334, ans=0.0 2023-12-22 14:32:27,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=632093.3333333334, ans=0.125 2023-12-22 14:32:29,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=632093.3333333334, ans=0.125 2023-12-22 14:32:34,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=632160.0, ans=0.1 2023-12-22 14:32:38,521 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.49 vs. 
limit=15.0 2023-12-22 14:32:52,160 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.605e+01 2.864e+01 3.006e+01 3.128e+01 3.399e+01, threshold=6.011e+01, percent-clipped=0.0 2023-12-22 14:32:57,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=632293.3333333334, ans=0.0 2023-12-22 14:33:04,786 INFO [train.py:886] (3/4) Epoch 20, batch 4300, loss[loss=0.01423, audio_tagging_loss=0.01423, over 25000.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4956068.60 frames. ], batch size: 100, lr: 5.38e-03, grad_scale: 64.0 2023-12-22 14:33:07,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=632360.0, ans=0.0 2023-12-22 14:33:10,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=632360.0, ans=0.0 2023-12-22 14:33:15,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=632426.6666666666, ans=0.0 2023-12-22 14:33:17,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=632426.6666666666, ans=0.0 2023-12-22 14:33:29,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=632493.3333333334, ans=0.125 2023-12-22 14:33:39,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=632560.0, ans=0.125 2023-12-22 14:33:41,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=632560.0, ans=0.1 2023-12-22 14:33:54,941 INFO [train.py:886] (3/4) Epoch 20, batch 4350, loss[loss=0.01294, audio_tagging_loss=0.01294, over 25000.00 frames. ], tot_loss[loss=0.01369, audio_tagging_loss=0.01369, over 4955554.41 frames. ], batch size: 100, lr: 5.38e-03, grad_scale: 64.0 2023-12-22 14:33:55,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=632693.3333333334, ans=0.0 2023-12-22 14:33:59,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=632693.3333333334, ans=0.0 2023-12-22 14:34:08,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=632760.0, ans=0.2 2023-12-22 14:34:19,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=632826.6666666666, ans=0.1 2023-12-22 14:34:21,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=632826.6666666666, ans=0.125 2023-12-22 14:34:31,819 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:34:35,355 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+01 2.921e+01 3.074e+01 3.206e+01 3.879e+01, threshold=6.148e+01, percent-clipped=0.0 2023-12-22 14:34:47,216 INFO [train.py:886] (3/4) Epoch 20, batch 4400, loss[loss=0.01577, audio_tagging_loss=0.01577, over 24750.00 frames. ], tot_loss[loss=0.01385, audio_tagging_loss=0.01385, over 4953309.22 frames. 
], batch size: 99, lr: 5.37e-03, grad_scale: 64.0 2023-12-22 14:34:49,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=633026.6666666666, ans=0.2 2023-12-22 14:34:50,603 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2023-12-22 14:35:24,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=633226.6666666666, ans=0.125 2023-12-22 14:35:25,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=633226.6666666666, ans=0.125 2023-12-22 14:35:31,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=633293.3333333334, ans=0.2 2023-12-22 14:35:33,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=633293.3333333334, ans=0.2 2023-12-22 14:35:38,543 INFO [train.py:886] (3/4) Epoch 20, batch 4450, loss[loss=0.01362, audio_tagging_loss=0.01362, over 24750.00 frames. ], tot_loss[loss=0.01383, audio_tagging_loss=0.01383, over 4950851.88 frames. ], batch size: 99, lr: 5.37e-03, grad_scale: 64.0 2023-12-22 14:35:51,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=633426.6666666666, ans=0.09899494936611666 2023-12-22 14:36:20,616 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.566e+01 2.929e+01 3.037e+01 3.190e+01 3.598e+01, threshold=6.073e+01, percent-clipped=0.0 2023-12-22 14:36:31,012 INFO [train.py:886] (3/4) Epoch 20, batch 4500, loss[loss=0.01373, audio_tagging_loss=0.01373, over 24750.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4954033.41 frames. ], batch size: 99, lr: 5.37e-03, grad_scale: 64.0 2023-12-22 14:36:36,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=633693.3333333334, ans=0.0 2023-12-22 14:36:36,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=633693.3333333334, ans=0.125 2023-12-22 14:36:47,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=633760.0, ans=0.125 2023-12-22 14:36:49,273 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.05 vs. limit=22.5 2023-12-22 14:37:03,210 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.48 vs. limit=22.5 2023-12-22 14:37:07,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=633893.3333333334, ans=0.1 2023-12-22 14:37:14,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=633960.0, ans=0.2 2023-12-22 14:37:17,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=633960.0, ans=0.0 2023-12-22 14:37:24,385 INFO [train.py:886] (3/4) Epoch 20, batch 4550, loss[loss=0.01717, audio_tagging_loss=0.01717, over 22543.00 frames. 
], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4947531.46 frames. ], batch size: 107, lr: 5.37e-03, grad_scale: 64.0 2023-12-22 14:37:27,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=634026.6666666666, ans=0.125 2023-12-22 14:37:31,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=634026.6666666666, ans=0.2 2023-12-22 14:37:34,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=634093.3333333334, ans=0.0 2023-12-22 14:37:42,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=634160.0, ans=0.0 2023-12-22 14:37:44,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=634160.0, ans=0.1 2023-12-22 14:37:56,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=634226.6666666666, ans=0.02 2023-12-22 14:37:57,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.26 vs. limit=22.5 2023-12-22 14:38:02,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=634226.6666666666, ans=0.125 2023-12-22 14:38:04,250 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.486e+01 2.875e+01 3.021e+01 3.198e+01 3.721e+01, threshold=6.043e+01, percent-clipped=0.0 2023-12-22 14:38:14,815 INFO [train.py:886] (3/4) Epoch 20, batch 4600, loss[loss=0.01106, audio_tagging_loss=0.01106, over 25000.00 frames. ], tot_loss[loss=0.01363, audio_tagging_loss=0.01363, over 4955702.15 frames. ], batch size: 100, lr: 5.37e-03, grad_scale: 64.0 2023-12-22 14:38:22,250 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.06 vs. limit=6.0 2023-12-22 14:38:23,076 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.16 vs. limit=12.0 2023-12-22 14:38:31,206 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.85 vs. limit=6.0 2023-12-22 14:38:36,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.35 vs. limit=22.5 2023-12-22 14:38:36,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=634493.3333333334, ans=0.0 2023-12-22 14:39:08,032 INFO [train.py:886] (3/4) Epoch 20, batch 4650, loss[loss=0.01518, audio_tagging_loss=0.01518, over 25000.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4956925.37 frames. 
], batch size: 100, lr: 5.37e-03, grad_scale: 64.0 2023-12-22 14:39:14,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=634693.3333333334, ans=0.125 2023-12-22 14:39:15,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=634693.3333333334, ans=0.0 2023-12-22 14:39:16,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.23 vs. limit=22.5 2023-12-22 14:39:18,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=634760.0, ans=0.125 2023-12-22 14:39:21,797 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.60 vs. limit=15.0 2023-12-22 14:39:26,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=634760.0, ans=0.125 2023-12-22 14:39:36,351 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:39:37,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=634826.6666666666, ans=0.125 2023-12-22 14:39:38,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=634893.3333333334, ans=0.025 2023-12-22 14:39:41,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=634893.3333333334, ans=0.125 2023-12-22 14:39:45,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=634893.3333333334, ans=0.125 2023-12-22 14:39:47,495 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.597e+01 2.896e+01 3.053e+01 3.210e+01 3.598e+01, threshold=6.106e+01, percent-clipped=0.0 2023-12-22 14:39:48,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=634960.0, ans=0.0 2023-12-22 14:39:50,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=634960.0, ans=0.125 2023-12-22 14:39:53,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=634960.0, ans=0.2 2023-12-22 14:39:56,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=635026.6666666666, ans=0.125 2023-12-22 14:39:57,581 INFO [train.py:886] (3/4) Epoch 20, batch 4700, loss[loss=0.01603, audio_tagging_loss=0.01603, over 24750.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4955608.85 frames. 
], batch size: 99, lr: 5.37e-03, grad_scale: 64.0 2023-12-22 14:40:03,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=635026.6666666666, ans=0.125 2023-12-22 14:40:09,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=635093.3333333334, ans=0.125 2023-12-22 14:40:39,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=635293.3333333334, ans=0.1 2023-12-22 14:40:45,760 INFO [train.py:886] (3/4) Epoch 20, batch 4750, loss[loss=0.01387, audio_tagging_loss=0.01387, over 24750.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4950789.27 frames. ], batch size: 99, lr: 5.36e-03, grad_scale: 64.0 2023-12-22 14:40:46,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=635360.0, ans=0.125 2023-12-22 14:40:47,162 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0 2023-12-22 14:41:19,716 INFO [train.py:886] (3/4) Epoch 21, batch 0, loss[loss=0.02853, audio_tagging_loss=0.02853, over 24049.00 frames. ], tot_loss[loss=0.02853, audio_tagging_loss=0.02853, over 24049.00 frames. ], batch size: 100, lr: 5.23e-03, grad_scale: 32.0 2023-12-22 14:41:19,716 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 14:41:40,669 INFO [train.py:917] (3/4) Epoch 21, validation: loss=0.03243, audio_tagging_loss=0.03243, over 3737520.00 frames. 2023-12-22 14:41:40,670 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 14:41:51,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=635533.3333333334, ans=0.035 2023-12-22 14:42:04,965 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.625e+01 2.952e+01 3.172e+01 3.843e+01 8.854e+01, threshold=6.343e+01, percent-clipped=8.0 2023-12-22 14:42:11,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=635666.6666666666, ans=0.125 2023-12-22 14:42:15,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=635666.6666666666, ans=0.125 2023-12-22 14:42:25,132 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.12 vs. limit=15.0 2023-12-22 14:42:31,011 INFO [train.py:886] (3/4) Epoch 21, batch 50, loss[loss=0.01944, audio_tagging_loss=0.01944, over 24869.00 frames. ], tot_loss[loss=0.02171, audio_tagging_loss=0.02171, over 1123207.00 frames. 
], batch size: 100, lr: 5.23e-03, grad_scale: 32.0 2023-12-22 14:42:33,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=635800.0, ans=0.125 2023-12-22 14:42:35,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=635800.0, ans=0.0 2023-12-22 14:42:39,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=635800.0, ans=0.0 2023-12-22 14:43:11,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.79 vs. limit=15.0 2023-12-22 14:43:15,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=636066.6666666666, ans=0.2 2023-12-22 14:43:17,648 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0 2023-12-22 14:43:21,769 INFO [train.py:886] (3/4) Epoch 21, batch 100, loss[loss=0.01541, audio_tagging_loss=0.01541, over 25000.00 frames. ], tot_loss[loss=0.01864, audio_tagging_loss=0.01864, over 1977096.18 frames. ], batch size: 100, lr: 5.23e-03, grad_scale: 32.0 2023-12-22 14:43:25,864 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:43:36,115 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=15.0 2023-12-22 14:43:46,537 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.874e+01 3.292e+01 3.512e+01 3.782e+01 4.878e+01, threshold=7.024e+01, percent-clipped=0.0 2023-12-22 14:43:50,493 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:43:58,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=636333.3333333334, ans=0.125 2023-12-22 14:43:59,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=636333.3333333334, ans=0.125 2023-12-22 14:44:00,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=636333.3333333334, ans=0.125 2023-12-22 14:44:04,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=636400.0, ans=0.0 2023-12-22 14:44:13,137 INFO [train.py:886] (3/4) Epoch 21, batch 150, loss[loss=0.0138, audio_tagging_loss=0.0138, over 25000.00 frames. ], tot_loss[loss=0.01709, audio_tagging_loss=0.01709, over 2634913.58 frames. 
], batch size: 100, lr: 5.23e-03, grad_scale: 32.0 2023-12-22 14:44:13,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=636466.6666666666, ans=0.125 2023-12-22 14:44:18,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=636466.6666666666, ans=0.0 2023-12-22 14:44:35,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=636600.0, ans=0.0 2023-12-22 14:44:35,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=636600.0, ans=0.0 2023-12-22 14:44:55,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=636733.3333333334, ans=0.125 2023-12-22 14:44:56,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=636733.3333333334, ans=0.0 2023-12-22 14:44:58,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=636733.3333333334, ans=0.125 2023-12-22 14:44:59,193 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.16 vs. limit=10.0 2023-12-22 14:45:03,388 INFO [train.py:886] (3/4) Epoch 21, batch 200, loss[loss=0.01211, audio_tagging_loss=0.01211, over 25000.00 frames. ], tot_loss[loss=0.01598, audio_tagging_loss=0.01598, over 3149021.40 frames. ], batch size: 100, lr: 5.23e-03, grad_scale: 32.0 2023-12-22 14:45:06,766 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.60 vs. limit=22.5 2023-12-22 14:45:26,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=636933.3333333334, ans=0.07 2023-12-22 14:45:28,987 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.601e+01 2.981e+01 3.099e+01 3.247e+01 3.721e+01, threshold=6.198e+01, percent-clipped=0.0 2023-12-22 14:45:30,145 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:45:30,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=636933.3333333334, ans=0.025 2023-12-22 14:45:35,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=637000.0, ans=0.125 2023-12-22 14:45:56,395 INFO [train.py:886] (3/4) Epoch 21, batch 250, loss[loss=0.01269, audio_tagging_loss=0.01269, over 25000.00 frames. ], tot_loss[loss=0.01534, audio_tagging_loss=0.01534, over 3547309.41 frames. 
], batch size: 100, lr: 5.23e-03, grad_scale: 32.0 2023-12-22 14:45:56,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=637133.3333333334, ans=0.1 2023-12-22 14:45:57,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=637133.3333333334, ans=0.2 2023-12-22 14:46:05,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=637200.0, ans=0.125 2023-12-22 14:46:06,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=637200.0, ans=0.125 2023-12-22 14:46:06,315 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.00 vs. limit=22.5 2023-12-22 14:46:11,809 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:46:35,109 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.58 vs. limit=15.0 2023-12-22 14:46:48,410 INFO [train.py:886] (3/4) Epoch 21, batch 300, loss[loss=0.01199, audio_tagging_loss=0.01199, over 24750.00 frames. ], tot_loss[loss=0.01505, audio_tagging_loss=0.01505, over 3860011.15 frames. ], batch size: 99, lr: 5.22e-03, grad_scale: 32.0 2023-12-22 14:46:48,896 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0 2023-12-22 14:46:49,556 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:47:12,439 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.578e+01 2.934e+01 3.059e+01 3.180e+01 3.932e+01, threshold=6.118e+01, percent-clipped=0.0 2023-12-22 14:47:12,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=637600.0, ans=0.125 2023-12-22 14:47:17,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=637600.0, ans=15.0 2023-12-22 14:47:28,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=637666.6666666666, ans=0.125 2023-12-22 14:47:39,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=637800.0, ans=0.2 2023-12-22 14:47:39,919 INFO [train.py:886] (3/4) Epoch 21, batch 350, loss[loss=0.01208, audio_tagging_loss=0.01208, over 24750.00 frames. ], tot_loss[loss=0.01469, audio_tagging_loss=0.01469, over 4096756.06 frames. 
], batch size: 99, lr: 5.22e-03, grad_scale: 32.0 2023-12-22 14:47:41,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=637800.0, ans=0.0 2023-12-22 14:47:45,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=637800.0, ans=0.125 2023-12-22 14:47:47,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=637800.0, ans=0.125 2023-12-22 14:47:49,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=637866.6666666666, ans=0.125 2023-12-22 14:48:07,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=15.0 2023-12-22 14:48:16,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.80 vs. limit=15.0 2023-12-22 14:48:21,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=638066.6666666666, ans=15.0 2023-12-22 14:48:30,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=638066.6666666666, ans=0.125 2023-12-22 14:48:32,058 INFO [train.py:886] (3/4) Epoch 21, batch 400, loss[loss=0.01046, audio_tagging_loss=0.01046, over 24750.00 frames. ], tot_loss[loss=0.01435, audio_tagging_loss=0.01435, over 4283288.99 frames. ], batch size: 99, lr: 5.22e-03, grad_scale: 32.0 2023-12-22 14:48:33,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=638133.3333333334, ans=0.0 2023-12-22 14:48:42,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=638200.0, ans=0.0 2023-12-22 14:48:56,997 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.454e+01 2.877e+01 2.996e+01 3.140e+01 3.707e+01, threshold=5.991e+01, percent-clipped=0.0 2023-12-22 14:49:23,553 INFO [train.py:886] (3/4) Epoch 21, batch 450, loss[loss=0.01201, audio_tagging_loss=0.01201, over 25000.00 frames. ], tot_loss[loss=0.01404, audio_tagging_loss=0.01404, over 4428253.68 frames. ], batch size: 100, lr: 5.22e-03, grad_scale: 32.0 2023-12-22 14:49:42,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=638533.3333333334, ans=0.125 2023-12-22 14:49:45,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=638600.0, ans=0.2 2023-12-22 14:50:13,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=638733.3333333334, ans=0.125 2023-12-22 14:50:16,417 INFO [train.py:886] (3/4) Epoch 21, batch 500, loss[loss=0.01587, audio_tagging_loss=0.01587, over 25000.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 4541700.59 frames. 
], batch size: 100, lr: 5.22e-03, grad_scale: 32.0 2023-12-22 14:50:22,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=638800.0, ans=0.04949747468305833 2023-12-22 14:50:30,844 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.66 vs. limit=15.0 2023-12-22 14:50:41,263 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.599e+01 2.918e+01 3.045e+01 3.140e+01 3.683e+01, threshold=6.089e+01, percent-clipped=0.0 2023-12-22 14:50:43,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=638933.3333333334, ans=0.1 2023-12-22 14:50:58,625 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.78 vs. limit=22.5 2023-12-22 14:51:02,754 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:51:07,545 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.23 vs. limit=6.0 2023-12-22 14:51:07,964 INFO [train.py:886] (3/4) Epoch 21, batch 550, loss[loss=0.01098, audio_tagging_loss=0.01098, over 23983.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4637575.97 frames. ], batch size: 100, lr: 5.22e-03, grad_scale: 32.0 2023-12-22 14:51:15,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2023-12-22 14:51:30,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=639266.6666666666, ans=0.0 2023-12-22 14:51:34,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=639266.6666666666, ans=0.0 2023-12-22 14:51:39,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=639333.3333333334, ans=0.0 2023-12-22 14:51:47,912 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.86 vs. limit=15.0 2023-12-22 14:51:59,316 INFO [train.py:886] (3/4) Epoch 21, batch 600, loss[loss=0.01236, audio_tagging_loss=0.01236, over 25000.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4708753.84 frames. 
], batch size: 100, lr: 5.22e-03, grad_scale: 32.0 2023-12-22 14:52:01,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=639466.6666666666, ans=0.0 2023-12-22 14:52:08,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=639466.6666666666, ans=0.125 2023-12-22 14:52:14,806 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:52:24,053 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.726e+01 2.947e+01 3.052e+01 3.171e+01 3.659e+01, threshold=6.104e+01, percent-clipped=0.0 2023-12-22 14:52:24,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=639600.0, ans=0.2 2023-12-22 14:52:26,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=639600.0, ans=0.1 2023-12-22 14:52:51,551 INFO [train.py:886] (3/4) Epoch 21, batch 650, loss[loss=0.01306, audio_tagging_loss=0.01306, over 24750.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 4752054.94 frames. ], batch size: 99, lr: 5.22e-03, grad_scale: 32.0 2023-12-22 14:53:08,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=639866.6666666666, ans=0.125 2023-12-22 14:53:14,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=639933.3333333334, ans=0.2 2023-12-22 14:53:20,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=639933.3333333334, ans=0.035 2023-12-22 14:53:38,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=640066.6666666666, ans=0.2 2023-12-22 14:53:40,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=640066.6666666666, ans=0.2 2023-12-22 14:53:45,859 INFO [train.py:886] (3/4) Epoch 21, batch 700, loss[loss=0.0133, audio_tagging_loss=0.0133, over 24750.00 frames. ], tot_loss[loss=0.01383, audio_tagging_loss=0.01383, over 4790050.61 frames. ], batch size: 99, lr: 5.21e-03, grad_scale: 32.0 2023-12-22 14:53:48,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.08 vs. limit=22.5 2023-12-22 14:53:58,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=640200.0, ans=0.0 2023-12-22 14:54:06,703 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.37 vs. limit=15.0 2023-12-22 14:54:09,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=640266.6666666666, ans=0.0 2023-12-22 14:54:09,893 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.593e+01 2.917e+01 3.103e+01 3.196e+01 3.559e+01, threshold=6.207e+01, percent-clipped=0.0 2023-12-22 14:54:25,564 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.98 vs. 
limit=12.0 2023-12-22 14:54:37,384 INFO [train.py:886] (3/4) Epoch 21, batch 750, loss[loss=0.01402, audio_tagging_loss=0.01402, over 22682.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4822928.25 frames. ], batch size: 107, lr: 5.21e-03, grad_scale: 32.0 2023-12-22 14:54:59,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=640600.0, ans=0.2 2023-12-22 14:55:19,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=640733.3333333334, ans=0.2 2023-12-22 14:55:30,040 INFO [train.py:886] (3/4) Epoch 21, batch 800, loss[loss=0.01291, audio_tagging_loss=0.01291, over 21197.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4855334.36 frames. ], batch size: 107, lr: 5.21e-03, grad_scale: 32.0 2023-12-22 14:55:33,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=640800.0, ans=0.1 2023-12-22 14:55:38,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.17 vs. limit=12.0 2023-12-22 14:55:42,058 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.14 vs. limit=22.5 2023-12-22 14:55:55,170 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.577e+01 2.887e+01 3.023e+01 3.199e+01 3.565e+01, threshold=6.047e+01, percent-clipped=0.0 2023-12-22 14:56:18,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=641066.6666666666, ans=0.0 2023-12-22 14:56:21,499 INFO [train.py:886] (3/4) Epoch 21, batch 850, loss[loss=0.014, audio_tagging_loss=0.014, over 25000.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4876103.75 frames. ], batch size: 100, lr: 5.21e-03, grad_scale: 32.0 2023-12-22 14:56:32,710 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.88 vs. limit=15.0 2023-12-22 14:56:37,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=641200.0, ans=0.125 2023-12-22 14:56:43,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=641266.6666666666, ans=0.0 2023-12-22 14:57:07,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=641400.0, ans=0.0 2023-12-22 14:57:13,897 INFO [train.py:886] (3/4) Epoch 21, batch 900, loss[loss=0.01524, audio_tagging_loss=0.01524, over 24750.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4899228.23 frames. 
], batch size: 99, lr: 5.21e-03, grad_scale: 32.0 2023-12-22 14:57:14,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=641466.6666666666, ans=0.125 2023-12-22 14:57:15,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=641466.6666666666, ans=15.0 2023-12-22 14:57:19,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=641466.6666666666, ans=0.125 2023-12-22 14:57:25,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=641533.3333333334, ans=0.125 2023-12-22 14:57:25,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=641533.3333333334, ans=0.07 2023-12-22 14:57:31,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=641533.3333333334, ans=0.125 2023-12-22 14:57:35,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=641600.0, ans=0.125 2023-12-22 14:57:36,770 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 14:57:39,213 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.708e+01 2.906e+01 3.042e+01 3.221e+01 3.641e+01, threshold=6.083e+01, percent-clipped=0.0 2023-12-22 14:57:47,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=641666.6666666666, ans=0.07 2023-12-22 14:57:50,025 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=15.0 2023-12-22 14:58:01,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=641733.3333333334, ans=0.125 2023-12-22 14:58:04,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=641733.3333333334, ans=0.5 2023-12-22 14:58:06,039 INFO [train.py:886] (3/4) Epoch 21, batch 950, loss[loss=0.01293, audio_tagging_loss=0.01293, over 24750.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4908664.76 frames. ], batch size: 99, lr: 5.21e-03, grad_scale: 32.0 2023-12-22 14:58:07,739 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.16 vs. limit=15.0 2023-12-22 14:58:30,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=641933.3333333334, ans=0.1 2023-12-22 14:58:31,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=641933.3333333334, ans=0.2 2023-12-22 14:58:49,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=642066.6666666666, ans=0.0 2023-12-22 14:58:56,667 INFO [train.py:886] (3/4) Epoch 21, batch 1000, loss[loss=0.01372, audio_tagging_loss=0.01372, over 24750.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4910865.32 frames. 
], batch size: 99, lr: 5.21e-03, grad_scale: 32.0 2023-12-22 14:59:08,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=642200.0, ans=0.125 2023-12-22 14:59:21,803 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.543e+01 2.900e+01 3.063e+01 3.236e+01 3.644e+01, threshold=6.126e+01, percent-clipped=0.0 2023-12-22 14:59:21,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=642266.6666666666, ans=0.2 2023-12-22 14:59:23,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=642266.6666666666, ans=0.125 2023-12-22 14:59:27,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=642333.3333333334, ans=0.0 2023-12-22 14:59:28,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=642333.3333333334, ans=0.0 2023-12-22 14:59:30,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=642333.3333333334, ans=0.125 2023-12-22 14:59:32,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=642333.3333333334, ans=0.125 2023-12-22 14:59:48,509 INFO [train.py:886] (3/4) Epoch 21, batch 1050, loss[loss=0.0132, audio_tagging_loss=0.0132, over 25000.00 frames. ], tot_loss[loss=0.01363, audio_tagging_loss=0.01363, over 4920802.86 frames. ], batch size: 100, lr: 5.20e-03, grad_scale: 32.0 2023-12-22 15:00:27,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=642666.6666666666, ans=0.0 2023-12-22 15:00:40,276 INFO [train.py:886] (3/4) Epoch 21, batch 1100, loss[loss=0.01284, audio_tagging_loss=0.01284, over 25000.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 4932990.10 frames. ], batch size: 100, lr: 5.20e-03, grad_scale: 32.0 2023-12-22 15:01:04,305 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.630e+01 2.897e+01 3.078e+01 3.244e+01 5.460e+01, threshold=6.156e+01, percent-clipped=0.0 2023-12-22 15:01:09,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=642933.3333333334, ans=0.05 2023-12-22 15:01:26,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=643066.6666666666, ans=0.125 2023-12-22 15:01:27,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=643066.6666666666, ans=0.125 2023-12-22 15:01:32,007 INFO [train.py:886] (3/4) Epoch 21, batch 1150, loss[loss=0.01074, audio_tagging_loss=0.01074, over 25000.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 4939980.47 frames. 
], batch size: 100, lr: 5.20e-03, grad_scale: 32.0 2023-12-22 15:01:47,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=643200.0, ans=0.125 2023-12-22 15:01:47,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=643200.0, ans=0.0 2023-12-22 15:01:49,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=643200.0, ans=0.125 2023-12-22 15:01:49,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=643200.0, ans=0.125 2023-12-22 15:01:56,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=643266.6666666666, ans=0.125 2023-12-22 15:01:56,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=643266.6666666666, ans=0.125 2023-12-22 15:02:01,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=643333.3333333334, ans=0.2 2023-12-22 15:02:05,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=643333.3333333334, ans=0.2 2023-12-22 15:02:23,709 INFO [train.py:886] (3/4) Epoch 21, batch 1200, loss[loss=0.01155, audio_tagging_loss=0.01155, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4946114.94 frames. ], batch size: 100, lr: 5.20e-03, grad_scale: 32.0 2023-12-22 15:02:26,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-12-22 15:02:30,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=643466.6666666666, ans=0.0 2023-12-22 15:02:43,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=643600.0, ans=0.125 2023-12-22 15:02:47,767 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.628e+01 2.913e+01 3.055e+01 3.244e+01 3.742e+01, threshold=6.111e+01, percent-clipped=0.0 2023-12-22 15:02:58,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=643666.6666666666, ans=0.0 2023-12-22 15:02:59,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=643666.6666666666, ans=0.1 2023-12-22 15:03:09,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=643733.3333333334, ans=0.125 2023-12-22 15:03:14,501 INFO [train.py:886] (3/4) Epoch 21, batch 1250, loss[loss=0.01492, audio_tagging_loss=0.01492, over 24750.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4945596.52 frames. ], batch size: 99, lr: 5.20e-03, grad_scale: 32.0 2023-12-22 15:03:20,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.85 vs. 
limit=12.0 2023-12-22 15:03:33,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=643866.6666666666, ans=0.0 2023-12-22 15:03:34,693 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.55 vs. limit=6.0 2023-12-22 15:03:42,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=643933.3333333334, ans=0.2 2023-12-22 15:03:45,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=644000.0, ans=0.1 2023-12-22 15:03:52,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=644000.0, ans=0.1 2023-12-22 15:03:59,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=644066.6666666666, ans=0.0 2023-12-22 15:04:03,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=644066.6666666666, ans=0.2 2023-12-22 15:04:07,497 INFO [train.py:886] (3/4) Epoch 21, batch 1300, loss[loss=0.0138, audio_tagging_loss=0.0138, over 25000.00 frames. ], tot_loss[loss=0.01365, audio_tagging_loss=0.01365, over 4937352.01 frames. ], batch size: 100, lr: 5.20e-03, grad_scale: 32.0 2023-12-22 15:04:08,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=644133.3333333334, ans=0.125 2023-12-22 15:04:23,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=644200.0, ans=0.125 2023-12-22 15:04:33,041 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.559e+01 2.961e+01 3.123e+01 3.298e+01 3.701e+01, threshold=6.246e+01, percent-clipped=0.0 2023-12-22 15:04:59,027 INFO [train.py:886] (3/4) Epoch 21, batch 1350, loss[loss=0.01151, audio_tagging_loss=0.01151, over 23987.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4935401.78 frames. ], batch size: 100, lr: 5.20e-03, grad_scale: 32.0 2023-12-22 15:05:17,122 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.81 vs. limit=15.0 2023-12-22 15:05:38,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=644666.6666666666, ans=0.0 2023-12-22 15:05:47,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=644733.3333333334, ans=0.2 2023-12-22 15:05:50,489 INFO [train.py:886] (3/4) Epoch 21, batch 1400, loss[loss=0.01531, audio_tagging_loss=0.01531, over 22118.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 4935783.72 frames. 
], batch size: 107, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:05:58,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=644800.0, ans=0.1 2023-12-22 15:06:03,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=644866.6666666666, ans=0.125 2023-12-22 15:06:07,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=644866.6666666666, ans=0.09899494936611666 2023-12-22 15:06:15,741 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.531e+01 2.918e+01 3.034e+01 3.190e+01 3.697e+01, threshold=6.068e+01, percent-clipped=0.0 2023-12-22 15:06:29,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=645000.0, ans=0.125 2023-12-22 15:06:40,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0 2023-12-22 15:06:43,067 INFO [train.py:886] (3/4) Epoch 21, batch 1450, loss[loss=0.01198, audio_tagging_loss=0.01198, over 24750.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4942585.22 frames. ], batch size: 99, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:06:43,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=645133.3333333334, ans=0.0 2023-12-22 15:06:50,921 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.77 vs. limit=15.0 2023-12-22 15:07:02,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=645266.6666666666, ans=0.0 2023-12-22 15:07:33,413 INFO [train.py:886] (3/4) Epoch 21, batch 1500, loss[loss=0.01398, audio_tagging_loss=0.01398, over 25000.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4945234.79 frames. ], batch size: 100, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:07:51,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=645533.3333333334, ans=0.125 2023-12-22 15:07:52,083 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0 2023-12-22 15:07:56,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=645600.0, ans=0.0 2023-12-22 15:07:59,016 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.580e+01 2.868e+01 3.009e+01 3.172e+01 3.976e+01, threshold=6.018e+01, percent-clipped=0.0 2023-12-22 15:07:59,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=645600.0, ans=0.125 2023-12-22 15:08:05,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0 2023-12-22 15:08:13,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.46 vs. 
limit=15.0 2023-12-22 15:08:26,459 INFO [train.py:886] (3/4) Epoch 21, batch 1550, loss[loss=0.01365, audio_tagging_loss=0.01365, over 24750.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4949572.16 frames. ], batch size: 99, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:08:29,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=645800.0, ans=0.1 2023-12-22 15:08:30,029 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.84 vs. limit=12.0 2023-12-22 15:08:30,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=645800.0, ans=0.1 2023-12-22 15:08:52,805 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.47 vs. limit=15.0 2023-12-22 15:08:56,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=645933.3333333334, ans=0.2 2023-12-22 15:08:56,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=645933.3333333334, ans=0.125 2023-12-22 15:09:11,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=646066.6666666666, ans=0.125 2023-12-22 15:09:18,963 INFO [train.py:886] (3/4) Epoch 21, batch 1600, loss[loss=0.01456, audio_tagging_loss=0.01456, over 24750.00 frames. ], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4948638.60 frames. ], batch size: 99, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:09:23,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=646133.3333333334, ans=0.125 2023-12-22 15:09:29,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=646200.0, ans=0.125 2023-12-22 15:09:31,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=646200.0, ans=0.125 2023-12-22 15:09:32,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=646200.0, ans=0.125 2023-12-22 15:09:33,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=646200.0, ans=0.1 2023-12-22 15:09:35,120 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.28 vs. limit=15.0 2023-12-22 15:09:42,936 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.609e+01 3.012e+01 3.134e+01 3.270e+01 4.139e+01, threshold=6.268e+01, percent-clipped=0.0 2023-12-22 15:09:47,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=646266.6666666666, ans=0.125 2023-12-22 15:09:58,169 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.27 vs. limit=15.0 2023-12-22 15:10:09,630 INFO [train.py:886] (3/4) Epoch 21, batch 1650, loss[loss=0.01186, audio_tagging_loss=0.01186, over 24035.00 frames. 
], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4943753.78 frames. ], batch size: 100, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:10:37,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=646600.0, ans=0.125 2023-12-22 15:11:02,001 INFO [train.py:886] (3/4) Epoch 21, batch 1700, loss[loss=0.01119, audio_tagging_loss=0.01119, over 24750.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4946184.85 frames. ], batch size: 99, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:11:27,471 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.501e+01 2.927e+01 3.026e+01 3.153e+01 3.833e+01, threshold=6.051e+01, percent-clipped=0.0 2023-12-22 15:11:29,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=646933.3333333334, ans=0.0 2023-12-22 15:11:49,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=647066.6666666666, ans=0.1 2023-12-22 15:11:54,528 INFO [train.py:886] (3/4) Epoch 21, batch 1750, loss[loss=0.0148, audio_tagging_loss=0.0148, over 25000.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4951524.10 frames. ], batch size: 100, lr: 5.19e-03, grad_scale: 32.0 2023-12-22 15:12:00,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.24 vs. limit=6.0 2023-12-22 15:12:03,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=647133.3333333334, ans=0.0 2023-12-22 15:12:17,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=647266.6666666666, ans=0.125 2023-12-22 15:12:34,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=647333.3333333334, ans=0.125 2023-12-22 15:12:40,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=647400.0, ans=0.125 2023-12-22 15:12:45,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=647466.6666666666, ans=0.125 2023-12-22 15:12:46,351 INFO [train.py:886] (3/4) Epoch 21, batch 1800, loss[loss=0.01146, audio_tagging_loss=0.01146, over 25000.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4947730.49 frames. ], batch size: 100, lr: 5.18e-03, grad_scale: 32.0 2023-12-22 15:12:49,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=647466.6666666666, ans=0.0 2023-12-22 15:12:54,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=647466.6666666666, ans=0.0 2023-12-22 15:12:56,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=647533.3333333334, ans=0.2 2023-12-22 15:13:03,384 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. 
limit=15.0 2023-12-22 15:13:11,092 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.641e+01 2.946e+01 3.053e+01 3.147e+01 3.589e+01, threshold=6.107e+01, percent-clipped=0.0 2023-12-22 15:13:14,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=647600.0, ans=0.125 2023-12-22 15:13:15,530 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=15.0 2023-12-22 15:13:24,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=647666.6666666666, ans=0.125 2023-12-22 15:13:32,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=647733.3333333334, ans=0.125 2023-12-22 15:13:36,283 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.87 vs. limit=22.5 2023-12-22 15:13:38,498 INFO [train.py:886] (3/4) Epoch 21, batch 1850, loss[loss=0.01476, audio_tagging_loss=0.01476, over 25000.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4952095.22 frames. ], batch size: 100, lr: 5.18e-03, grad_scale: 32.0 2023-12-22 15:13:41,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=647800.0, ans=0.2 2023-12-22 15:13:42,711 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.05 vs. limit=12.0 2023-12-22 15:14:14,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=648000.0, ans=0.125 2023-12-22 15:14:14,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=648000.0, ans=0.125 2023-12-22 15:14:17,842 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.97 vs. limit=22.5 2023-12-22 15:14:20,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=648066.6666666666, ans=0.0 2023-12-22 15:14:21,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=648066.6666666666, ans=0.0 2023-12-22 15:14:26,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=648066.6666666666, ans=0.125 2023-12-22 15:14:28,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=648133.3333333334, ans=0.0 2023-12-22 15:14:29,464 INFO [train.py:886] (3/4) Epoch 21, batch 1900, loss[loss=0.01233, audio_tagging_loss=0.01233, over 24750.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4947000.41 frames. 
], batch size: 99, lr: 5.18e-03, grad_scale: 32.0 2023-12-22 15:14:53,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=648266.6666666666, ans=0.125 2023-12-22 15:14:54,203 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.641e+01 2.961e+01 3.101e+01 3.299e+01 3.976e+01, threshold=6.202e+01, percent-clipped=0.0 2023-12-22 15:15:03,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=648333.3333333334, ans=0.0 2023-12-22 15:15:11,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=648400.0, ans=0.95 2023-12-22 15:15:21,528 INFO [train.py:886] (3/4) Epoch 21, batch 1950, loss[loss=0.01141, audio_tagging_loss=0.01141, over 25000.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 4946884.29 frames. ], batch size: 100, lr: 5.18e-03, grad_scale: 32.0 2023-12-22 15:15:40,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=648533.3333333334, ans=0.125 2023-12-22 15:16:13,196 INFO [train.py:886] (3/4) Epoch 21, batch 2000, loss[loss=0.01399, audio_tagging_loss=0.01399, over 24750.00 frames. ], tot_loss[loss=0.01351, audio_tagging_loss=0.01351, over 4949437.50 frames. ], batch size: 99, lr: 5.18e-03, grad_scale: 64.0 2023-12-22 15:16:37,593 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.627e+01 2.920e+01 3.050e+01 3.228e+01 3.734e+01, threshold=6.100e+01, percent-clipped=0.0 2023-12-22 15:16:40,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=648933.3333333334, ans=0.04949747468305833 2023-12-22 15:16:50,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.69 vs. limit=15.0 2023-12-22 15:17:03,725 INFO [train.py:886] (3/4) Epoch 21, batch 2050, loss[loss=0.01247, audio_tagging_loss=0.01247, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4948504.78 frames. ], batch size: 100, lr: 5.18e-03, grad_scale: 64.0 2023-12-22 15:17:08,490 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:17:15,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=649200.0, ans=0.07 2023-12-22 15:17:29,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=649266.6666666666, ans=0.0 2023-12-22 15:17:31,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=649266.6666666666, ans=0.1 2023-12-22 15:17:36,180 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:17:51,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=649400.0, ans=0.1 2023-12-22 15:17:56,871 INFO [train.py:886] (3/4) Epoch 21, batch 2100, loss[loss=0.0146, audio_tagging_loss=0.0146, over 25000.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4957695.92 frames. 
], batch size: 100, lr: 5.18e-03, grad_scale: 64.0 2023-12-22 15:18:03,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=649466.6666666666, ans=0.2 2023-12-22 15:18:15,558 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.14 vs. limit=22.5 2023-12-22 15:18:21,961 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.454e+01 2.928e+01 3.055e+01 3.207e+01 3.778e+01, threshold=6.110e+01, percent-clipped=0.0 2023-12-22 15:18:40,259 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.21 vs. limit=22.5 2023-12-22 15:18:43,164 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.66 vs. limit=22.5 2023-12-22 15:18:45,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=649733.3333333334, ans=0.2 2023-12-22 15:18:47,492 INFO [train.py:886] (3/4) Epoch 21, batch 2150, loss[loss=0.01317, audio_tagging_loss=0.01317, over 25000.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4959700.75 frames. ], batch size: 100, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:18:47,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=649800.0, ans=0.2 2023-12-22 15:18:57,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=649866.6666666666, ans=0.015 2023-12-22 15:19:01,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=649866.6666666666, ans=0.05 2023-12-22 15:19:15,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=649933.3333333334, ans=0.0 2023-12-22 15:19:20,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=650000.0, ans=0.0 2023-12-22 15:19:28,359 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.61 vs. limit=12.0 2023-12-22 15:19:38,970 INFO [train.py:886] (3/4) Epoch 21, batch 2200, loss[loss=0.01293, audio_tagging_loss=0.01293, over 24063.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4956466.64 frames. ], batch size: 100, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:19:43,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=650133.3333333334, ans=0.125 2023-12-22 15:20:04,332 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.613e+01 2.983e+01 3.116e+01 3.285e+01 3.791e+01, threshold=6.231e+01, percent-clipped=0.0 2023-12-22 15:20:06,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=650266.6666666666, ans=0.125 2023-12-22 15:20:13,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=650333.3333333334, ans=0.1 2023-12-22 15:20:25,782 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.41 vs. 
limit=15.0 2023-12-22 15:20:30,692 INFO [train.py:886] (3/4) Epoch 21, batch 2250, loss[loss=0.01388, audio_tagging_loss=0.01388, over 24044.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4949538.52 frames. ], batch size: 100, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:20:39,444 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.82 vs. limit=12.0 2023-12-22 15:20:39,513 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.36 vs. limit=10.0 2023-12-22 15:20:53,750 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.40 vs. limit=22.5 2023-12-22 15:21:07,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=650666.6666666666, ans=0.125 2023-12-22 15:21:09,052 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:21:22,070 INFO [train.py:886] (3/4) Epoch 21, batch 2300, loss[loss=0.01345, audio_tagging_loss=0.01345, over 22875.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4948939.90 frames. ], batch size: 107, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:21:22,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=650800.0, ans=0.0 2023-12-22 15:21:35,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=650866.6666666666, ans=0.2 2023-12-22 15:21:42,629 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.81 vs. limit=15.0 2023-12-22 15:21:46,904 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.617e+01 2.891e+01 3.019e+01 3.134e+01 3.586e+01, threshold=6.037e+01, percent-clipped=0.0 2023-12-22 15:21:55,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=651000.0, ans=0.0 2023-12-22 15:22:04,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=651066.6666666666, ans=0.125 2023-12-22 15:22:05,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=651066.6666666666, ans=0.125 2023-12-22 15:22:08,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=651066.6666666666, ans=0.125 2023-12-22 15:22:11,933 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=15.0 2023-12-22 15:22:14,363 INFO [train.py:886] (3/4) Epoch 21, batch 2350, loss[loss=0.0145, audio_tagging_loss=0.0145, over 24750.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4950713.56 frames. ], batch size: 99, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:22:20,485 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. 
limit=15.0 2023-12-22 15:22:24,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=651200.0, ans=0.125 2023-12-22 15:22:28,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=651200.0, ans=0.125 2023-12-22 15:22:40,558 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.79 vs. limit=15.0 2023-12-22 15:22:40,614 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.96 vs. limit=22.5 2023-12-22 15:22:42,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=651266.6666666666, ans=0.125 2023-12-22 15:22:46,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=651333.3333333334, ans=0.125 2023-12-22 15:22:56,575 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.46 vs. limit=10.0 2023-12-22 15:23:03,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=651466.6666666666, ans=0.0 2023-12-22 15:23:05,388 INFO [train.py:886] (3/4) Epoch 21, batch 2400, loss[loss=0.01456, audio_tagging_loss=0.01456, over 24750.00 frames. ], tot_loss[loss=0.01351, audio_tagging_loss=0.01351, over 4954349.76 frames. ], batch size: 99, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:23:30,198 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.620e+01 2.897e+01 3.019e+01 3.181e+01 3.470e+01, threshold=6.039e+01, percent-clipped=0.0 2023-12-22 15:23:41,318 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:23:41,627 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.74 vs. limit=6.0 2023-12-22 15:23:56,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=651733.3333333334, ans=0.2 2023-12-22 15:23:57,920 INFO [train.py:886] (3/4) Epoch 21, batch 2450, loss[loss=0.01186, audio_tagging_loss=0.01186, over 25000.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 4958956.03 frames. 
], batch size: 100, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:24:00,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=651800.0, ans=10.0 2023-12-22 15:24:07,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=651866.6666666666, ans=0.0 2023-12-22 15:24:15,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=651866.6666666666, ans=0.125 2023-12-22 15:24:15,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=651866.6666666666, ans=0.09899494936611666 2023-12-22 15:24:16,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=651866.6666666666, ans=0.2 2023-12-22 15:24:25,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=651933.3333333334, ans=0.2 2023-12-22 15:24:35,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=652000.0, ans=0.0 2023-12-22 15:24:49,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=652133.3333333334, ans=0.0 2023-12-22 15:24:50,607 INFO [train.py:886] (3/4) Epoch 21, batch 2500, loss[loss=0.01759, audio_tagging_loss=0.01759, over 24750.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4959951.02 frames. ], batch size: 99, lr: 5.17e-03, grad_scale: 64.0 2023-12-22 15:24:57,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=652133.3333333334, ans=0.035 2023-12-22 15:24:58,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=652133.3333333334, ans=0.125 2023-12-22 15:25:11,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.72 vs. limit=15.0 2023-12-22 15:25:14,692 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.708e+01 3.022e+01 3.140e+01 3.250e+01 3.693e+01, threshold=6.280e+01, percent-clipped=0.0 2023-12-22 15:25:15,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=652266.6666666666, ans=0.0 2023-12-22 15:25:28,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=652333.3333333334, ans=0.125 2023-12-22 15:25:35,812 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2023-12-22 15:25:40,921 INFO [train.py:886] (3/4) Epoch 21, batch 2550, loss[loss=0.01217, audio_tagging_loss=0.01217, over 23990.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4950468.55 frames. 
], batch size: 100, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:25:44,794 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:25:59,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=652533.3333333334, ans=0.2 2023-12-22 15:26:02,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=652600.0, ans=0.1 2023-12-22 15:26:02,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=652600.0, ans=0.125 2023-12-22 15:26:06,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=652600.0, ans=0.0 2023-12-22 15:26:18,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=652666.6666666666, ans=0.125 2023-12-22 15:26:34,207 INFO [train.py:886] (3/4) Epoch 21, batch 2600, loss[loss=0.01415, audio_tagging_loss=0.01415, over 24750.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4945144.37 frames. ], batch size: 99, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:26:53,030 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.62 vs. limit=22.5 2023-12-22 15:26:58,943 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.515e+01 2.930e+01 3.065e+01 3.223e+01 3.938e+01, threshold=6.130e+01, percent-clipped=0.0 2023-12-22 15:27:00,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=652933.3333333334, ans=0.125 2023-12-22 15:27:19,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=653066.6666666666, ans=0.125 2023-12-22 15:27:26,040 INFO [train.py:886] (3/4) Epoch 21, batch 2650, loss[loss=0.01446, audio_tagging_loss=0.01446, over 25000.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 4948334.58 frames. ], batch size: 100, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:27:26,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=653133.3333333334, ans=0.0 2023-12-22 15:27:40,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=653200.0, ans=0.0 2023-12-22 15:27:46,743 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.73 vs. limit=12.0 2023-12-22 15:28:13,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=653400.0, ans=0.0 2023-12-22 15:28:17,703 INFO [train.py:886] (3/4) Epoch 21, batch 2700, loss[loss=0.01276, audio_tagging_loss=0.01276, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4944771.67 frames. 
], batch size: 100, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:28:21,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=653466.6666666666, ans=0.125 2023-12-22 15:28:37,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=653533.3333333334, ans=0.1 2023-12-22 15:28:37,275 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.41 vs. limit=22.5 2023-12-22 15:28:38,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=653600.0, ans=0.0 2023-12-22 15:28:42,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=653600.0, ans=0.1 2023-12-22 15:28:43,279 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+01 2.932e+01 3.064e+01 3.237e+01 3.661e+01, threshold=6.127e+01, percent-clipped=0.0 2023-12-22 15:28:46,661 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.06 vs. limit=22.5 2023-12-22 15:28:53,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=653666.6666666666, ans=0.125 2023-12-22 15:28:59,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=653733.3333333334, ans=0.125 2023-12-22 15:29:10,335 INFO [train.py:886] (3/4) Epoch 21, batch 2750, loss[loss=0.01483, audio_tagging_loss=0.01483, over 25000.00 frames. ], tot_loss[loss=0.01345, audio_tagging_loss=0.01345, over 4948350.31 frames. ], batch size: 100, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:29:21,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=653866.6666666666, ans=0.0 2023-12-22 15:29:26,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=653866.6666666666, ans=0.1 2023-12-22 15:29:32,423 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.61 vs. limit=22.5 2023-12-22 15:30:02,222 INFO [train.py:886] (3/4) Epoch 21, batch 2800, loss[loss=0.01541, audio_tagging_loss=0.01541, over 24750.00 frames. ], tot_loss[loss=0.01357, audio_tagging_loss=0.01357, over 4950883.54 frames. ], batch size: 99, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:30:02,671 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.71 vs. limit=10.0 2023-12-22 15:30:10,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=654133.3333333334, ans=0.04949747468305833 2023-12-22 15:30:10,928 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.84 vs. 
limit=12.0 2023-12-22 15:30:15,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=654200.0, ans=0.0 2023-12-22 15:30:17,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=654200.0, ans=0.125 2023-12-22 15:30:22,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=654266.6666666666, ans=0.125 2023-12-22 15:30:25,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=654266.6666666666, ans=0.125 2023-12-22 15:30:26,186 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:30:26,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=654266.6666666666, ans=0.0 2023-12-22 15:30:26,915 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.655e+01 2.987e+01 3.081e+01 3.261e+01 3.744e+01, threshold=6.161e+01, percent-clipped=0.0 2023-12-22 15:30:39,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=654333.3333333334, ans=0.125 2023-12-22 15:30:39,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=654333.3333333334, ans=0.125 2023-12-22 15:30:40,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=654333.3333333334, ans=10.0 2023-12-22 15:30:44,380 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.38 vs. limit=22.5 2023-12-22 15:30:54,057 INFO [train.py:886] (3/4) Epoch 21, batch 2850, loss[loss=0.01358, audio_tagging_loss=0.01358, over 24750.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4949617.17 frames. ], batch size: 99, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:31:12,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.84 vs. limit=15.0 2023-12-22 15:31:30,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.95 vs. limit=6.0 2023-12-22 15:31:42,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=654733.3333333334, ans=0.0 2023-12-22 15:31:46,821 INFO [train.py:886] (3/4) Epoch 21, batch 2900, loss[loss=0.01228, audio_tagging_loss=0.01228, over 24750.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 4948408.58 frames. 
], batch size: 99, lr: 5.16e-03, grad_scale: 64.0 2023-12-22 15:31:49,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=654800.0, ans=0.0 2023-12-22 15:31:51,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=654800.0, ans=0.0 2023-12-22 15:31:55,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=654866.6666666666, ans=0.1 2023-12-22 15:32:04,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=654866.6666666666, ans=0.0 2023-12-22 15:32:11,068 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.657e+01 2.895e+01 3.036e+01 3.201e+01 4.104e+01, threshold=6.072e+01, percent-clipped=0.0 2023-12-22 15:32:22,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=655000.0, ans=0.1 2023-12-22 15:32:37,563 INFO [train.py:886] (3/4) Epoch 21, batch 2950, loss[loss=0.01079, audio_tagging_loss=0.01079, over 25000.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4950879.60 frames. ], batch size: 100, lr: 5.15e-03, grad_scale: 64.0 2023-12-22 15:32:49,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=655200.0, ans=0.125 2023-12-22 15:32:49,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=655200.0, ans=0.1 2023-12-22 15:33:19,685 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 15:33:19,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=655400.0, ans=0.1 2023-12-22 15:33:29,607 INFO [train.py:886] (3/4) Epoch 21, batch 3000, loss[loss=0.01527, audio_tagging_loss=0.01527, over 25000.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4953609.69 frames. ], batch size: 100, lr: 5.15e-03, grad_scale: 64.0 2023-12-22 15:33:29,607 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 15:33:43,150 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3174, 4.5926, 5.1881, 4.6812], device='cuda:3') 2023-12-22 15:33:50,877 INFO [train.py:917] (3/4) Epoch 21, validation: loss=0.03274, audio_tagging_loss=0.03274, over 3737520.00 frames. 
2023-12-22 15:33:50,877 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 15:34:04,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=655533.3333333334, ans=0.2 2023-12-22 15:34:08,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=655533.3333333334, ans=0.1 2023-12-22 15:34:11,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=655600.0, ans=0.0 2023-12-22 15:34:14,635 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.605e+01 2.876e+01 3.036e+01 3.151e+01 3.734e+01, threshold=6.071e+01, percent-clipped=0.0 2023-12-22 15:34:25,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=655666.6666666666, ans=0.2 2023-12-22 15:34:41,393 INFO [train.py:886] (3/4) Epoch 21, batch 3050, loss[loss=0.01202, audio_tagging_loss=0.01202, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4954979.42 frames. ], batch size: 100, lr: 5.15e-03, grad_scale: 64.0 2023-12-22 15:35:14,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=656000.0, ans=0.125 2023-12-22 15:35:21,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=656000.0, ans=0.0 2023-12-22 15:35:33,733 INFO [train.py:886] (3/4) Epoch 21, batch 3100, loss[loss=0.01235, audio_tagging_loss=0.01235, over 25000.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4950624.01 frames. ], batch size: 100, lr: 5.15e-03, grad_scale: 64.0 2023-12-22 15:35:58,326 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.673e+01 2.955e+01 3.066e+01 3.256e+01 3.692e+01, threshold=6.132e+01, percent-clipped=0.0 2023-12-22 15:36:25,924 INFO [train.py:886] (3/4) Epoch 21, batch 3150, loss[loss=0.0124, audio_tagging_loss=0.0124, over 21918.00 frames. ], tot_loss[loss=0.01357, audio_tagging_loss=0.01357, over 4946642.55 frames. ], batch size: 107, lr: 5.15e-03, grad_scale: 64.0 2023-12-22 15:36:51,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=656600.0, ans=0.125 2023-12-22 15:36:56,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=656666.6666666666, ans=0.125 2023-12-22 15:37:06,238 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0 2023-12-22 15:37:16,937 INFO [train.py:886] (3/4) Epoch 21, batch 3200, loss[loss=0.01452, audio_tagging_loss=0.01452, over 24750.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4945724.51 frames. 
2023-12-22 15:37:21,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=656800.0, ans=0.125
2023-12-22 15:37:28,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=656866.6666666666, ans=0.025
2023-12-22 15:37:41,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=656933.3333333334, ans=0.125
2023-12-22 15:37:42,409 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.707e+01 2.933e+01 3.051e+01 3.239e+01 4.108e+01, threshold=6.103e+01, percent-clipped=0.0
2023-12-22 15:37:49,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=657000.0, ans=0.125
2023-12-22 15:37:51,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=657000.0, ans=0.125
2023-12-22 15:37:52,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=657000.0, ans=0.0
2023-12-22 15:37:53,325 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.14 vs. limit=22.5
2023-12-22 15:38:01,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=657066.6666666666, ans=0.2
2023-12-22 15:38:09,719 INFO [train.py:886] (3/4) Epoch 21, batch 3250, loss[loss=0.01098, audio_tagging_loss=0.01098, over 24750.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4947884.56 frames. ], batch size: 99, lr: 5.15e-03, grad_scale: 64.0
2023-12-22 15:38:09,979 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 15:38:21,448 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.36 vs. limit=22.5
2023-12-22 15:39:00,437 INFO [train.py:886] (3/4) Epoch 21, batch 3300, loss[loss=0.01204, audio_tagging_loss=0.01204, over 25000.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4948889.34 frames. ], batch size: 100, lr: 5.14e-03, grad_scale: 64.0
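The scaling.py:1022 Whitening lines track how far a module's output covariance is from white: metric is a measure of covariance anisotropy, and a corrective gradient is applied only when it exceeds the (scheduled) limit, which is why most entries sit below their limits while occasional ones spike far above. As an assumption about what such a metric can look like, not icefall's exact formula: the mean squared eigenvalue of the feature covariance divided by the squared mean eigenvalue, which is 1.0 for perfectly white features and grows as variance concentrates in a few directions:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (..., num_channels); all leading dims are treated as frames.
        x = x.reshape(-1, x.shape[-1])
        metrics = []
        for c in x.chunk(num_groups, dim=-1):      # one covariance per channel group
            c = c - c.mean(dim=0, keepdim=True)
            cov = (c.T @ c) / c.shape[0]
            eig = torch.linalg.eigvalsh(cov)
            metrics.append((eig ** 2).mean() / eig.mean() ** 2)
        return float(torch.stack(metrics).mean())

    x = torch.randn(1000, 256)
    print(whitening_metric(x))                     # near 1.0 for near-white features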
2023-12-22 15:39:00,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=657466.6666666666, ans=0.5
2023-12-22 15:39:14,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=657533.3333333334, ans=0.1
2023-12-22 15:39:23,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=657600.0, ans=0.09899494936611666
2023-12-22 15:39:24,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=657600.0, ans=0.0
2023-12-22 15:39:24,986 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.536e+01 2.897e+01 3.042e+01 3.162e+01 3.785e+01, threshold=6.083e+01, percent-clipped=0.0
2023-12-22 15:39:29,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=657600.0, ans=0.125
2023-12-22 15:39:38,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=657666.6666666666, ans=0.1
2023-12-22 15:39:51,621 INFO [train.py:886] (3/4) Epoch 21, batch 3350, loss[loss=0.009357, audio_tagging_loss=0.009357, over 24750.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4955536.72 frames. ], batch size: 99, lr: 5.14e-03, grad_scale: 64.0
2023-12-22 15:40:08,498 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.94 vs. limit=6.0
2023-12-22 15:40:22,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=658000.0, ans=0.125
2023-12-22 15:40:32,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=658066.6666666666, ans=0.125
2023-12-22 15:40:43,797 INFO [train.py:886] (3/4) Epoch 21, batch 3400, loss[loss=0.01196, audio_tagging_loss=0.01196, over 25000.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4958999.05 frames. ], batch size: 100, lr: 5.14e-03, grad_scale: 64.0
2023-12-22 15:40:57,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=658200.0, ans=0.125
2023-12-22 15:41:04,865 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.83 vs. limit=15.0
2023-12-22 15:41:07,159 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.745e+01 2.968e+01 3.084e+01 3.241e+01 3.911e+01, threshold=6.167e+01, percent-clipped=0.0
2023-12-22 15:41:07,621 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0
2023-12-22 15:41:34,384 INFO [train.py:886] (3/4) Epoch 21, batch 3450, loss[loss=0.01471, audio_tagging_loss=0.01471, over 24750.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4958264.93 frames. ], batch size: 99, lr: 5.14e-03, grad_scale: 64.0
2023-12-22 15:42:27,469 INFO [train.py:886] (3/4) Epoch 21, batch 3500, loss[loss=0.01159, audio_tagging_loss=0.01159, over 25000.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4946430.51 frames. ], batch size: 100, lr: 5.14e-03, grad_scale: 64.0
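In the train.py:886 lines, loss[...] covers only the current batch (about 25,000 frames here) while tot_loss[...] is a running frame-weighted average, which is why it moves slowly and its frame count hovers near five million. A sketch of one way to get that behavior, a decayed frame-weighted mean; the decay constant is an assumption, chosen so that 25,000-frame batches give an effective window of 25000 / (1 - 0.995) = 5e6 frames, matching the totals printed above:

    class RunningLoss:
        def __init__(self, decay: float = 0.995):
            self.decay = decay
            self.loss_sum = 0.0   # decayed sum of loss * frames
            self.frames = 0.0     # decayed sum of frames

        def update(self, loss: float, num_frames: float) -> float:
            self.loss_sum = self.decay * self.loss_sum + loss * num_frames
            self.frames = self.decay * self.frames + num_frames
            return self.loss_sum / self.frames     # the tot_loss column

    tracker = RunningLoss()
    for batch_loss in (0.01159, 0.01346, 0.01339):
        tot = tracker.update(batch_loss, 25000.0)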
2023-12-22 15:42:42,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=658866.6666666666, ans=0.07
2023-12-22 15:42:51,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=658933.3333333334, ans=0.125
2023-12-22 15:42:52,328 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+01 2.914e+01 3.083e+01 3.218e+01 3.665e+01, threshold=6.166e+01, percent-clipped=0.0
2023-12-22 15:43:00,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=659000.0, ans=0.0
2023-12-22 15:43:01,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.52 vs. limit=15.0
2023-12-22 15:43:18,618 INFO [train.py:886] (3/4) Epoch 21, batch 3550, loss[loss=0.01133, audio_tagging_loss=0.01133, over 24750.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4947911.93 frames. ], batch size: 99, lr: 5.14e-03, grad_scale: 64.0
2023-12-22 15:43:42,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=659266.6666666666, ans=0.2
2023-12-22 15:44:05,821 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.99 vs. limit=15.0
2023-12-22 15:44:06,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=659400.0, ans=0.0
2023-12-22 15:44:10,259 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0
2023-12-22 15:44:10,681 INFO [train.py:886] (3/4) Epoch 21, batch 3600, loss[loss=0.01139, audio_tagging_loss=0.01139, over 25000.00 frames. ], tot_loss[loss=0.01331, audio_tagging_loss=0.01331, over 4952949.86 frames. ], batch size: 100, lr: 5.14e-03, grad_scale: 64.0
2023-12-22 15:44:10,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=659466.6666666666, ans=0.125
2023-12-22 15:44:36,331 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.499e+01 2.979e+01 3.108e+01 3.250e+01 3.657e+01, threshold=6.215e+01, percent-clipped=0.0
2023-12-22 15:44:41,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=659666.6666666666, ans=0.2
2023-12-22 15:44:59,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=659733.3333333334, ans=0.125
2023-12-22 15:45:01,084 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.73 vs. limit=10.0
2023-12-22 15:45:02,663 INFO [train.py:886] (3/4) Epoch 21, batch 3650, loss[loss=0.01544, audio_tagging_loss=0.01544, over 25000.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4957459.91 frames. ], batch size: 100, lr: 5.14e-03, grad_scale: 64.0
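The *_skip_rate names in these lines (conv_skip_rate, ff2_skip_rate, ff3_skip_rate, attention_skip_rate, bypass.skip_rate) are stochastic-depth-style probabilities: with the printed probability a sub-module's contribution is dropped for that batch, and the schedules have mostly annealed them to 0.0 by this point in training. The bypass.scale_min / bypass_mid.scale_min values bound a learned interpolation between a layer's input and output from below. A sketch of both mechanisms under those assumed semantics:

    import torch

    def maybe_skip(module, x, skip_rate: float, training: bool):
        # Stochastic depth: drop the residual branch with probability skip_rate.
        if training and torch.rand(()) < skip_rate:
            return x
        return x + module(x)

    def bypass(x_in, x_out, scale: torch.Tensor, scale_min: float = 0.2):
        # Learned per-channel interpolation between a layer's input and output,
        # with the weight clamped to [scale_min, 1.0] (the ans=0.2 lines above).
        s = scale.clamp(min=scale_min, max=1.0)
        return x_in + s * (x_out - x_in)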
2023-12-22 15:45:23,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=659933.3333333334, ans=0.125
2023-12-22 15:45:25,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=659933.3333333334, ans=0.125
2023-12-22 15:45:26,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=659933.3333333334, ans=0.125
2023-12-22 15:45:37,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=660000.0, ans=0.05
2023-12-22 15:45:41,654 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=12.0
2023-12-22 15:45:54,401 INFO [train.py:886] (3/4) Epoch 21, batch 3700, loss[loss=0.01119, audio_tagging_loss=0.01119, over 25000.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4961418.07 frames. ], batch size: 100, lr: 5.13e-03, grad_scale: 64.0
2023-12-22 15:46:04,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=660200.0, ans=0.125
2023-12-22 15:46:06,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=660200.0, ans=0.125
2023-12-22 15:46:19,996 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.478e+01 2.928e+01 3.055e+01 3.227e+01 3.842e+01, threshold=6.110e+01, percent-clipped=0.0
2023-12-22 15:46:47,451 INFO [train.py:886] (3/4) Epoch 21, batch 3750, loss[loss=0.01389, audio_tagging_loss=0.01389, over 24945.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 4959323.96 frames. ], batch size: 100, lr: 5.13e-03, grad_scale: 64.0
2023-12-22 15:47:03,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=660533.3333333334, ans=0.1
2023-12-22 15:47:07,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.90 vs. limit=15.0
2023-12-22 15:47:14,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=660600.0, ans=0.125
2023-12-22 15:47:16,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=660600.0, ans=0.125
2023-12-22 15:47:21,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=660666.6666666666, ans=0.125
2023-12-22 15:47:27,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=660733.3333333334, ans=0.125
2023-12-22 15:47:32,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=660733.3333333334, ans=0.125
2023-12-22 15:47:38,472 INFO [train.py:886] (3/4) Epoch 21, batch 3800, loss[loss=0.01252, audio_tagging_loss=0.01252, over 25000.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4952935.49 frames. ], batch size: 100, lr: 5.13e-03, grad_scale: 64.0
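The balancer1.prob / balancer2.prob entries belong to modules that police per-channel activation statistics, such as a minimum fraction of positive values (min_positive) and bounds on magnitude (min_abs / max_abs), names that also appear directly in this log; prob is the probability that the check runs on a given batch, scheduled down to 0.125 here. The real module applies its corrections through the backward pass; the sketch below only computes the statistics being policed, with assumed bounds:

    import torch

    def balancer_check(x: torch.Tensor, prob: float,
                       min_positive: float = 0.05, max_abs: float = 10.0):
        if torch.rand(()) >= prob:                 # run with probability `prob`
            return None
        dims = tuple(range(x.dim() - 1))           # average over all but channels
        frac_positive = (x > 0).float().mean(dim=dims)
        mean_abs = x.abs().mean(dim=dims)
        return {
            "channels_below_min_positive": int((frac_positive < min_positive).sum()),
            "channels_above_max_abs": int((mean_abs > max_abs).sum()),
        }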
2023-12-22 15:47:49,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=660866.6666666666, ans=0.125
2023-12-22 15:48:03,536 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.617e+01 3.001e+01 3.115e+01 3.242e+01 4.083e+01, threshold=6.229e+01, percent-clipped=0.0
2023-12-22 15:48:10,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=661000.0, ans=0.0
2023-12-22 15:48:10,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=661000.0, ans=0.0
2023-12-22 15:48:20,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=661066.6666666666, ans=0.125
2023-12-22 15:48:27,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=661066.6666666666, ans=0.125
2023-12-22 15:48:28,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=661066.6666666666, ans=0.2
2023-12-22 15:48:30,948 INFO [train.py:886] (3/4) Epoch 21, batch 3850, loss[loss=0.01303, audio_tagging_loss=0.01303, over 24750.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4949372.13 frames. ], batch size: 99, lr: 5.13e-03, grad_scale: 64.0
2023-12-22 15:48:32,463 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.09 vs. limit=12.0
2023-12-22 15:48:36,936 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 15:49:23,643 INFO [train.py:886] (3/4) Epoch 21, batch 3900, loss[loss=0.01634, audio_tagging_loss=0.01634, over 22121.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 4945435.12 frames. ], batch size: 107, lr: 5.13e-03, grad_scale: 64.0
2023-12-22 15:49:27,848 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.68 vs. limit=22.5
2023-12-22 15:49:36,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=661533.3333333334, ans=0.0
2023-12-22 15:49:47,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=661600.0, ans=0.0
2023-12-22 15:49:47,903 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.727e+01 2.909e+01 3.084e+01 3.230e+01 3.604e+01, threshold=6.168e+01, percent-clipped=0.0
2023-12-22 15:49:50,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=661600.0, ans=0.0
2023-12-22 15:50:00,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=661666.6666666666, ans=0.125
2023-12-22 15:50:01,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=661666.6666666666, ans=0.0
2023-12-22 15:50:07,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.13 vs. limit=15.0
2023-12-22 15:50:14,918 INFO [train.py:886] (3/4) Epoch 21, batch 3950, loss[loss=0.01354, audio_tagging_loss=0.01354, over 25000.00 frames. ], tot_loss[loss=0.01351, audio_tagging_loss=0.01351, over 4945386.50 frames. ], batch size: 100, lr: 5.13e-03, grad_scale: 64.0
2023-12-22 15:50:21,614 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.83 vs. limit=22.5
2023-12-22 15:50:26,208 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.86 vs. limit=15.0
2023-12-22 15:50:34,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=661933.3333333334, ans=0.125
2023-12-22 15:50:35,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=661933.3333333334, ans=0.0
2023-12-22 15:50:44,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=661933.3333333334, ans=0.2
2023-12-22 15:50:47,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=662000.0, ans=0.09899494936611666
2023-12-22 15:51:07,208 INFO [train.py:886] (3/4) Epoch 21, batch 4000, loss[loss=0.01373, audio_tagging_loss=0.01373, over 25000.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4948086.59 frames. ], batch size: 100, lr: 5.13e-03, grad_scale: 128.0
2023-12-22 15:51:07,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=662133.3333333334, ans=0.1
2023-12-22 15:51:14,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=662133.3333333334, ans=0.125
2023-12-22 15:51:16,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=662200.0, ans=0.1
2023-12-22 15:51:20,566 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 15:51:26,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=662200.0, ans=0.125
2023-12-22 15:51:30,797 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.33 vs. limit=22.5
2023-12-22 15:51:33,826 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.648e+01 2.959e+01 3.062e+01 3.233e+01 3.752e+01, threshold=6.123e+01, percent-clipped=0.0
2023-12-22 15:51:48,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=662400.0, ans=0.125
2023-12-22 15:51:59,472 INFO [train.py:886] (3/4) Epoch 21, batch 4050, loss[loss=0.01385, audio_tagging_loss=0.01385, over 24750.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 4942789.68 frames. ], batch size: 99, lr: 5.13e-03, grad_scale: 64.0
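The grad_scale column is the fp16 dynamic loss scale: it doubles from 64.0 to 128.0 at batch 4000 after a stretch of overflow-free updates and is back at 64.0 in the entries that follow, the usual grow-until-overflow pattern. With PyTorch AMP the step looks roughly like this (init_scale and growth_interval below are illustrative, not the recipe's settings):

    import torch
    from torch.cuda.amp import GradScaler, autocast

    scaler = GradScaler(init_scale=64.0, growth_interval=1000)

    def train_step(model, batch, optimizer):
        optimizer.zero_grad()
        with autocast(dtype=torch.float16):
            loss = model(batch)
        scaler.scale(loss).backward()   # gradients carry the current scale
        scaler.step(optimizer)          # skipped if any gradient is inf/nan
        scaler.update()                 # doubles periodically, halves on overflow
        return loss.detach()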
2023-12-22 15:52:15,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=662533.3333333334, ans=0.125
2023-12-22 15:52:21,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=662600.0, ans=0.0
2023-12-22 15:52:32,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=662666.6666666666, ans=0.125
2023-12-22 15:52:43,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.71 vs. limit=12.0
2023-12-22 15:52:48,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=662733.3333333334, ans=0.125
2023-12-22 15:52:51,199 INFO [train.py:886] (3/4) Epoch 21, batch 4100, loss[loss=0.01257, audio_tagging_loss=0.01257, over 24055.00 frames. ], tot_loss[loss=0.01376, audio_tagging_loss=0.01376, over 4941411.98 frames. ], batch size: 100, lr: 5.12e-03, grad_scale: 64.0
2023-12-22 15:52:51,715 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.92 vs. limit=15.0
2023-12-22 15:52:56,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=662800.0, ans=0.125
2023-12-22 15:53:05,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=662866.6666666666, ans=0.125
2023-12-22 15:53:06,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=662866.6666666666, ans=0.0
2023-12-22 15:53:11,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=662933.3333333334, ans=0.0
2023-12-22 15:53:17,033 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.663e+01 2.959e+01 3.122e+01 3.290e+01 3.671e+01, threshold=6.244e+01, percent-clipped=0.0
2023-12-22 15:53:20,270 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.50 vs. limit=22.5
2023-12-22 15:53:28,567 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 15:53:32,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=663066.6666666666, ans=0.125
2023-12-22 15:53:38,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=663066.6666666666, ans=0.125
2023-12-22 15:53:43,070 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 15:53:43,775 INFO [train.py:886] (3/4) Epoch 21, batch 4150, loss[loss=0.01172, audio_tagging_loss=0.01172, over 24750.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4941429.09 frames. ], batch size: 99, lr: 5.12e-03, grad_scale: 64.0
2023-12-22 15:53:58,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.11 vs. limit=6.0
2023-12-22 15:54:08,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=663266.6666666666, ans=10.0
2023-12-22 15:54:17,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=663333.3333333334, ans=0.125
2023-12-22 15:54:30,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=663400.0, ans=0.0
2023-12-22 15:54:34,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=663466.6666666666, ans=0.05
2023-12-22 15:54:35,388 INFO [train.py:886] (3/4) Epoch 21, batch 4200, loss[loss=0.0141, audio_tagging_loss=0.0141, over 24750.00 frames. ], tot_loss[loss=0.01351, audio_tagging_loss=0.01351, over 4942499.54 frames. ], batch size: 99, lr: 5.12e-03, grad_scale: 64.0
2023-12-22 15:54:37,800 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0
2023-12-22 15:54:46,778 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.86 vs. limit=10.0
2023-12-22 15:54:56,000 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 15:55:00,453 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.534e+01 2.926e+01 3.050e+01 3.273e+01 3.755e+01, threshold=6.101e+01, percent-clipped=0.0
2023-12-22 15:55:05,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=663666.6666666666, ans=0.1
2023-12-22 15:55:12,600 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.23 vs. limit=15.0
2023-12-22 15:55:27,270 INFO [train.py:886] (3/4) Epoch 21, batch 4250, loss[loss=0.01356, audio_tagging_loss=0.01356, over 25000.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4944256.71 frames. ], batch size: 100, lr: 5.12e-03, grad_scale: 64.0
2023-12-22 15:55:27,674 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.56 vs. limit=22.5
2023-12-22 15:55:29,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=663800.0, ans=0.125
2023-12-22 15:55:45,150 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.66 vs. limit=15.0
2023-12-22 15:55:51,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=663933.3333333334, ans=0.1
2023-12-22 15:55:51,306 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.899e-02
2023-12-22 15:55:53,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=663933.3333333334, ans=0.0
2023-12-22 15:56:02,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=664000.0, ans=0.125
2023-12-22 15:56:08,929 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. limit=6.0
2023-12-22 15:56:16,059 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.68 vs. limit=12.0
2023-12-22 15:56:20,057 INFO [train.py:886] (3/4) Epoch 21, batch 4300, loss[loss=0.01315, audio_tagging_loss=0.01315, over 25000.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4946496.69 frames. ], batch size: 100, lr: 5.12e-03, grad_scale: 64.0
2023-12-22 15:56:22,197 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=6.066e-02
2023-12-22 15:56:28,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=664133.3333333334, ans=0.125
2023-12-22 15:56:29,623 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.39 vs. limit=15.0
2023-12-22 15:56:37,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=664200.0, ans=0.125
2023-12-22 15:56:38,361 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.93 vs. limit=12.0
2023-12-22 15:56:40,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=664266.6666666666, ans=0.2
2023-12-22 15:56:43,221 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.08 vs. limit=15.0
2023-12-22 15:56:45,829 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.556e+01 2.943e+01 3.122e+01 3.224e+01 4.020e+01, threshold=6.245e+01, percent-clipped=0.0
2023-12-22 15:56:47,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=664266.6666666666, ans=0.125
2023-12-22 15:57:03,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=664400.0, ans=0.125
2023-12-22 15:57:10,847 INFO [train.py:886] (3/4) Epoch 21, batch 4350, loss[loss=0.01421, audio_tagging_loss=0.01421, over 24750.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4942979.73 frames. ], batch size: 99, lr: 5.12e-03, grad_scale: 64.0
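The scaling.py:1118 WithLoss lines report an auxiliary penalty attached to a module's attention weights: the forward value passes through unchanged, an extra gradient is injected during backward, and the accumulated penalty (loss-sum) is logged, zero for most layers here but small and nonzero for encoder 3's layers. A sketch of that attach-a-loss pattern with a custom autograd function; the specific penalty below (discouraging near-zero attention mass) is an illustrative stand-in, not the actual term:

    import torch

    class AttachLoss(torch.autograd.Function):
        @staticmethod
        def forward(ctx, attn, strength):
            ctx.save_for_backward(attn)
            ctx.strength = strength
            return attn.view_as(attn)              # identity in forward

        @staticmethod
        def backward(ctx, grad_out):
            (attn,) = ctx.saved_tensors
            # Illustrative penalty gradient: push mass away from ~0 weights.
            penalty = -ctx.strength * (attn < 1e-4).float()
            return grad_out + penalty, None

    # attn = AttachLoss.apply(attn, 1.0e-4)  # used inside the attention module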
2023-12-22 15:57:12,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=664466.6666666666, ans=15.0
2023-12-22 15:57:18,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=664466.6666666666, ans=0.125
2023-12-22 15:57:19,958 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=6.0
2023-12-22 15:57:22,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=664533.3333333334, ans=0.125
2023-12-22 15:57:26,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=664533.3333333334, ans=0.1
2023-12-22 15:57:30,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=664533.3333333334, ans=0.1
2023-12-22 15:57:38,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=664600.0, ans=0.0
2023-12-22 15:57:45,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=664666.6666666666, ans=0.2
2023-12-22 15:58:03,329 INFO [train.py:886] (3/4) Epoch 21, batch 4400, loss[loss=0.01477, audio_tagging_loss=0.01477, over 24750.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4946984.25 frames. ], batch size: 99, lr: 5.12e-03, grad_scale: 64.0
2023-12-22 15:58:13,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=664866.6666666666, ans=0.125
2023-12-22 15:58:16,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=664866.6666666666, ans=0.2
2023-12-22 15:58:29,331 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.790e+01 3.057e+01 3.154e+01 3.271e+01 4.005e+01, threshold=6.308e+01, percent-clipped=0.0
2023-12-22 15:58:35,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=665000.0, ans=0.05
2023-12-22 15:58:39,745 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 15:58:41,161 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.59 vs. limit=6.0
2023-12-22 15:58:42,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.32 vs. limit=10.0
2023-12-22 15:58:55,003 INFO [train.py:886] (3/4) Epoch 21, batch 4450, loss[loss=0.01521, audio_tagging_loss=0.01521, over 23975.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4945698.01 frames. ], batch size: 100, lr: 5.12e-03, grad_scale: 64.0
2023-12-22 15:59:00,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=665133.3333333334, ans=0.125
2023-12-22 15:59:04,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=665133.3333333334, ans=0.125
2023-12-22 15:59:07,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=665200.0, ans=0.125
2023-12-22 15:59:10,972 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.209e-01
2023-12-22 15:59:19,662 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=12.0
2023-12-22 15:59:22,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=665266.6666666666, ans=0.125
2023-12-22 15:59:31,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=665333.3333333334, ans=0.0
2023-12-22 15:59:33,471 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.01 vs. limit=15.0
2023-12-22 15:59:41,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=665400.0, ans=0.2
2023-12-22 15:59:46,935 INFO [train.py:886] (3/4) Epoch 21, batch 4500, loss[loss=0.01277, audio_tagging_loss=0.01277, over 25000.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4946500.59 frames. ], batch size: 100, lr: 5.11e-03, grad_scale: 64.0
2023-12-22 15:59:57,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=665533.3333333334, ans=0.07
2023-12-22 16:00:03,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.36 vs. limit=22.5
2023-12-22 16:00:12,610 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.625e+01 2.928e+01 3.056e+01 3.221e+01 3.659e+01, threshold=6.113e+01, percent-clipped=0.0
2023-12-22 16:00:31,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=665733.3333333334, ans=0.125
2023-12-22 16:00:39,018 INFO [train.py:886] (3/4) Epoch 21, batch 4550, loss[loss=0.01226, audio_tagging_loss=0.01226, over 21695.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4942656.82 frames. ], batch size: 107, lr: 5.11e-03, grad_scale: 64.0
2023-12-22 16:00:41,447 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.30 vs. limit=22.5
2023-12-22 16:00:44,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=665800.0, ans=0.125
2023-12-22 16:00:50,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=665866.6666666666, ans=0.125
2023-12-22 16:01:19,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=666066.6666666666, ans=0.125
2023-12-22 16:01:29,219 INFO [train.py:886] (3/4) Epoch 21, batch 4600, loss[loss=0.01335, audio_tagging_loss=0.01335, over 25000.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4943717.64 frames. ], batch size: 100, lr: 5.11e-03, grad_scale: 64.0
2023-12-22 16:01:43,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=666200.0, ans=0.125
2023-12-22 16:01:52,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=666266.6666666666, ans=0.125
2023-12-22 16:01:55,669 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.599e+01 2.941e+01 3.039e+01 3.146e+01 3.835e+01, threshold=6.079e+01, percent-clipped=0.0
2023-12-22 16:01:58,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=666266.6666666666, ans=0.125
2023-12-22 16:01:59,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=666333.3333333334, ans=0.125
2023-12-22 16:02:00,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=666333.3333333334, ans=0.0
2023-12-22 16:02:13,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=666400.0, ans=0.125
2023-12-22 16:02:20,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=666400.0, ans=0.0
2023-12-22 16:02:21,758 INFO [train.py:886] (3/4) Epoch 21, batch 4650, loss[loss=0.01457, audio_tagging_loss=0.01457, over 25000.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4947687.61 frames. ], batch size: 100, lr: 5.11e-03, grad_scale: 64.0
2023-12-22 16:02:43,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=666600.0, ans=0.125
2023-12-22 16:02:50,687 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.27 vs. limit=12.0
2023-12-22 16:02:55,417 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.065e-01
2023-12-22 16:03:02,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=666666.6666666666, ans=0.0
2023-12-22 16:03:13,764 INFO [train.py:886] (3/4) Epoch 21, batch 4700, loss[loss=0.01164, audio_tagging_loss=0.01164, over 22431.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 4944960.39 frames. ], batch size: 107, lr: 5.11e-03, grad_scale: 64.0
2023-12-22 16:03:30,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=666866.6666666666, ans=0.0
2023-12-22 16:03:37,514 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.714e+01 3.014e+01 3.141e+01 3.308e+01 3.967e+01, threshold=6.283e+01, percent-clipped=0.0
2023-12-22 16:03:47,517 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=15.0
2023-12-22 16:03:48,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=667000.0, ans=0.125
2023-12-22 16:03:53,771 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0
2023-12-22 16:03:57,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=667066.6666666666, ans=0.0
2023-12-22 16:04:01,454 INFO [train.py:886] (3/4) Epoch 21, batch 4750, loss[loss=0.01235, audio_tagging_loss=0.01235, over 24750.00 frames. ], tot_loss[loss=0.01359, audio_tagging_loss=0.01359, over 4944426.64 frames. ], batch size: 99, lr: 5.11e-03, grad_scale: 64.0
2023-12-22 16:04:07,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=667133.3333333334, ans=0.0
2023-12-22 16:04:09,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=667200.0, ans=0.0
2023-12-22 16:04:35,438 INFO [train.py:886] (3/4) Epoch 22, batch 0, loss[loss=0.0271, audio_tagging_loss=0.0271, over 24015.00 frames. ], tot_loss[loss=0.0271, audio_tagging_loss=0.0271, over 24015.00 frames. ], batch size: 100, lr: 4.99e-03, grad_scale: 64.0
2023-12-22 16:04:35,439 INFO [train.py:909] (3/4) Computing validation loss
2023-12-22 16:04:49,035 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.6976, 2.5945, 2.7764, 2.3256, 3.8492, 3.4556, 4.0404, 2.2836], device='cuda:3')
2023-12-22 16:04:55,980 INFO [train.py:917] (3/4) Epoch 22, validation: loss=0.03204, audio_tagging_loss=0.03204, over 3737520.00 frames.
2023-12-22 16:04:55,981 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-22 16:05:16,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=667373.3333333334, ans=0.125
2023-12-22 16:05:16,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=667373.3333333334, ans=0.0
2023-12-22 16:05:28,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=667440.0, ans=0.125
2023-12-22 16:05:40,754 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.02 vs. limit=15.0
2023-12-22 16:05:47,414 INFO [train.py:886] (3/4) Epoch 22, batch 50, loss[loss=0.01675, audio_tagging_loss=0.01675, over 24003.00 frames. ], tot_loss[loss=0.02132, audio_tagging_loss=0.02132, over 1116545.37 frames. ], batch size: 100, lr: 4.99e-03, grad_scale: 32.0
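At each validation pass (train.py:909/917) the log also dumps attn_weights_entropy for a probe layer: the entropy of its attention distributions, one value per head. It is a cheap collapse diagnostic, near 0 when a head locks onto single positions and near log(seq_len) when it stays uniform. It can be computed along these lines, with the tensor layout being an assumption:

    import torch

    def attn_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, batch, query, key), rows summing to 1 over `key`.
        ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)
        return ent.mean(dim=(1, 2))                # one value per head

    attn = torch.softmax(torch.randn(4, 2, 10, 10), dim=-1)
    print(attn_entropy(attn))                      # ~log(10) for near-uniform heads

Note also the epoch transition in these entries: tot_loss restarts from the first batch of epoch 22 (hence the jump to 0.0271), and the frame count rebuilds from 24015 upward.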
2023-12-22 16:05:51,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=667573.3333333334, ans=0.0
2023-12-22 16:05:58,091 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.665e+01 3.143e+01 3.722e+01 4.421e+01 9.512e+01, threshold=7.444e+01, percent-clipped=8.0
2023-12-22 16:06:20,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=667773.3333333334, ans=0.0
2023-12-22 16:06:37,930 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 16:06:38,672 INFO [train.py:886] (3/4) Epoch 22, batch 100, loss[loss=0.01178, audio_tagging_loss=0.01178, over 25000.00 frames. ], tot_loss[loss=0.01821, audio_tagging_loss=0.01821, over 1967712.77 frames. ], batch size: 100, lr: 4.99e-03, grad_scale: 32.0
2023-12-22 16:06:48,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=667973.3333333334, ans=0.1
2023-12-22 16:06:49,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=667973.3333333334, ans=0.0
2023-12-22 16:06:51,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=667973.3333333334, ans=0.07
2023-12-22 16:06:59,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=668040.0, ans=0.1
2023-12-22 16:07:05,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=668040.0, ans=0.0
2023-12-22 16:07:30,448 INFO [train.py:886] (3/4) Epoch 22, batch 150, loss[loss=0.01627, audio_tagging_loss=0.01627, over 25000.00 frames. ], tot_loss[loss=0.0167, audio_tagging_loss=0.0167, over 2634465.02 frames. ], batch size: 100, lr: 4.98e-03, grad_scale: 32.0
2023-12-22 16:07:36,253 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.58 vs. limit=15.0
2023-12-22 16:07:41,255 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.747e+01 3.091e+01 3.297e+01 3.433e+01 3.866e+01, threshold=6.595e+01, percent-clipped=0.0
2023-12-22 16:07:57,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=668373.3333333334, ans=0.0
2023-12-22 16:08:08,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=668440.0, ans=0.0
2023-12-22 16:08:22,588 INFO [train.py:886] (3/4) Epoch 22, batch 200, loss[loss=0.01245, audio_tagging_loss=0.01245, over 25000.00 frames. ], tot_loss[loss=0.01558, audio_tagging_loss=0.01558, over 3145527.90 frames. ], batch size: 100, lr: 4.98e-03, grad_scale: 32.0
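The lr column decays smoothly inside epoch 21 (5.16e-03 down to 5.11e-03) and then steps down to 4.99e-03 when epoch 22 begins, so the schedule is a function of both batch count and epoch; the grad-norm spike right after the epoch boundary (quartile max 9.512e+01, percent-clipped=8.0, threshold raised to 7.444e+01) shows the adaptive clipping reacting at the same time. icefall's Eden scheduler has this two-argument shape; a sketch with illustrative constants (the recipe's own lr_batches/lr_epochs and the exact counters may differ):

    def eden_lr(base_lr: float, batch: int, epoch: int,
                lr_batches: float = 5000.0, lr_epochs: float = 4.0) -> float:
        # Inverse-quarter-power decay in both batch count and epoch.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # Decays with both arguments; absolute values depend on the constants.
    print(eden_lr(0.045, 660000, 21), eden_lr(0.045, 667300, 22))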
2023-12-22 16:08:37,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=668640.0, ans=0.125
2023-12-22 16:08:43,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=668706.6666666666, ans=0.1
2023-12-22 16:08:43,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=668706.6666666666, ans=0.125
2023-12-22 16:08:47,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0
2023-12-22 16:08:51,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=668706.6666666666, ans=0.125
2023-12-22 16:08:52,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=668773.3333333334, ans=0.2
2023-12-22 16:09:03,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=668840.0, ans=0.125
2023-12-22 16:09:09,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=668840.0, ans=0.0
2023-12-22 16:09:13,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=668906.6666666666, ans=0.125
2023-12-22 16:09:14,298 INFO [train.py:886] (3/4) Epoch 22, batch 250, loss[loss=0.01352, audio_tagging_loss=0.01352, over 25000.00 frames. ], tot_loss[loss=0.01497, audio_tagging_loss=0.01497, over 3553304.94 frames. ], batch size: 100, lr: 4.98e-03, grad_scale: 32.0
2023-12-22 16:09:24,427 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.578e+01 2.955e+01 3.079e+01 3.215e+01 4.174e+01, threshold=6.159e+01, percent-clipped=0.0
2023-12-22 16:09:31,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=668973.3333333334, ans=0.0
2023-12-22 16:09:42,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=669040.0, ans=0.09899494936611666
2023-12-22 16:09:58,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=669173.3333333334, ans=0.125
2023-12-22 16:10:06,724 INFO [train.py:886] (3/4) Epoch 22, batch 300, loss[loss=0.01516, audio_tagging_loss=0.01516, over 24750.00 frames. ], tot_loss[loss=0.01454, audio_tagging_loss=0.01454, over 3859679.03 frames. ], batch size: 99, lr: 4.98e-03, grad_scale: 32.0
2023-12-22 16:10:18,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=669306.6666666666, ans=0.2
2023-12-22 16:10:30,319 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.32 vs. limit=15.0
2023-12-22 16:10:52,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=669506.6666666666, ans=0.2
2023-12-22 16:10:55,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=669506.6666666666, ans=0.2
2023-12-22 16:10:58,102 INFO [train.py:886] (3/4) Epoch 22, batch 350, loss[loss=0.01243, audio_tagging_loss=0.01243, over 24750.00 frames. ], tot_loss[loss=0.01428, audio_tagging_loss=0.01428, over 4096324.01 frames. ], batch size: 99, lr: 4.98e-03, grad_scale: 32.0
2023-12-22 16:11:05,127 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.22 vs. limit=15.0
2023-12-22 16:11:08,936 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.576e+01 2.952e+01 3.101e+01 3.215e+01 3.819e+01, threshold=6.201e+01, percent-clipped=0.0
2023-12-22 16:11:10,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=669640.0, ans=0.125
2023-12-22 16:11:18,723 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.84 vs. limit=10.0
2023-12-22 16:11:31,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=669773.3333333334, ans=0.1
2023-12-22 16:11:40,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=669840.0, ans=0.0
2023-12-22 16:11:47,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.14 vs. limit=15.0
2023-12-22 16:11:50,141 INFO [train.py:886] (3/4) Epoch 22, batch 400, loss[loss=0.01204, audio_tagging_loss=0.01204, over 22740.00 frames. ], tot_loss[loss=0.01411, audio_tagging_loss=0.01411, over 4280953.18 frames. ], batch size: 107, lr: 4.98e-03, grad_scale: 32.0
2023-12-22 16:12:04,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=669973.3333333334, ans=0.125
2023-12-22 16:12:13,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=670040.0, ans=0.09899494936611666
2023-12-22 16:12:16,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=670040.0, ans=0.125
2023-12-22 16:12:18,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=670040.0, ans=0.125
2023-12-22 16:12:27,598 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.25 vs. limit=12.0
2023-12-22 16:12:33,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=670173.3333333334, ans=0.125
2023-12-22 16:12:40,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=670240.0, ans=0.125
2023-12-22 16:12:42,521 INFO [train.py:886] (3/4) Epoch 22, batch 450, loss[loss=0.01233, audio_tagging_loss=0.01233, over 24750.00 frames. ], tot_loss[loss=0.01387, audio_tagging_loss=0.01387, over 4435404.01 frames. ], batch size: 99, lr: 4.98e-03, grad_scale: 32.0
2023-12-22 16:12:43,232 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.94 vs. limit=8.0
2023-12-22 16:12:52,652 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.604e+01 2.929e+01 3.055e+01 3.182e+01 3.732e+01, threshold=6.110e+01, percent-clipped=0.0
2023-12-22 16:12:58,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=670306.6666666666, ans=0.5
2023-12-22 16:12:59,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=670306.6666666666, ans=0.0
2023-12-22 16:13:03,672 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.62 vs. limit=15.0
2023-12-22 16:13:05,280 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.154e-01
2023-12-22 16:13:12,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=670440.0, ans=0.2
2023-12-22 16:13:30,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=670506.6666666666, ans=0.0
2023-12-22 16:13:31,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=670506.6666666666, ans=0.125
2023-12-22 16:13:33,358 INFO [train.py:886] (3/4) Epoch 22, batch 500, loss[loss=0.01296, audio_tagging_loss=0.01296, over 25000.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 4551141.03 frames. ], batch size: 100, lr: 4.98e-03, grad_scale: 32.0
2023-12-22 16:13:48,178 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 16:13:50,546 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.73 vs. limit=10.0
2023-12-22 16:14:01,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=670706.6666666666, ans=0.0
2023-12-22 16:14:06,587 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.90 vs. limit=22.5
2023-12-22 16:14:07,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=670773.3333333334, ans=0.0
2023-12-22 16:14:12,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=670773.3333333334, ans=0.125
2023-12-22 16:14:18,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=670840.0, ans=0.035
2023-12-22 16:14:25,945 INFO [train.py:886] (3/4) Epoch 22, batch 550, loss[loss=0.01173, audio_tagging_loss=0.01173, over 25000.00 frames. ], tot_loss[loss=0.01373, audio_tagging_loss=0.01373, over 4642644.49 frames. ], batch size: 100, lr: 4.97e-03, grad_scale: 32.0
2023-12-22 16:14:36,067 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.670e+01 2.918e+01 3.052e+01 3.206e+01 3.698e+01, threshold=6.105e+01, percent-clipped=0.0
2023-12-22 16:14:36,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=670973.3333333334, ans=0.0
2023-12-22 16:14:43,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=670973.3333333334, ans=0.125
2023-12-22 16:14:45,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=671040.0, ans=0.2
2023-12-22 16:15:16,947 INFO [train.py:886] (3/4) Epoch 22, batch 600, loss[loss=0.01676, audio_tagging_loss=0.01676, over 24750.00 frames. ], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 4711183.40 frames. ], batch size: 99, lr: 4.97e-03, grad_scale: 32.0
2023-12-22 16:15:17,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=671240.0, ans=0.0
2023-12-22 16:16:06,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=671506.6666666666, ans=0.0
2023-12-22 16:16:09,861 INFO [train.py:886] (3/4) Epoch 22, batch 650, loss[loss=0.01335, audio_tagging_loss=0.01335, over 24750.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4755144.48 frames. ], batch size: 99, lr: 4.97e-03, grad_scale: 32.0
2023-12-22 16:16:10,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=671573.3333333334, ans=0.05
2023-12-22 16:16:13,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=671573.3333333334, ans=0.0
2023-12-22 16:16:15,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=671573.3333333334, ans=0.125
2023-12-22 16:16:19,417 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.741e+01 2.962e+01 3.088e+01 3.252e+01 3.665e+01, threshold=6.175e+01, percent-clipped=0.0
2023-12-22 16:16:21,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=671640.0, ans=0.125
2023-12-22 16:16:42,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=671773.3333333334, ans=0.0
2023-12-22 16:16:46,287 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.79 vs. limit=15.0
2023-12-22 16:17:01,204 INFO [train.py:886] (3/4) Epoch 22, batch 700, loss[loss=0.01248, audio_tagging_loss=0.01248, over 25000.00 frames. ], tot_loss[loss=0.01382, audio_tagging_loss=0.01382, over 4789689.07 frames. ], batch size: 100, lr: 4.97e-03, grad_scale: 32.0
2023-12-22 16:17:07,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.08 vs. limit=10.0
limit=10.0 2023-12-22 16:17:23,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=672040.0, ans=10.0 2023-12-22 16:17:30,264 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.84 vs. limit=6.0 2023-12-22 16:17:39,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=672106.6666666666, ans=0.125 2023-12-22 16:17:44,032 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.95 vs. limit=22.5 2023-12-22 16:17:49,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=672173.3333333334, ans=0.1 2023-12-22 16:17:52,035 INFO [train.py:886] (3/4) Epoch 22, batch 750, loss[loss=0.01211, audio_tagging_loss=0.01211, over 25000.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 4827911.73 frames. ], batch size: 100, lr: 4.97e-03, grad_scale: 32.0 2023-12-22 16:17:57,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=672240.0, ans=0.0 2023-12-22 16:18:02,882 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.741e+01 3.002e+01 3.128e+01 3.298e+01 3.708e+01, threshold=6.256e+01, percent-clipped=0.0 2023-12-22 16:18:11,173 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 16:18:25,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=672440.0, ans=0.0 2023-12-22 16:18:26,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=672440.0, ans=0.125 2023-12-22 16:18:36,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=672506.6666666666, ans=0.07 2023-12-22 16:18:39,052 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 16:18:45,118 INFO [train.py:886] (3/4) Epoch 22, batch 800, loss[loss=0.01532, audio_tagging_loss=0.01532, over 24043.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4856959.65 frames. 
], batch size: 100, lr: 4.97e-03, grad_scale: 32.0 2023-12-22 16:18:59,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=672640.0, ans=0.04949747468305833 2023-12-22 16:19:07,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=672706.6666666666, ans=0.0 2023-12-22 16:19:08,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=672706.6666666666, ans=0.125 2023-12-22 16:19:09,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=672706.6666666666, ans=0.09899494936611666 2023-12-22 16:19:22,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=672773.3333333334, ans=0.2 2023-12-22 16:19:27,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=672840.0, ans=0.1 2023-12-22 16:19:28,364 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.42 vs. limit=12.0 2023-12-22 16:19:32,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=672840.0, ans=0.0 2023-12-22 16:19:36,119 INFO [train.py:886] (3/4) Epoch 22, batch 850, loss[loss=0.01413, audio_tagging_loss=0.01413, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4882274.67 frames. ], batch size: 100, lr: 4.97e-03, grad_scale: 32.0 2023-12-22 16:19:47,076 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.653e+01 2.942e+01 3.056e+01 3.166e+01 3.620e+01, threshold=6.111e+01, percent-clipped=0.0 2023-12-22 16:19:55,094 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0 2023-12-22 16:20:00,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=673040.0, ans=0.2 2023-12-22 16:20:13,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=673106.6666666666, ans=15.0 2023-12-22 16:20:27,465 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.07 vs. limit=22.5 2023-12-22 16:20:28,690 INFO [train.py:886] (3/4) Epoch 22, batch 900, loss[loss=0.01224, audio_tagging_loss=0.01224, over 24750.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4900483.29 frames. 
], batch size: 99, lr: 4.97e-03, grad_scale: 32.0 2023-12-22 16:20:41,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=673306.6666666666, ans=0.1 2023-12-22 16:20:47,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=673306.6666666666, ans=0.125 2023-12-22 16:20:48,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=673373.3333333334, ans=0.1 2023-12-22 16:20:54,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=673373.3333333334, ans=0.015 2023-12-22 16:21:20,358 INFO [train.py:886] (3/4) Epoch 22, batch 950, loss[loss=0.01192, audio_tagging_loss=0.01192, over 22395.00 frames. ], tot_loss[loss=0.01363, audio_tagging_loss=0.01363, over 4908431.52 frames. ], batch size: 107, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:21:27,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=673573.3333333334, ans=0.0 2023-12-22 16:21:30,887 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+01 2.985e+01 3.099e+01 3.290e+01 3.638e+01, threshold=6.198e+01, percent-clipped=0.0 2023-12-22 16:21:33,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=673640.0, ans=0.0 2023-12-22 16:21:36,894 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=15.0 2023-12-22 16:21:40,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=673706.6666666666, ans=0.125 2023-12-22 16:21:50,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=673773.3333333334, ans=0.09899494936611666 2023-12-22 16:22:07,551 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.93 vs. limit=22.5 2023-12-22 16:22:08,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=673840.0, ans=0.1 2023-12-22 16:22:11,947 INFO [train.py:886] (3/4) Epoch 22, batch 1000, loss[loss=0.01381, audio_tagging_loss=0.01381, over 24750.00 frames. ], tot_loss[loss=0.01363, audio_tagging_loss=0.01363, over 4915304.38 frames. 
], batch size: 99, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:22:23,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=673973.3333333334, ans=0.125 2023-12-22 16:22:33,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=674040.0, ans=0.125 2023-12-22 16:22:37,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=674040.0, ans=0.1 2023-12-22 16:22:42,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=674106.6666666666, ans=0.125 2023-12-22 16:22:53,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=674173.3333333334, ans=0.125 2023-12-22 16:22:54,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=674173.3333333334, ans=0.0 2023-12-22 16:23:02,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=674173.3333333334, ans=0.125 2023-12-22 16:23:05,199 INFO [train.py:886] (3/4) Epoch 22, batch 1050, loss[loss=0.01168, audio_tagging_loss=0.01168, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4922692.87 frames. ], batch size: 100, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:23:09,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=674240.0, ans=0.0 2023-12-22 16:23:14,727 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.557e+01 2.919e+01 3.070e+01 3.239e+01 4.205e+01, threshold=6.141e+01, percent-clipped=0.0 2023-12-22 16:23:42,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=674440.0, ans=0.0 2023-12-22 16:23:54,433 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 16:23:56,078 INFO [train.py:886] (3/4) Epoch 22, batch 1100, loss[loss=0.01054, audio_tagging_loss=0.01054, over 24059.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4924663.74 frames. ], batch size: 100, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:24:00,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.45 vs. 
limit=15.0 2023-12-22 16:24:08,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=674640.0, ans=0.0 2023-12-22 16:24:08,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=674640.0, ans=0.0 2023-12-22 16:24:12,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=674640.0, ans=0.125 2023-12-22 16:24:13,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=674640.0, ans=0.2 2023-12-22 16:24:30,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=674773.3333333334, ans=0.125 2023-12-22 16:24:36,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.05 vs. limit=15.0 2023-12-22 16:24:49,289 INFO [train.py:886] (3/4) Epoch 22, batch 1150, loss[loss=0.01428, audio_tagging_loss=0.01428, over 25000.00 frames. ], tot_loss[loss=0.01337, audio_tagging_loss=0.01337, over 4928185.24 frames. ], batch size: 100, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:24:59,415 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.584e+01 2.884e+01 2.992e+01 3.161e+01 3.623e+01, threshold=5.985e+01, percent-clipped=0.0 2023-12-22 16:25:05,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=674973.3333333334, ans=0.1 2023-12-22 16:25:22,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=675106.6666666666, ans=0.1 2023-12-22 16:25:40,901 INFO [train.py:886] (3/4) Epoch 22, batch 1200, loss[loss=0.01469, audio_tagging_loss=0.01469, over 24750.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4931438.32 frames. ], batch size: 99, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:25:42,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=675240.0, ans=0.0 2023-12-22 16:25:55,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=675306.6666666666, ans=0.0 2023-12-22 16:25:59,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=675306.6666666666, ans=0.1 2023-12-22 16:26:04,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=675373.3333333334, ans=0.1 2023-12-22 16:26:30,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=675506.6666666666, ans=0.0 2023-12-22 16:26:32,294 INFO [train.py:886] (3/4) Epoch 22, batch 1250, loss[loss=0.01399, audio_tagging_loss=0.01399, over 24750.00 frames. ], tot_loss[loss=0.01347, audio_tagging_loss=0.01347, over 4931144.76 frames. 
], batch size: 99, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:26:40,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=675573.3333333334, ans=0.1 2023-12-22 16:26:42,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=675640.0, ans=0.125 2023-12-22 16:26:43,010 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.720e+01 3.009e+01 3.140e+01 3.242e+01 3.734e+01, threshold=6.280e+01, percent-clipped=0.0 2023-12-22 16:26:46,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=675640.0, ans=0.0 2023-12-22 16:27:05,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=675773.3333333334, ans=0.125 2023-12-22 16:27:24,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=675906.6666666666, ans=0.125 2023-12-22 16:27:24,923 INFO [train.py:886] (3/4) Epoch 22, batch 1300, loss[loss=0.01107, audio_tagging_loss=0.01107, over 24011.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4934342.26 frames. ], batch size: 100, lr: 4.96e-03, grad_scale: 32.0 2023-12-22 16:27:29,110 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.82 vs. limit=22.5 2023-12-22 16:27:30,096 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.39 vs. limit=10.0 2023-12-22 16:27:46,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=676040.0, ans=0.0 2023-12-22 16:27:48,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=676040.0, ans=0.1 2023-12-22 16:27:48,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=676040.0, ans=0.125 2023-12-22 16:28:15,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=676173.3333333334, ans=0.125 2023-12-22 16:28:17,151 INFO [train.py:886] (3/4) Epoch 22, batch 1350, loss[loss=0.01396, audio_tagging_loss=0.01396, over 25000.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4932771.39 frames. ], batch size: 100, lr: 4.95e-03, grad_scale: 32.0 2023-12-22 16:28:19,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=676240.0, ans=0.125 2023-12-22 16:28:19,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=676240.0, ans=0.2 2023-12-22 16:28:27,394 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.688e+01 2.919e+01 3.091e+01 3.263e+01 3.767e+01, threshold=6.183e+01, percent-clipped=0.0 2023-12-22 16:28:30,509 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-22 16:28:35,366 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.56 vs. 
limit=6.0 2023-12-22 16:29:02,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=676506.6666666666, ans=0.125 2023-12-22 16:29:08,760 INFO [train.py:886] (3/4) Epoch 22, batch 1400, loss[loss=0.01486, audio_tagging_loss=0.01486, over 25000.00 frames. ], tot_loss[loss=0.01345, audio_tagging_loss=0.01345, over 4940561.33 frames. ], batch size: 100, lr: 4.95e-03, grad_scale: 32.0 2023-12-22 16:29:30,926 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.72 vs. limit=22.5 2023-12-22 16:29:34,790 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.83 vs. limit=15.0 2023-12-22 16:29:40,401 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0 2023-12-22 16:30:00,718 INFO [train.py:886] (3/4) Epoch 22, batch 1450, loss[loss=0.01249, audio_tagging_loss=0.01249, over 25000.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4949729.24 frames. ], batch size: 100, lr: 4.95e-03, grad_scale: 32.0 2023-12-22 16:30:10,224 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.636e+01 2.926e+01 3.092e+01 3.201e+01 4.336e+01, threshold=6.185e+01, percent-clipped=0.0 2023-12-22 16:30:15,356 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0 2023-12-22 16:30:24,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=677040.0, ans=0.0 2023-12-22 16:30:37,920 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.285e-01 2023-12-22 16:30:43,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=677173.3333333334, ans=0.125 2023-12-22 16:30:43,774 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.08 vs. limit=15.0 2023-12-22 16:30:51,771 INFO [train.py:886] (3/4) Epoch 22, batch 1500, loss[loss=0.01111, audio_tagging_loss=0.01111, over 25000.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4949496.89 frames. ], batch size: 100, lr: 4.95e-03, grad_scale: 32.0 2023-12-22 16:31:07,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=677306.6666666666, ans=0.1 2023-12-22 16:31:13,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=677373.3333333334, ans=0.02 2023-12-22 16:31:13,250 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2023-12-22 16:31:23,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.66 vs. 
limit=22.5 2023-12-22 16:31:28,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=677440.0, ans=0.1 2023-12-22 16:31:37,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=677506.6666666666, ans=0.125 2023-12-22 16:31:44,046 INFO [train.py:886] (3/4) Epoch 22, batch 1550, loss[loss=0.01522, audio_tagging_loss=0.01522, over 24750.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4953858.15 frames. ], batch size: 99, lr: 4.95e-03, grad_scale: 32.0 2023-12-22 16:31:45,530 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.86 vs. limit=10.0 2023-12-22 16:31:46,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=677573.3333333334, ans=0.125 2023-12-22 16:31:54,081 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.798e+01 3.056e+01 3.155e+01 3.308e+01 3.901e+01, threshold=6.310e+01, percent-clipped=0.0 2023-12-22 16:32:10,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=677706.6666666666, ans=15.0 2023-12-22 16:32:16,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=677773.3333333334, ans=0.1 2023-12-22 16:32:22,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=677773.3333333334, ans=0.0 2023-12-22 16:32:35,992 INFO [train.py:886] (3/4) Epoch 22, batch 1600, loss[loss=0.013, audio_tagging_loss=0.013, over 24750.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 4950632.80 frames. ], batch size: 99, lr: 4.95e-03, grad_scale: 32.0 2023-12-22 16:32:37,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=677906.6666666666, ans=0.0 2023-12-22 16:32:53,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=677973.3333333334, ans=0.125 2023-12-22 16:33:04,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=678040.0, ans=0.1 2023-12-22 16:33:06,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=678106.6666666666, ans=0.07 2023-12-22 16:33:08,310 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.26 vs. 
limit=22.5 2023-12-22 16:33:11,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=678106.6666666666, ans=0.0 2023-12-22 16:33:22,794 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.585e-01 2023-12-22 16:33:23,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=678173.3333333334, ans=0.0 2023-12-22 16:33:25,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=678240.0, ans=0.125 2023-12-22 16:33:26,343 INFO [train.py:886] (3/4) Epoch 22, batch 1650, loss[loss=0.01527, audio_tagging_loss=0.01527, over 21785.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4950834.05 frames. ], batch size: 107, lr: 4.95e-03, grad_scale: 32.0 2023-12-22 16:33:29,377 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.67 vs. limit=15.0 2023-12-22 16:33:37,920 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.694e+01 2.963e+01 3.106e+01 3.211e+01 3.845e+01, threshold=6.212e+01, percent-clipped=0.0 2023-12-22 16:33:50,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=678373.3333333334, ans=0.0 2023-12-22 16:33:57,216 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.03 vs. limit=15.0 2023-12-22 16:34:07,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=678440.0, ans=0.125 2023-12-22 16:34:14,474 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.65 vs. limit=15.0 2023-12-22 16:34:18,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=678573.3333333334, ans=0.95 2023-12-22 16:34:19,398 INFO [train.py:886] (3/4) Epoch 22, batch 1700, loss[loss=0.01249, audio_tagging_loss=0.01249, over 25000.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4951479.84 frames. ], batch size: 100, lr: 4.95e-03, grad_scale: 32.0 2023-12-22 16:34:19,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=678573.3333333334, ans=0.0 2023-12-22 16:34:36,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=678640.0, ans=0.09899494936611666 2023-12-22 16:34:42,744 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.12 vs. limit=15.0 2023-12-22 16:35:04,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=678840.0, ans=0.125 2023-12-22 16:35:05,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=678840.0, ans=0.125 2023-12-22 16:35:11,972 INFO [train.py:886] (3/4) Epoch 22, batch 1750, loss[loss=0.01314, audio_tagging_loss=0.01314, over 21798.00 frames. 
], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4948318.04 frames. ], batch size: 107, lr: 4.94e-03, grad_scale: 32.0 2023-12-22 16:35:22,226 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.554e+01 2.914e+01 2.997e+01 3.169e+01 3.655e+01, threshold=5.994e+01, percent-clipped=0.0 2023-12-22 16:35:33,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=679040.0, ans=0.125 2023-12-22 16:35:42,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=679106.6666666666, ans=0.125 2023-12-22 16:35:44,174 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2023-12-22 16:35:57,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=679173.3333333334, ans=0.125 2023-12-22 16:35:58,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=679173.3333333334, ans=0.125 2023-12-22 16:35:58,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.29 vs. limit=12.0 2023-12-22 16:36:03,018 INFO [train.py:886] (3/4) Epoch 22, batch 1800, loss[loss=0.01309, audio_tagging_loss=0.01309, over 22783.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4950750.99 frames. ], batch size: 107, lr: 4.94e-03, grad_scale: 32.0 2023-12-22 16:36:04,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=679240.0, ans=0.09899494936611666 2023-12-22 16:36:18,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=679306.6666666666, ans=0.125 2023-12-22 16:36:55,375 INFO [train.py:886] (3/4) Epoch 22, batch 1850, loss[loss=0.01415, audio_tagging_loss=0.01415, over 24750.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4952013.20 frames. ], batch size: 99, lr: 4.94e-03, grad_scale: 32.0 2023-12-22 16:37:05,644 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.729e+01 2.979e+01 3.098e+01 3.249e+01 3.883e+01, threshold=6.197e+01, percent-clipped=0.0 2023-12-22 16:37:25,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=679773.3333333334, ans=0.2 2023-12-22 16:37:26,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=679773.3333333334, ans=0.5 2023-12-22 16:37:33,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=679773.3333333334, ans=0.0 2023-12-22 16:37:45,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=679906.6666666666, ans=0.0 2023-12-22 16:37:46,025 INFO [train.py:886] (3/4) Epoch 22, batch 1900, loss[loss=0.01508, audio_tagging_loss=0.01508, over 24750.00 frames. ], tot_loss[loss=0.01353, audio_tagging_loss=0.01353, over 4941731.91 frames. 
], batch size: 99, lr: 4.94e-03, grad_scale: 32.0 2023-12-22 16:37:48,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=679906.6666666666, ans=0.2 2023-12-22 16:37:51,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=679906.6666666666, ans=0.5 2023-12-22 16:37:55,541 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.62 vs. limit=22.5 2023-12-22 16:38:01,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=679973.3333333334, ans=0.1 2023-12-22 16:38:05,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=679973.3333333334, ans=0.125 2023-12-22 16:38:15,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.14 vs. limit=15.0 2023-12-22 16:38:19,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=680106.6666666666, ans=0.04949747468305833 2023-12-22 16:38:39,060 INFO [train.py:886] (3/4) Epoch 22, batch 1950, loss[loss=0.009727, audio_tagging_loss=0.009727, over 24750.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4939561.88 frames. ], batch size: 99, lr: 4.94e-03, grad_scale: 32.0 2023-12-22 16:38:48,506 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.589e+01 3.054e+01 3.166e+01 3.335e+01 3.897e+01, threshold=6.333e+01, percent-clipped=0.0 2023-12-22 16:38:52,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=680306.6666666666, ans=0.04949747468305833 2023-12-22 16:39:07,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.60 vs. limit=12.0 2023-12-22 16:39:12,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=680440.0, ans=0.125 2023-12-22 16:39:26,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=680506.6666666666, ans=0.125 2023-12-22 16:39:30,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.60 vs. limit=15.0 2023-12-22 16:39:30,778 INFO [train.py:886] (3/4) Epoch 22, batch 2000, loss[loss=0.01232, audio_tagging_loss=0.01232, over 24750.00 frames. ], tot_loss[loss=0.01345, audio_tagging_loss=0.01345, over 4943840.92 frames. ], batch size: 99, lr: 4.94e-03, grad_scale: 32.0 2023-12-22 16:39:32,371 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.27 vs. 
limit=15.0 2023-12-22 16:39:50,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=680706.6666666666, ans=0.125 2023-12-22 16:40:01,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=680773.3333333334, ans=0.0 2023-12-22 16:40:02,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=680773.3333333334, ans=0.1 2023-12-22 16:40:04,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=680773.3333333334, ans=0.125 2023-12-22 16:40:17,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=680840.0, ans=0.125 2023-12-22 16:40:21,713 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.91 vs. limit=15.0 2023-12-22 16:40:22,158 INFO [train.py:886] (3/4) Epoch 22, batch 2050, loss[loss=0.01428, audio_tagging_loss=0.01428, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4947729.96 frames. ], batch size: 100, lr: 4.94e-03, grad_scale: 64.0 2023-12-22 16:40:25,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=680906.6666666666, ans=0.1 2023-12-22 16:40:29,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=680906.6666666666, ans=0.125 2023-12-22 16:40:33,015 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.587e+01 2.841e+01 3.013e+01 3.146e+01 3.558e+01, threshold=6.025e+01, percent-clipped=0.0 2023-12-22 16:40:43,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=681040.0, ans=0.035 2023-12-22 16:40:44,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=681040.0, ans=0.125 2023-12-22 16:40:46,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=681040.0, ans=0.125 2023-12-22 16:41:04,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=681173.3333333334, ans=0.125 2023-12-22 16:41:07,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=681173.3333333334, ans=0.0 2023-12-22 16:41:13,762 INFO [train.py:886] (3/4) Epoch 22, batch 2100, loss[loss=0.01501, audio_tagging_loss=0.01501, over 24750.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4948664.65 frames. 
], batch size: 99, lr: 4.94e-03, grad_scale: 64.0 2023-12-22 16:41:17,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=681240.0, ans=0.0 2023-12-22 16:41:21,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=681240.0, ans=0.125 2023-12-22 16:41:37,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=681373.3333333334, ans=0.0 2023-12-22 16:41:41,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=681373.3333333334, ans=0.95 2023-12-22 16:41:50,131 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0 2023-12-22 16:42:02,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=681506.6666666666, ans=0.0 2023-12-22 16:42:05,479 INFO [train.py:886] (3/4) Epoch 22, batch 2150, loss[loss=0.01559, audio_tagging_loss=0.01559, over 24945.00 frames. ], tot_loss[loss=0.01327, audio_tagging_loss=0.01327, over 4954521.71 frames. ], batch size: 100, lr: 4.94e-03, grad_scale: 64.0 2023-12-22 16:42:15,660 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.651e+01 3.015e+01 3.093e+01 3.215e+01 3.763e+01, threshold=6.186e+01, percent-clipped=0.0 2023-12-22 16:42:37,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=681773.3333333334, ans=0.125 2023-12-22 16:42:41,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=681773.3333333334, ans=0.5 2023-12-22 16:42:54,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=681840.0, ans=0.07 2023-12-22 16:42:57,820 INFO [train.py:886] (3/4) Epoch 22, batch 2200, loss[loss=0.01679, audio_tagging_loss=0.01679, over 24750.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4950861.93 frames. 
], batch size: 99, lr: 4.93e-03, grad_scale: 64.0 2023-12-22 16:43:07,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=681973.3333333334, ans=0.125 2023-12-22 16:43:09,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=681973.3333333334, ans=0.1 2023-12-22 16:43:11,934 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 16:43:17,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=682040.0, ans=0.125 2023-12-22 16:43:22,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=682040.0, ans=0.125 2023-12-22 16:43:34,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=682106.6666666666, ans=0.2 2023-12-22 16:43:45,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=682173.3333333334, ans=0.125 2023-12-22 16:43:49,684 INFO [train.py:886] (3/4) Epoch 22, batch 2250, loss[loss=0.01391, audio_tagging_loss=0.01391, over 24750.00 frames. ], tot_loss[loss=0.01345, audio_tagging_loss=0.01345, over 4946715.46 frames. ], batch size: 99, lr: 4.93e-03, grad_scale: 64.0 2023-12-22 16:44:00,831 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.582e+01 2.973e+01 3.104e+01 3.289e+01 3.674e+01, threshold=6.209e+01, percent-clipped=0.0 2023-12-22 16:44:04,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=682306.6666666666, ans=0.2 2023-12-22 16:44:05,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=682306.6666666666, ans=0.015 2023-12-22 16:44:10,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=682373.3333333334, ans=0.1 2023-12-22 16:44:17,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=682373.3333333334, ans=0.2 2023-12-22 16:44:27,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=682440.0, ans=0.2 2023-12-22 16:44:42,454 INFO [train.py:886] (3/4) Epoch 22, batch 2300, loss[loss=0.01238, audio_tagging_loss=0.01238, over 22579.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4938410.78 frames. ], batch size: 107, lr: 4.93e-03, grad_scale: 64.0 2023-12-22 16:44:51,119 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.39 vs. limit=6.0 2023-12-22 16:45:01,937 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.26 vs. limit=22.5 2023-12-22 16:45:07,354 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.56 vs. 
limit=12.0 2023-12-22 16:45:11,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=682706.6666666666, ans=0.125 2023-12-22 16:45:11,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=682706.6666666666, ans=0.125 2023-12-22 16:45:15,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.86 vs. limit=10.0 2023-12-22 16:45:15,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.94 vs. limit=15.0 2023-12-22 16:45:22,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=682773.3333333334, ans=6.0 2023-12-22 16:45:22,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=682773.3333333334, ans=0.1 2023-12-22 16:45:34,702 INFO [train.py:886] (3/4) Epoch 22, batch 2350, loss[loss=0.01205, audio_tagging_loss=0.01205, over 25000.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4944542.37 frames. ], batch size: 100, lr: 4.93e-03, grad_scale: 64.0 2023-12-22 16:45:45,017 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.610e+01 2.951e+01 3.052e+01 3.214e+01 3.845e+01, threshold=6.104e+01, percent-clipped=0.0 2023-12-22 16:45:56,414 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.90 vs. limit=15.0 2023-12-22 16:46:08,673 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.77 vs. limit=22.5 2023-12-22 16:46:16,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=15.0 2023-12-22 16:46:24,798 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.12 vs. limit=6.0 2023-12-22 16:46:26,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=683240.0, ans=0.1 2023-12-22 16:46:27,129 INFO [train.py:886] (3/4) Epoch 22, batch 2400, loss[loss=0.01189, audio_tagging_loss=0.01189, over 24750.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4949212.57 frames. 
], batch size: 99, lr: 4.93e-03, grad_scale: 64.0 2023-12-22 16:46:28,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=683240.0, ans=0.0 2023-12-22 16:46:38,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=683306.6666666666, ans=0.125 2023-12-22 16:46:39,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=683306.6666666666, ans=0.125 2023-12-22 16:46:42,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=683306.6666666666, ans=0.125 2023-12-22 16:46:43,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=683306.6666666666, ans=0.0 2023-12-22 16:46:48,971 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.34 vs. limit=22.5 2023-12-22 16:46:57,239 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.44 vs. limit=15.0 2023-12-22 16:47:03,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=683440.0, ans=0.1 2023-12-22 16:47:18,260 INFO [train.py:886] (3/4) Epoch 22, batch 2450, loss[loss=0.01341, audio_tagging_loss=0.01341, over 25000.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 4957417.43 frames. ], batch size: 100, lr: 4.93e-03, grad_scale: 64.0 2023-12-22 16:47:28,663 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.64 vs. limit=22.5 2023-12-22 16:47:29,198 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.542e+01 2.984e+01 3.077e+01 3.217e+01 3.781e+01, threshold=6.155e+01, percent-clipped=0.0 2023-12-22 16:47:35,712 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 16:47:46,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=683706.6666666666, ans=0.125 2023-12-22 16:48:10,591 INFO [train.py:886] (3/4) Epoch 22, batch 2500, loss[loss=0.01459, audio_tagging_loss=0.01459, over 24750.00 frames. ], tot_loss[loss=0.01357, audio_tagging_loss=0.01357, over 4952805.44 frames. 
], batch size: 99, lr: 4.93e-03, grad_scale: 64.0 2023-12-22 16:48:12,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=683906.6666666666, ans=0.125 2023-12-22 16:48:25,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=683973.3333333334, ans=0.125 2023-12-22 16:48:33,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=684040.0, ans=0.125 2023-12-22 16:48:36,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=684040.0, ans=6.0 2023-12-22 16:48:44,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=684106.6666666666, ans=0.125 2023-12-22 16:48:49,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=684106.6666666666, ans=0.0 2023-12-22 16:49:03,071 INFO [train.py:886] (3/4) Epoch 22, batch 2550, loss[loss=0.01542, audio_tagging_loss=0.01542, over 25000.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4945874.99 frames. ], batch size: 100, lr: 4.93e-03, grad_scale: 64.0 2023-12-22 16:49:05,172 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.55 vs. limit=12.0 2023-12-22 16:49:13,124 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.756e+01 2.972e+01 3.101e+01 3.260e+01 3.822e+01, threshold=6.203e+01, percent-clipped=0.0 2023-12-22 16:49:31,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=684373.3333333334, ans=0.05 2023-12-22 16:49:36,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=684440.0, ans=0.125 2023-12-22 16:49:38,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.01 vs. limit=12.0 2023-12-22 16:49:41,822 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 16:49:54,511 INFO [train.py:886] (3/4) Epoch 22, batch 2600, loss[loss=0.009912, audio_tagging_loss=0.009912, over 24750.00 frames. ], tot_loss[loss=0.01356, audio_tagging_loss=0.01356, over 4944696.38 frames. ], batch size: 99, lr: 4.92e-03, grad_scale: 64.0 2023-12-22 16:50:00,523 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.14 vs. 
limit=22.5 2023-12-22 16:50:17,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=684706.6666666666, ans=0.0 2023-12-22 16:50:19,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=684706.6666666666, ans=0.0 2023-12-22 16:50:22,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=684706.6666666666, ans=0.0 2023-12-22 16:50:27,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=684773.3333333334, ans=0.0 2023-12-22 16:50:42,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=684840.0, ans=0.2 2023-12-22 16:50:46,543 INFO [train.py:886] (3/4) Epoch 22, batch 2650, loss[loss=0.01508, audio_tagging_loss=0.01508, over 25000.00 frames. ], tot_loss[loss=0.01345, audio_tagging_loss=0.01345, over 4943036.18 frames. ], batch size: 100, lr: 4.92e-03, grad_scale: 64.0 2023-12-22 16:50:52,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=684906.6666666666, ans=0.1 2023-12-22 16:50:55,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=684906.6666666666, ans=0.1 2023-12-22 16:50:56,719 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.462e+01 2.949e+01 3.109e+01 3.258e+01 4.396e+01, threshold=6.219e+01, percent-clipped=0.0 2023-12-22 16:51:18,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=685106.6666666666, ans=0.125 2023-12-22 16:51:19,484 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0 2023-12-22 16:51:22,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=685106.6666666666, ans=0.125 2023-12-22 16:51:26,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=685106.6666666666, ans=0.125 2023-12-22 16:51:27,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=685173.3333333334, ans=0.2 2023-12-22 16:51:27,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=685173.3333333334, ans=0.125 2023-12-22 16:51:34,240 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.76 vs. limit=22.5 2023-12-22 16:51:38,257 INFO [train.py:886] (3/4) Epoch 22, batch 2700, loss[loss=0.01492, audio_tagging_loss=0.01492, over 25000.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4949949.10 frames. 
], batch size: 100, lr: 4.92e-03, grad_scale: 64.0 2023-12-22 16:51:40,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=685240.0, ans=0.125 2023-12-22 16:51:45,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=685240.0, ans=0.125 2023-12-22 16:51:46,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=685240.0, ans=0.0 2023-12-22 16:52:17,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=685440.0, ans=0.0 2023-12-22 16:52:17,486 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.225e-02 2023-12-22 16:52:24,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=685506.6666666666, ans=0.0 2023-12-22 16:52:29,510 INFO [train.py:886] (3/4) Epoch 22, batch 2750, loss[loss=0.01328, audio_tagging_loss=0.01328, over 24750.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4949081.23 frames. ], batch size: 99, lr: 4.92e-03, grad_scale: 64.0 2023-12-22 16:52:37,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=685573.3333333334, ans=0.125 2023-12-22 16:52:39,645 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.604e+01 2.941e+01 3.076e+01 3.293e+01 3.896e+01, threshold=6.152e+01, percent-clipped=0.0 2023-12-22 16:52:46,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=685640.0, ans=0.015 2023-12-22 16:53:17,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=685840.0, ans=0.125 2023-12-22 16:53:22,636 INFO [train.py:886] (3/4) Epoch 22, batch 2800, loss[loss=0.01618, audio_tagging_loss=0.01618, over 24750.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4946845.03 frames. ], batch size: 99, lr: 4.92e-03, grad_scale: 64.0 2023-12-22 16:53:34,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=685973.3333333334, ans=0.0 2023-12-22 16:53:44,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=686040.0, ans=0.1 2023-12-22 16:53:52,795 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.47 vs. limit=15.0 2023-12-22 16:53:53,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=686106.6666666666, ans=0.5 2023-12-22 16:53:58,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=686106.6666666666, ans=0.0 2023-12-22 16:54:06,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=686173.3333333334, ans=0.125 2023-12-22 16:54:13,866 INFO [train.py:886] (3/4) Epoch 22, batch 2850, loss[loss=0.01189, audio_tagging_loss=0.01189, over 24750.00 frames. ], tot_loss[loss=0.01351, audio_tagging_loss=0.01351, over 4945153.89 frames. 
], batch size: 99, lr: 4.92e-03, grad_scale: 64.0 2023-12-22 16:54:22,332 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.84 vs. limit=15.0 2023-12-22 16:54:24,834 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.661e+01 2.963e+01 3.132e+01 3.265e+01 3.712e+01, threshold=6.264e+01, percent-clipped=0.0 2023-12-22 16:54:25,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=686306.6666666666, ans=0.125 2023-12-22 16:54:30,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=686306.6666666666, ans=0.125 2023-12-22 16:54:44,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=686440.0, ans=0.0 2023-12-22 16:55:06,041 INFO [train.py:886] (3/4) Epoch 22, batch 2900, loss[loss=0.01505, audio_tagging_loss=0.01505, over 25000.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4940294.42 frames. ], batch size: 100, lr: 4.92e-03, grad_scale: 64.0 2023-12-22 16:55:21,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=686640.0, ans=0.2 2023-12-22 16:55:23,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=686640.0, ans=0.125 2023-12-22 16:55:37,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=686773.3333333334, ans=0.125 2023-12-22 16:55:41,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=686773.3333333334, ans=0.125 2023-12-22 16:55:53,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=686840.0, ans=0.1 2023-12-22 16:55:56,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=686840.0, ans=0.125 2023-12-22 16:55:56,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=686840.0, ans=0.0 2023-12-22 16:55:58,679 INFO [train.py:886] (3/4) Epoch 22, batch 2950, loss[loss=0.01236, audio_tagging_loss=0.01236, over 25000.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4944637.54 frames. 
], batch size: 100, lr: 4.92e-03, grad_scale: 64.0 2023-12-22 16:55:58,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=686906.6666666666, ans=0.125 2023-12-22 16:56:08,727 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.545e+01 2.913e+01 3.029e+01 3.205e+01 3.789e+01, threshold=6.058e+01, percent-clipped=0.0 2023-12-22 16:56:12,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=686973.3333333334, ans=0.125 2023-12-22 16:56:15,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=686973.3333333334, ans=0.2 2023-12-22 16:56:35,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=687106.6666666666, ans=0.125 2023-12-22 16:56:47,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2023-12-22 16:56:50,266 INFO [train.py:886] (3/4) Epoch 22, batch 3000, loss[loss=0.01013, audio_tagging_loss=0.01013, over 25000.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4951699.02 frames. ], batch size: 100, lr: 4.91e-03, grad_scale: 64.0 2023-12-22 16:56:50,266 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 16:57:11,829 INFO [train.py:917] (3/4) Epoch 22, validation: loss=0.03274, audio_tagging_loss=0.03274, over 3737520.00 frames. 2023-12-22 16:57:11,830 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 16:57:12,608 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.56 vs. limit=5.0 2023-12-22 16:57:24,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=687306.6666666666, ans=0.2 2023-12-22 16:57:25,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=687306.6666666666, ans=0.2 2023-12-22 16:57:48,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=687440.0, ans=0.125 2023-12-22 16:57:50,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=687440.0, ans=0.125 2023-12-22 16:57:59,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=687506.6666666666, ans=0.125 2023-12-22 16:58:01,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=687506.6666666666, ans=0.0 2023-12-22 16:58:03,727 INFO [train.py:886] (3/4) Epoch 22, batch 3050, loss[loss=0.0113, audio_tagging_loss=0.0113, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4959722.78 frames. 
], batch size: 100, lr: 4.91e-03, grad_scale: 64.0 2023-12-22 16:58:10,334 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.510e-03 2023-12-22 16:58:13,864 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.728e+01 2.983e+01 3.097e+01 3.226e+01 3.702e+01, threshold=6.194e+01, percent-clipped=0.0 2023-12-22 16:58:28,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=15.0 2023-12-22 16:58:39,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=687773.3333333334, ans=0.5 2023-12-22 16:58:47,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=687840.0, ans=0.125 2023-12-22 16:58:49,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=687840.0, ans=0.125 2023-12-22 16:58:56,221 INFO [train.py:886] (3/4) Epoch 22, batch 3100, loss[loss=0.01309, audio_tagging_loss=0.01309, over 24750.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4957390.31 frames. ], batch size: 99, lr: 4.91e-03, grad_scale: 64.0 2023-12-22 16:59:09,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=687973.3333333334, ans=0.07 2023-12-22 16:59:28,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=688106.6666666666, ans=0.125 2023-12-22 16:59:32,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=688106.6666666666, ans=0.0 2023-12-22 16:59:41,161 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.71 vs. limit=15.0 2023-12-22 16:59:48,291 INFO [train.py:886] (3/4) Epoch 22, batch 3150, loss[loss=0.0167, audio_tagging_loss=0.0167, over 24948.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4951986.94 frames. 
], batch size: 100, lr: 4.91e-03, grad_scale: 64.0 2023-12-22 16:59:49,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=688240.0, ans=0.125 2023-12-22 16:59:55,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=688240.0, ans=0.0 2023-12-22 16:59:59,263 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.678e+01 2.993e+01 3.103e+01 3.261e+01 3.891e+01, threshold=6.205e+01, percent-clipped=0.0 2023-12-22 17:00:04,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=688306.6666666666, ans=0.125 2023-12-22 17:00:06,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=688306.6666666666, ans=0.0 2023-12-22 17:00:06,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=688306.6666666666, ans=0.125 2023-12-22 17:00:09,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=688373.3333333334, ans=0.125 2023-12-22 17:00:15,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=688373.3333333334, ans=0.125 2023-12-22 17:00:24,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=688440.0, ans=0.125 2023-12-22 17:00:40,700 INFO [train.py:886] (3/4) Epoch 22, batch 3200, loss[loss=0.01319, audio_tagging_loss=0.01319, over 25000.00 frames. ], tot_loss[loss=0.01347, audio_tagging_loss=0.01347, over 4956097.71 frames. ], batch size: 100, lr: 4.91e-03, grad_scale: 64.0 2023-12-22 17:01:14,636 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2023-12-22 17:01:22,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=688840.0, ans=0.1 2023-12-22 17:01:27,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=688840.0, ans=0.2 2023-12-22 17:01:31,996 INFO [train.py:886] (3/4) Epoch 22, batch 3250, loss[loss=0.01111, audio_tagging_loss=0.01111, over 25000.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4950078.19 frames. ], batch size: 100, lr: 4.91e-03, grad_scale: 64.0 2023-12-22 17:01:41,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.79 vs. 
limit=10.0 2023-12-22 17:01:42,249 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.602e+01 2.942e+01 3.078e+01 3.201e+01 3.535e+01, threshold=6.156e+01, percent-clipped=0.0 2023-12-22 17:01:52,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=689040.0, ans=0.125 2023-12-22 17:01:54,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=689040.0, ans=0.125 2023-12-22 17:01:55,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=689040.0, ans=0.125 2023-12-22 17:01:56,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=689040.0, ans=0.125 2023-12-22 17:02:11,239 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.82 vs. limit=15.0 2023-12-22 17:02:21,096 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2023-12-22 17:02:24,435 INFO [train.py:886] (3/4) Epoch 22, batch 3300, loss[loss=0.01475, audio_tagging_loss=0.01475, over 25000.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4948931.52 frames. ], batch size: 100, lr: 4.91e-03, grad_scale: 64.0 2023-12-22 17:02:25,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=689240.0, ans=0.02 2023-12-22 17:02:25,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=689240.0, ans=0.0 2023-12-22 17:02:29,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=689240.0, ans=0.0 2023-12-22 17:02:41,964 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.25 vs. limit=10.0 2023-12-22 17:02:45,731 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.20 vs. limit=15.0 2023-12-22 17:02:58,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=689440.0, ans=0.035 2023-12-22 17:03:01,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=689440.0, ans=0.0 2023-12-22 17:03:10,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=689506.6666666666, ans=0.2 2023-12-22 17:03:12,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=689506.6666666666, ans=0.2 2023-12-22 17:03:13,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=689506.6666666666, ans=0.2 2023-12-22 17:03:16,378 INFO [train.py:886] (3/4) Epoch 22, batch 3350, loss[loss=0.01266, audio_tagging_loss=0.01266, over 25000.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4956446.24 frames. 
], batch size: 100, lr: 4.91e-03, grad_scale: 64.0 2023-12-22 17:03:16,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=689573.3333333334, ans=0.0 2023-12-22 17:03:27,218 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.659e+01 2.983e+01 3.128e+01 3.276e+01 3.724e+01, threshold=6.256e+01, percent-clipped=0.0 2023-12-22 17:03:37,367 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.42 vs. limit=15.0 2023-12-22 17:03:40,186 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.93 vs. limit=22.5 2023-12-22 17:04:08,156 INFO [train.py:886] (3/4) Epoch 22, batch 3400, loss[loss=0.01495, audio_tagging_loss=0.01495, over 25000.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4955443.78 frames. ], batch size: 100, lr: 4.91e-03, grad_scale: 64.0 2023-12-22 17:04:13,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=689906.6666666666, ans=0.09899494936611666 2023-12-22 17:04:14,208 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.95 vs. limit=15.0 2023-12-22 17:04:20,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=689973.3333333334, ans=0.125 2023-12-22 17:05:00,553 INFO [train.py:886] (3/4) Epoch 22, batch 3450, loss[loss=0.01614, audio_tagging_loss=0.01614, over 24750.00 frames. ], tot_loss[loss=0.01345, audio_tagging_loss=0.01345, over 4948506.59 frames. ], batch size: 99, lr: 4.90e-03, grad_scale: 64.0 2023-12-22 17:05:08,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=690240.0, ans=0.0 2023-12-22 17:05:10,807 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.582e+01 3.072e+01 3.167e+01 3.266e+01 3.818e+01, threshold=6.334e+01, percent-clipped=0.0 2023-12-22 17:05:25,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=690373.3333333334, ans=0.2 2023-12-22 17:05:42,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=690506.6666666666, ans=0.0 2023-12-22 17:05:52,082 INFO [train.py:886] (3/4) Epoch 22, batch 3500, loss[loss=0.01479, audio_tagging_loss=0.01479, over 24750.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4949260.52 frames. ], batch size: 99, lr: 4.90e-03, grad_scale: 64.0 2023-12-22 17:05:58,518 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0 2023-12-22 17:06:06,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=690640.0, ans=0.0 2023-12-22 17:06:25,905 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.64 vs. 
limit=12.0 2023-12-22 17:06:40,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=690840.0, ans=0.1 2023-12-22 17:06:44,592 INFO [train.py:886] (3/4) Epoch 22, batch 3550, loss[loss=0.01218, audio_tagging_loss=0.01218, over 25000.00 frames. ], tot_loss[loss=0.01353, audio_tagging_loss=0.01353, over 4945950.32 frames. ], batch size: 100, lr: 4.90e-03, grad_scale: 64.0 2023-12-22 17:06:54,158 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.635e+01 2.964e+01 3.144e+01 3.307e+01 3.937e+01, threshold=6.289e+01, percent-clipped=0.0 2023-12-22 17:07:07,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=691040.0, ans=0.0 2023-12-22 17:07:09,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=15.0 2023-12-22 17:07:14,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=691040.0, ans=0.0 2023-12-22 17:07:35,782 INFO [train.py:886] (3/4) Epoch 22, batch 3600, loss[loss=0.01223, audio_tagging_loss=0.01223, over 25000.00 frames. ], tot_loss[loss=0.01337, audio_tagging_loss=0.01337, over 4949241.58 frames. ], batch size: 100, lr: 4.90e-03, grad_scale: 64.0 2023-12-22 17:07:54,075 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.326e-02 2023-12-22 17:07:55,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=691306.6666666666, ans=0.0 2023-12-22 17:07:57,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=691373.3333333334, ans=0.1 2023-12-22 17:08:04,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=691373.3333333334, ans=0.125 2023-12-22 17:08:04,573 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.96 vs. limit=15.0 2023-12-22 17:08:13,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.00 vs. limit=15.0 2023-12-22 17:08:28,091 INFO [train.py:886] (3/4) Epoch 22, batch 3650, loss[loss=0.01408, audio_tagging_loss=0.01408, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4949827.89 frames. 
], batch size: 100, lr: 4.90e-03, grad_scale: 64.0 2023-12-22 17:08:38,306 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.534e+01 2.894e+01 3.035e+01 3.158e+01 3.520e+01, threshold=6.071e+01, percent-clipped=0.0 2023-12-22 17:08:40,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=691640.0, ans=0.0 2023-12-22 17:08:45,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=691640.0, ans=0.0 2023-12-22 17:08:45,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=691640.0, ans=0.125 2023-12-22 17:08:46,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=691640.0, ans=0.125 2023-12-22 17:08:54,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=691706.6666666666, ans=0.0 2023-12-22 17:08:54,408 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=15.0 2023-12-22 17:08:58,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=691773.3333333334, ans=0.09899494936611666 2023-12-22 17:08:59,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=691773.3333333334, ans=0.0 2023-12-22 17:09:01,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=691773.3333333334, ans=0.0 2023-12-22 17:09:10,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.06 vs. limit=22.5 2023-12-22 17:09:13,171 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.80 vs. limit=22.5 2023-12-22 17:09:13,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=691840.0, ans=0.0 2023-12-22 17:09:14,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.38 vs. limit=22.5 2023-12-22 17:09:14,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=691840.0, ans=0.0 2023-12-22 17:09:19,857 INFO [train.py:886] (3/4) Epoch 22, batch 3700, loss[loss=0.01551, audio_tagging_loss=0.01551, over 25000.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4951228.78 frames. 
], batch size: 100, lr: 4.90e-03, grad_scale: 64.0 2023-12-22 17:09:22,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=691906.6666666666, ans=0.1 2023-12-22 17:09:48,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=692040.0, ans=0.125 2023-12-22 17:09:53,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=692106.6666666666, ans=0.125 2023-12-22 17:09:57,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.13 vs. limit=15.0 2023-12-22 17:10:02,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=692173.3333333334, ans=0.125 2023-12-22 17:10:02,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=692173.3333333334, ans=0.1 2023-12-22 17:10:03,348 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.23 vs. limit=15.0 2023-12-22 17:10:04,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=692173.3333333334, ans=0.0 2023-12-22 17:10:12,268 INFO [train.py:886] (3/4) Epoch 22, batch 3750, loss[loss=0.01623, audio_tagging_loss=0.01623, over 24750.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4945150.44 frames. ], batch size: 99, lr: 4.90e-03, grad_scale: 64.0 2023-12-22 17:10:13,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=692240.0, ans=0.2 2023-12-22 17:10:22,354 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.766e+01 3.031e+01 3.113e+01 3.271e+01 3.807e+01, threshold=6.227e+01, percent-clipped=0.0 2023-12-22 17:10:22,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=692306.6666666666, ans=0.125 2023-12-22 17:10:27,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=692306.6666666666, ans=0.2 2023-12-22 17:10:32,381 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:11:04,404 INFO [train.py:886] (3/4) Epoch 22, batch 3800, loss[loss=0.01315, audio_tagging_loss=0.01315, over 24750.00 frames. ], tot_loss[loss=0.01337, audio_tagging_loss=0.01337, over 4941573.75 frames. ], batch size: 99, lr: 4.90e-03, grad_scale: 64.0 2023-12-22 17:11:48,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=692840.0, ans=0.0 2023-12-22 17:11:51,727 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.93 vs. limit=22.5 2023-12-22 17:11:55,870 INFO [train.py:886] (3/4) Epoch 22, batch 3850, loss[loss=0.01498, audio_tagging_loss=0.01498, over 25000.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4945677.36 frames. 
], batch size: 100, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:12:06,652 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.634e+01 3.023e+01 3.138e+01 3.280e+01 3.905e+01, threshold=6.276e+01, percent-clipped=0.0 2023-12-22 17:12:09,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=692973.3333333334, ans=0.2 2023-12-22 17:12:21,143 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.24 vs. limit=10.0 2023-12-22 17:12:28,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=693106.6666666666, ans=0.125 2023-12-22 17:12:32,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=693106.6666666666, ans=0.125 2023-12-22 17:12:47,260 INFO [train.py:886] (3/4) Epoch 22, batch 3900, loss[loss=0.01194, audio_tagging_loss=0.01194, over 25000.00 frames. ], tot_loss[loss=0.01337, audio_tagging_loss=0.01337, over 4948008.00 frames. ], batch size: 100, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:12:53,265 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2023-12-22 17:12:57,738 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2023-12-22 17:13:27,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=693440.0, ans=0.2 2023-12-22 17:13:27,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=693440.0, ans=0.0 2023-12-22 17:13:30,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=693506.6666666666, ans=15.0 2023-12-22 17:13:39,931 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:13:41,586 INFO [train.py:886] (3/4) Epoch 22, batch 3950, loss[loss=0.01279, audio_tagging_loss=0.01279, over 25000.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4950541.71 frames. ], batch size: 100, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:13:43,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=693573.3333333334, ans=0.1 2023-12-22 17:13:50,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=693573.3333333334, ans=0.125 2023-12-22 17:13:51,729 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.685e+01 2.965e+01 3.079e+01 3.254e+01 4.090e+01, threshold=6.157e+01, percent-clipped=0.0 2023-12-22 17:14:13,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=693773.3333333334, ans=0.0 2023-12-22 17:14:22,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=693840.0, ans=0.125 2023-12-22 17:14:33,365 INFO [train.py:886] (3/4) Epoch 22, batch 4000, loss[loss=0.01387, audio_tagging_loss=0.01387, over 24750.00 frames. 
], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4954915.99 frames. ], batch size: 99, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:14:35,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=693906.6666666666, ans=0.025 2023-12-22 17:14:37,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=693906.6666666666, ans=0.1 2023-12-22 17:14:45,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=693973.3333333334, ans=0.0 2023-12-22 17:15:04,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=694106.6666666666, ans=0.1 2023-12-22 17:15:09,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=694106.6666666666, ans=0.125 2023-12-22 17:15:12,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=694106.6666666666, ans=0.0 2023-12-22 17:15:16,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=694173.3333333334, ans=0.0 2023-12-22 17:15:25,318 INFO [train.py:886] (3/4) Epoch 22, batch 4050, loss[loss=0.01351, audio_tagging_loss=0.01351, over 24750.00 frames. ], tot_loss[loss=0.01345, audio_tagging_loss=0.01345, over 4955679.68 frames. ], batch size: 99, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:15:37,178 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.720e+01 3.013e+01 3.150e+01 3.347e+01 3.751e+01, threshold=6.299e+01, percent-clipped=0.0 2023-12-22 17:15:40,393 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=15.0 2023-12-22 17:15:44,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=694306.6666666666, ans=0.0 2023-12-22 17:15:44,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=694306.6666666666, ans=0.0 2023-12-22 17:15:47,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=694373.3333333334, ans=0.2 2023-12-22 17:15:57,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=694440.0, ans=0.1 2023-12-22 17:16:00,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=694440.0, ans=0.125 2023-12-22 17:16:02,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=694440.0, ans=0.0 2023-12-22 17:16:17,442 INFO [train.py:886] (3/4) Epoch 22, batch 4100, loss[loss=0.01381, audio_tagging_loss=0.01381, over 22820.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4951352.02 frames. 
], batch size: 107, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:16:23,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=694573.3333333334, ans=0.125 2023-12-22 17:16:25,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=694573.3333333334, ans=0.1 2023-12-22 17:16:28,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=694640.0, ans=0.2 2023-12-22 17:16:39,048 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:16:40,222 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.25 vs. limit=10.0 2023-12-22 17:16:43,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=694706.6666666666, ans=0.0 2023-12-22 17:16:53,830 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.89 vs. limit=12.0 2023-12-22 17:17:00,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=694840.0, ans=0.125 2023-12-22 17:17:10,083 INFO [train.py:886] (3/4) Epoch 22, batch 4150, loss[loss=0.01558, audio_tagging_loss=0.01558, over 25000.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4950264.81 frames. ], batch size: 100, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:17:21,199 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.699e+01 2.978e+01 3.136e+01 3.267e+01 3.850e+01, threshold=6.272e+01, percent-clipped=0.0 2023-12-22 17:17:22,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=694973.3333333334, ans=0.07 2023-12-22 17:17:40,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=695106.6666666666, ans=0.1 2023-12-22 17:18:01,990 INFO [train.py:886] (3/4) Epoch 22, batch 4200, loss[loss=0.0132, audio_tagging_loss=0.0132, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4953748.31 frames. ], batch size: 100, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:18:02,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=695240.0, ans=0.07 2023-12-22 17:18:11,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=695306.6666666666, ans=0.0 2023-12-22 17:18:14,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=695306.6666666666, ans=0.0 2023-12-22 17:18:16,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.03 vs. limit=12.0 2023-12-22 17:18:25,256 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.09 vs. 
limit=15.0 2023-12-22 17:18:34,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=695440.0, ans=0.0 2023-12-22 17:18:39,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=695440.0, ans=0.04949747468305833 2023-12-22 17:18:42,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=695506.6666666666, ans=0.125 2023-12-22 17:18:53,982 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.88 vs. limit=8.0 2023-12-22 17:18:54,180 INFO [train.py:886] (3/4) Epoch 22, batch 4250, loss[loss=0.0152, audio_tagging_loss=0.0152, over 25000.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4951639.20 frames. ], batch size: 100, lr: 4.89e-03, grad_scale: 64.0 2023-12-22 17:19:00,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=695573.3333333334, ans=0.2 2023-12-22 17:19:03,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=695640.0, ans=0.1 2023-12-22 17:19:05,255 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.700e+01 2.977e+01 3.097e+01 3.231e+01 4.216e+01, threshold=6.193e+01, percent-clipped=0.0 2023-12-22 17:19:24,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=695773.3333333334, ans=0.0 2023-12-22 17:19:25,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=695773.3333333334, ans=0.1 2023-12-22 17:19:35,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=695840.0, ans=0.125 2023-12-22 17:19:37,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=695840.0, ans=0.0 2023-12-22 17:19:44,884 INFO [train.py:886] (3/4) Epoch 22, batch 4300, loss[loss=0.01408, audio_tagging_loss=0.01408, over 25000.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4955257.13 frames. ], batch size: 100, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:19:51,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=695906.6666666666, ans=15.0 2023-12-22 17:20:19,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=696106.6666666666, ans=0.1 2023-12-22 17:20:37,670 INFO [train.py:886] (3/4) Epoch 22, batch 4350, loss[loss=0.01298, audio_tagging_loss=0.01298, over 24750.00 frames. ], tot_loss[loss=0.01357, audio_tagging_loss=0.01357, over 4960324.48 frames. 
], batch size: 99, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:20:37,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=696240.0, ans=0.1 2023-12-22 17:20:42,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=696240.0, ans=0.1 2023-12-22 17:20:48,814 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.645e+01 3.006e+01 3.144e+01 3.307e+01 3.616e+01, threshold=6.289e+01, percent-clipped=0.0 2023-12-22 17:20:56,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=696306.6666666666, ans=0.0 2023-12-22 17:20:58,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=696373.3333333334, ans=0.0 2023-12-22 17:21:11,722 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2023-12-22 17:21:29,649 INFO [train.py:886] (3/4) Epoch 22, batch 4400, loss[loss=0.01507, audio_tagging_loss=0.01507, over 21490.00 frames. ], tot_loss[loss=0.01362, audio_tagging_loss=0.01362, over 4948497.90 frames. ], batch size: 107, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:21:42,112 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=15.0 2023-12-22 17:21:46,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=696640.0, ans=0.0 2023-12-22 17:22:00,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=696773.3333333334, ans=0.125 2023-12-22 17:22:15,955 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.07 vs. limit=15.0 2023-12-22 17:22:22,009 INFO [train.py:886] (3/4) Epoch 22, batch 4450, loss[loss=0.01302, audio_tagging_loss=0.01302, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4949235.77 frames. ], batch size: 100, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:22:22,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=696906.6666666666, ans=0.0 2023-12-22 17:22:23,571 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.36 vs. 
limit=12.0 2023-12-22 17:22:31,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=696973.3333333334, ans=0.1 2023-12-22 17:22:33,126 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.622e+01 3.004e+01 3.128e+01 3.261e+01 3.806e+01, threshold=6.255e+01, percent-clipped=0.0 2023-12-22 17:22:42,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=697040.0, ans=0.125 2023-12-22 17:22:49,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=697040.0, ans=0.125 2023-12-22 17:22:55,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=697106.6666666666, ans=0.125 2023-12-22 17:22:57,015 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.59 vs. limit=15.0 2023-12-22 17:22:58,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=697106.6666666666, ans=0.125 2023-12-22 17:23:03,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=697173.3333333334, ans=0.125 2023-12-22 17:23:13,865 INFO [train.py:886] (3/4) Epoch 22, batch 4500, loss[loss=0.01504, audio_tagging_loss=0.01504, over 22095.00 frames. ], tot_loss[loss=0.01347, audio_tagging_loss=0.01347, over 4946072.54 frames. ], batch size: 107, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:23:20,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=697240.0, ans=0.0 2023-12-22 17:23:21,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=697240.0, ans=0.125 2023-12-22 17:23:32,274 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0 2023-12-22 17:24:05,006 INFO [train.py:886] (3/4) Epoch 22, batch 4550, loss[loss=0.01316, audio_tagging_loss=0.01316, over 24024.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4950672.50 frames. ], batch size: 100, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:24:17,629 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.664e+01 2.927e+01 3.030e+01 3.234e+01 3.642e+01, threshold=6.060e+01, percent-clipped=0.0 2023-12-22 17:24:21,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=697640.0, ans=0.1 2023-12-22 17:24:21,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=697640.0, ans=0.1 2023-12-22 17:24:36,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=697773.3333333334, ans=0.0 2023-12-22 17:24:54,800 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2023-12-22 17:24:58,051 INFO [train.py:886] (3/4) Epoch 22, batch 4600, loss[loss=0.01381, audio_tagging_loss=0.01381, over 25000.00 frames. 
], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4954425.00 frames. ], batch size: 100, lr: 4.88e-03, grad_scale: 64.0 2023-12-22 17:24:59,194 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:25:02,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2023-12-22 17:25:02,226 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.86 vs. limit=15.0 2023-12-22 17:25:26,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=698040.0, ans=0.125 2023-12-22 17:25:41,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=698173.3333333334, ans=0.2 2023-12-22 17:25:49,193 INFO [train.py:886] (3/4) Epoch 22, batch 4650, loss[loss=0.01162, audio_tagging_loss=0.01162, over 25000.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4953932.99 frames. ], batch size: 100, lr: 4.88e-03, grad_scale: 32.0 2023-12-22 17:26:02,026 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.620e+01 2.976e+01 3.135e+01 3.289e+01 3.676e+01, threshold=6.270e+01, percent-clipped=0.0 2023-12-22 17:26:04,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=698306.6666666666, ans=0.125 2023-12-22 17:26:06,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=698306.6666666666, ans=0.0 2023-12-22 17:26:12,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=698373.3333333334, ans=0.0 2023-12-22 17:26:14,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=698373.3333333334, ans=0.0 2023-12-22 17:26:20,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=698440.0, ans=0.0 2023-12-22 17:26:21,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=698440.0, ans=0.125 2023-12-22 17:26:40,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=698573.3333333334, ans=0.125 2023-12-22 17:26:41,304 INFO [train.py:886] (3/4) Epoch 22, batch 4700, loss[loss=0.01578, audio_tagging_loss=0.01578, over 24750.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4949578.76 frames. 
], batch size: 99, lr: 4.88e-03, grad_scale: 32.0 2023-12-22 17:26:47,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=698573.3333333334, ans=0.0 2023-12-22 17:26:47,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=698573.3333333334, ans=0.09899494936611666 2023-12-22 17:26:47,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=698573.3333333334, ans=0.125 2023-12-22 17:26:50,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=698640.0, ans=0.1 2023-12-22 17:27:04,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=698706.6666666666, ans=0.0 2023-12-22 17:27:05,499 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:27:13,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=698773.3333333334, ans=0.1 2023-12-22 17:27:16,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=698773.3333333334, ans=0.09899494936611666 2023-12-22 17:27:28,045 INFO [train.py:886] (3/4) Epoch 22, batch 4750, loss[loss=0.01436, audio_tagging_loss=0.01436, over 24750.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4943456.67 frames. ], batch size: 99, lr: 4.87e-03, grad_scale: 32.0 2023-12-22 17:27:33,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=698906.6666666666, ans=0.05 2023-12-22 17:27:39,150 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.691e+01 3.016e+01 3.140e+01 3.268e+01 3.852e+01, threshold=6.281e+01, percent-clipped=0.0 2023-12-22 17:27:41,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=698973.3333333334, ans=0.125 2023-12-22 17:28:02,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.97 vs. limit=15.0 2023-12-22 17:28:02,425 INFO [train.py:886] (3/4) Epoch 23, batch 0, loss[loss=0.03027, audio_tagging_loss=0.03027, over 25000.00 frames. ], tot_loss[loss=0.03027, audio_tagging_loss=0.03027, over 25000.00 frames. ], batch size: 100, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:28:02,426 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 17:28:23,544 INFO [train.py:917] (3/4) Epoch 23, validation: loss=0.03207, audio_tagging_loss=0.03207, over 3737520.00 frames. 
2023-12-22 17:28:23,544 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 17:28:25,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=699013.3333333334, ans=0.1 2023-12-22 17:28:26,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=699013.3333333334, ans=0.125 2023-12-22 17:28:39,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=699080.0, ans=0.125 2023-12-22 17:28:41,576 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=15.0 2023-12-22 17:28:56,174 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0 2023-12-22 17:28:57,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=699213.3333333334, ans=0.125 2023-12-22 17:29:14,251 INFO [train.py:886] (3/4) Epoch 23, batch 50, loss[loss=0.01624, audio_tagging_loss=0.01624, over 25000.00 frames. ], tot_loss[loss=0.02076, audio_tagging_loss=0.02076, over 1120005.10 frames. ], batch size: 100, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:29:22,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=699346.6666666666, ans=0.125 2023-12-22 17:29:24,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=699413.3333333334, ans=0.125 2023-12-22 17:29:30,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=699413.3333333334, ans=0.125 2023-12-22 17:29:33,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=699413.3333333334, ans=0.0 2023-12-22 17:29:33,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.23 vs. limit=12.0 2023-12-22 17:29:55,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=699613.3333333334, ans=0.125 2023-12-22 17:30:02,794 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.034e+01 3.567e+01 3.829e+01 4.360e+01 9.695e+01, threshold=7.658e+01, percent-clipped=7.0 2023-12-22 17:30:07,309 INFO [train.py:886] (3/4) Epoch 23, batch 100, loss[loss=0.01084, audio_tagging_loss=0.01084, over 25000.00 frames. ], tot_loss[loss=0.01829, audio_tagging_loss=0.01829, over 1972714.72 frames. ], batch size: 100, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:30:08,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=699680.0, ans=0.05 2023-12-22 17:30:16,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=699746.6666666666, ans=0.1 2023-12-22 17:30:39,457 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.66 vs. 
limit=12.0 2023-12-22 17:30:49,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=699946.6666666666, ans=0.125 2023-12-22 17:30:53,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=699946.6666666666, ans=0.0 2023-12-22 17:30:57,477 INFO [train.py:886] (3/4) Epoch 23, batch 150, loss[loss=0.01304, audio_tagging_loss=0.01304, over 25000.00 frames. ], tot_loss[loss=0.01676, audio_tagging_loss=0.01676, over 2639359.78 frames. ], batch size: 100, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:31:09,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=700080.0, ans=0.1 2023-12-22 17:31:33,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=700213.3333333334, ans=0.015 2023-12-22 17:31:45,965 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.729e+01 3.066e+01 3.193e+01 3.321e+01 3.839e+01, threshold=6.387e+01, percent-clipped=0.0 2023-12-22 17:31:48,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=700280.0, ans=0.125 2023-12-22 17:31:50,616 INFO [train.py:886] (3/4) Epoch 23, batch 200, loss[loss=0.01072, audio_tagging_loss=0.01072, over 25000.00 frames. ], tot_loss[loss=0.01565, audio_tagging_loss=0.01565, over 3155237.65 frames. ], batch size: 100, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:31:52,724 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:31:53,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=700346.6666666666, ans=0.125 2023-12-22 17:31:55,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=700346.6666666666, ans=0.0 2023-12-22 17:32:04,577 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.92 vs. limit=10.0 2023-12-22 17:32:08,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=700413.3333333334, ans=0.0 2023-12-22 17:32:15,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=700480.0, ans=0.0 2023-12-22 17:32:20,924 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.58 vs. limit=22.5 2023-12-22 17:32:21,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=700546.6666666666, ans=0.125 2023-12-22 17:32:24,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=700546.6666666666, ans=0.0 2023-12-22 17:32:33,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=700613.3333333334, ans=0.125 2023-12-22 17:32:41,857 INFO [train.py:886] (3/4) Epoch 23, batch 250, loss[loss=0.01442, audio_tagging_loss=0.01442, over 24750.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 3558883.76 frames. 
], batch size: 99, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:32:46,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=700680.0, ans=0.125 2023-12-22 17:32:47,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0 2023-12-22 17:32:49,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=700680.0, ans=0.125 2023-12-22 17:32:57,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=700746.6666666666, ans=0.2 2023-12-22 17:33:11,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=700813.3333333334, ans=0.125 2023-12-22 17:33:23,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=700946.6666666666, ans=0.2 2023-12-22 17:33:25,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=700946.6666666666, ans=0.1 2023-12-22 17:33:30,256 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.814e+01 3.034e+01 3.166e+01 3.369e+01 3.955e+01, threshold=6.333e+01, percent-clipped=0.0 2023-12-22 17:33:32,745 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2023-12-22 17:33:34,009 INFO [train.py:886] (3/4) Epoch 23, batch 300, loss[loss=0.01609, audio_tagging_loss=0.01609, over 24750.00 frames. ], tot_loss[loss=0.01473, audio_tagging_loss=0.01473, over 3860564.11 frames. ], batch size: 99, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:33:54,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=701146.6666666666, ans=0.0 2023-12-22 17:34:16,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=701280.0, ans=0.125 2023-12-22 17:34:23,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=701280.0, ans=0.0 2023-12-22 17:34:25,911 INFO [train.py:886] (3/4) Epoch 23, batch 350, loss[loss=0.01363, audio_tagging_loss=0.01363, over 24750.00 frames. ], tot_loss[loss=0.01456, audio_tagging_loss=0.01456, over 4103102.29 frames. 
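The recurring optim.py warnings print five order statistics (min, the three quartiles, max) of recently observed gradient norms, and in every instance above the threshold is exactly Clipping_scale times the median (e.g. 6.333e+01 = 2.0 x 3.166e+01). A sketch of bookkeeping that reproduces those lines; the history length and reporting cadence are assumptions, and icefall's optimizer applies the clip in its own way:

    import statistics
    from collections import deque

    class ClipStats:
        def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
            self.scale = clipping_scale          # the "Clipping_scale=2.0" above
            self.norms = deque(maxlen=history)   # history length is an assumption
            self.n_clipped = 0
            self.n_seen = 0

        def update(self, grad_norm: float) -> float:
            # Record one step's gradient norm; return the factor to multiply
            # gradients by (1.0 when no clipping is needed).
            self.norms.append(grad_norm)
            threshold = self.scale * statistics.median(self.norms)
            self.n_seen += 1
            if grad_norm > threshold:
                self.n_clipped += 1
                return threshold / grad_norm
            return 1.0

        def report(self) -> str:
            q1, med, q3 = statistics.quantiles(self.norms, n=4)
            pct = 100.0 * self.n_clipped / max(self.n_seen, 1)
            return (f"grad-norm quartiles {min(self.norms):.3e} {q1:.3e} "
                    f"{med:.3e} {q3:.3e} {max(self.norms):.3e}, "
                    f"threshold={self.scale * med:.3e}, percent-clipped={pct:.1f}")

    stats = ClipStats()
    for n in (28.1, 30.3, 31.7, 33.7, 39.6):
        stats.update(n)
    print(stats.report())
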
], batch size: 99, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:34:27,055 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:34:27,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=701346.6666666666, ans=0.125 2023-12-22 17:34:36,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=701413.3333333334, ans=0.1 2023-12-22 17:34:41,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=701413.3333333334, ans=0.125 2023-12-22 17:34:47,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=701480.0, ans=0.035 2023-12-22 17:34:49,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=701480.0, ans=0.125 2023-12-22 17:34:57,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=701546.6666666666, ans=0.125 2023-12-22 17:35:00,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=701546.6666666666, ans=0.0 2023-12-22 17:35:13,064 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.681e+01 2.991e+01 3.090e+01 3.268e+01 3.987e+01, threshold=6.180e+01, percent-clipped=0.0 2023-12-22 17:35:16,899 INFO [train.py:886] (3/4) Epoch 23, batch 400, loss[loss=0.01331, audio_tagging_loss=0.01331, over 25000.00 frames. ], tot_loss[loss=0.01416, audio_tagging_loss=0.01416, over 4288432.27 frames. ], batch size: 100, lr: 4.76e-03, grad_scale: 32.0 2023-12-22 17:35:33,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=701746.6666666666, ans=0.0 2023-12-22 17:35:35,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=701746.6666666666, ans=0.0 2023-12-22 17:35:49,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.45 vs. limit=22.5 2023-12-22 17:35:57,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=701880.0, ans=0.0 2023-12-22 17:36:09,364 INFO [train.py:886] (3/4) Epoch 23, batch 450, loss[loss=0.01179, audio_tagging_loss=0.01179, over 23897.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 4437573.85 frames. ], batch size: 100, lr: 4.75e-03, grad_scale: 32.0 2023-12-22 17:36:15,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.07 vs. 
limit=15.0 2023-12-22 17:36:24,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=702080.0, ans=0.125 2023-12-22 17:36:26,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=702080.0, ans=0.1 2023-12-22 17:36:52,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=702280.0, ans=0.1 2023-12-22 17:36:57,228 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.676e+01 2.894e+01 3.036e+01 3.209e+01 3.784e+01, threshold=6.073e+01, percent-clipped=0.0 2023-12-22 17:37:02,429 INFO [train.py:886] (3/4) Epoch 23, batch 500, loss[loss=0.01286, audio_tagging_loss=0.01286, over 25000.00 frames. ], tot_loss[loss=0.01363, audio_tagging_loss=0.01363, over 4557159.79 frames. ], batch size: 100, lr: 4.75e-03, grad_scale: 32.0 2023-12-22 17:37:15,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=702413.3333333334, ans=0.09899494936611666 2023-12-22 17:37:33,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=702546.6666666666, ans=0.1 2023-12-22 17:37:41,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=702546.6666666666, ans=0.0 2023-12-22 17:37:46,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=702613.3333333334, ans=0.1 2023-12-22 17:37:53,968 INFO [train.py:886] (3/4) Epoch 23, batch 550, loss[loss=0.01542, audio_tagging_loss=0.01542, over 25000.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4645586.73 frames. 
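Most of the scaling.py lines above record a ScheduledFloat: a module hyper-parameter (a dropout probability, a skip rate, a balancer probability) whose current value, printed as ans=..., is a function of the global batch_count. A minimal piecewise-linear reading of that behaviour; the breakpoints below are invented for illustration, and icefall's own class carries more machinery (defaults, arithmetic on schedules):

    class ScheduledFloat:
        # Piecewise-linear schedule over batch_count, clamped outside the
        # breakpoints. A sketch of how the "ans=..." values above evolve;
        # the real scaling.py class is richer.
        def __init__(self, *points):
            self.points = sorted(points)   # (batch_count, value) breakpoints

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    frac = (batch_count - x0) / (x1 - x0)
                    return y0 + frac * (y1 - y0)

    # Invented breakpoints: a dropout annealing from 0.3 to a floor of 0.1.
    # By batch_count ~7e5 it sits at the floor, matching the ans=0.1 entries.
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p.value(700080.0))   # -> 0.1
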
], batch size: 100, lr: 4.75e-03, grad_scale: 32.0 2023-12-22 17:38:12,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=702746.6666666666, ans=0.125 2023-12-22 17:38:16,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=702813.3333333334, ans=0.125 2023-12-22 17:38:16,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=702813.3333333334, ans=0.1 2023-12-22 17:38:19,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=702813.3333333334, ans=0.0 2023-12-22 17:38:20,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=702813.3333333334, ans=0.07 2023-12-22 17:38:28,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=702880.0, ans=0.125 2023-12-22 17:38:39,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=702946.6666666666, ans=0.125 2023-12-22 17:38:42,268 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.710e+01 2.974e+01 3.097e+01 3.242e+01 4.856e+01, threshold=6.195e+01, percent-clipped=0.0 2023-12-22 17:38:45,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=703013.3333333334, ans=0.0 2023-12-22 17:38:46,292 INFO [train.py:886] (3/4) Epoch 23, batch 600, loss[loss=0.01338, audio_tagging_loss=0.01338, over 24750.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4708363.99 frames. ], batch size: 99, lr: 4.75e-03, grad_scale: 32.0 2023-12-22 17:38:47,774 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.55 vs. limit=22.5 2023-12-22 17:39:26,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=703280.0, ans=0.09899494936611666 2023-12-22 17:39:38,010 INFO [train.py:886] (3/4) Epoch 23, batch 650, loss[loss=0.01305, audio_tagging_loss=0.01305, over 24750.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4758592.47 frames. ], batch size: 99, lr: 4.75e-03, grad_scale: 32.0 2023-12-22 17:39:54,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=703413.3333333334, ans=0.0 2023-12-22 17:40:23,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=703613.3333333334, ans=0.1 2023-12-22 17:40:25,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=703613.3333333334, ans=0.0 2023-12-22 17:40:25,970 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.701e+01 3.073e+01 3.203e+01 3.360e+01 3.704e+01, threshold=6.407e+01, percent-clipped=0.0 2023-12-22 17:40:29,828 INFO [train.py:886] (3/4) Epoch 23, batch 700, loss[loss=0.01502, audio_tagging_loss=0.01502, over 21844.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4797559.25 frames. 
], batch size: 107, lr: 4.75e-03, grad_scale: 32.0 2023-12-22 17:40:31,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=703680.0, ans=15.0 2023-12-22 17:40:51,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=703813.3333333334, ans=0.1 2023-12-22 17:40:55,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=703813.3333333334, ans=0.125 2023-12-22 17:41:01,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=703880.0, ans=0.125 2023-12-22 17:41:23,292 INFO [train.py:886] (3/4) Epoch 23, batch 750, loss[loss=0.01527, audio_tagging_loss=0.01527, over 25000.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4836321.57 frames. ], batch size: 100, lr: 4.75e-03, grad_scale: 32.0 2023-12-22 17:41:38,386 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.90 vs. limit=10.0 2023-12-22 17:42:01,158 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0 2023-12-22 17:42:09,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=704280.0, ans=0.125 2023-12-22 17:42:10,093 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.669e+01 2.982e+01 3.107e+01 3.204e+01 3.694e+01, threshold=6.214e+01, percent-clipped=0.0 2023-12-22 17:42:13,948 INFO [train.py:886] (3/4) Epoch 23, batch 800, loss[loss=0.01144, audio_tagging_loss=0.01144, over 24750.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4862501.15 frames. ], batch size: 99, lr: 4.75e-03, grad_scale: 32.0 2023-12-22 17:42:23,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=704346.6666666666, ans=0.0 2023-12-22 17:42:59,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=704613.3333333334, ans=0.1 2023-12-22 17:43:05,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=704613.3333333334, ans=0.0 2023-12-22 17:43:06,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=704680.0, ans=0.1 2023-12-22 17:43:06,733 INFO [train.py:886] (3/4) Epoch 23, batch 850, loss[loss=0.01412, audio_tagging_loss=0.01412, over 25000.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4876045.51 frames. ], batch size: 100, lr: 4.75e-03, grad_scale: 32.0 2023-12-22 17:43:10,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=704680.0, ans=0.07 2023-12-22 17:43:14,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=704680.0, ans=0.0 2023-12-22 17:43:35,099 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.06 vs. 
limit=10.0 2023-12-22 17:43:41,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=704880.0, ans=0.125 2023-12-22 17:43:52,806 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.669e+01 2.999e+01 3.165e+01 3.314e+01 4.054e+01, threshold=6.329e+01, percent-clipped=0.0 2023-12-22 17:43:58,120 INFO [train.py:886] (3/4) Epoch 23, batch 900, loss[loss=0.01306, audio_tagging_loss=0.01306, over 24750.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4891939.04 frames. ], batch size: 99, lr: 4.74e-03, grad_scale: 32.0 2023-12-22 17:44:05,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=705013.3333333334, ans=0.125 2023-12-22 17:44:07,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=705013.3333333334, ans=0.125 2023-12-22 17:44:17,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=705146.6666666666, ans=0.1 2023-12-22 17:44:22,762 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.04 vs. limit=10.0 2023-12-22 17:44:36,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=705213.3333333334, ans=0.0 2023-12-22 17:44:36,788 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.65 vs. limit=22.5 2023-12-22 17:44:37,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=705213.3333333334, ans=0.125 2023-12-22 17:44:38,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=705213.3333333334, ans=0.2 2023-12-22 17:44:44,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=12.0 2023-12-22 17:44:49,882 INFO [train.py:886] (3/4) Epoch 23, batch 950, loss[loss=0.01247, audio_tagging_loss=0.01247, over 24034.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4897637.08 frames. ], batch size: 100, lr: 4.74e-03, grad_scale: 32.0 2023-12-22 17:45:33,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=705613.3333333334, ans=0.125 2023-12-22 17:45:38,080 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.739e+01 3.002e+01 3.153e+01 3.252e+01 3.769e+01, threshold=6.307e+01, percent-clipped=0.0 2023-12-22 17:45:41,956 INFO [train.py:886] (3/4) Epoch 23, batch 1000, loss[loss=0.01352, audio_tagging_loss=0.01352, over 24750.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4903889.10 frames. 
], batch size: 99, lr: 4.74e-03, grad_scale: 32.0 2023-12-22 17:45:47,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=705680.0, ans=0.125 2023-12-22 17:45:50,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=705680.0, ans=0.125 2023-12-22 17:46:00,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.82 vs. limit=15.0 2023-12-22 17:46:01,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=705813.3333333334, ans=0.125 2023-12-22 17:46:14,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=705880.0, ans=0.2 2023-12-22 17:46:17,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=705880.0, ans=0.2 2023-12-22 17:46:20,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=705880.0, ans=0.125 2023-12-22 17:46:22,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=705946.6666666666, ans=0.05 2023-12-22 17:46:30,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=705946.6666666666, ans=0.125 2023-12-22 17:46:31,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.89 vs. limit=15.0 2023-12-22 17:46:32,416 INFO [train.py:886] (3/4) Epoch 23, batch 1050, loss[loss=0.0142, audio_tagging_loss=0.0142, over 24750.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4916808.45 frames. ], batch size: 99, lr: 4.74e-03, grad_scale: 32.0 2023-12-22 17:46:43,295 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.14 vs. limit=15.0 2023-12-22 17:46:58,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=706146.6666666666, ans=0.125 2023-12-22 17:47:22,002 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.686e+01 2.936e+01 3.111e+01 3.242e+01 3.902e+01, threshold=6.222e+01, percent-clipped=0.0 2023-12-22 17:47:25,855 INFO [train.py:886] (3/4) Epoch 23, batch 1100, loss[loss=0.01277, audio_tagging_loss=0.01277, over 25000.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4927769.24 frames. 
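The Whitening lines compare a measured statistic of some activation ("metric") against a scheduled limit; the whitening modules only intervene when the metric exceeds the limit, which is why most entries read like 14.32 vs. limit=15.0. One plausible reading of the metric (an assumption on my part, not lifted from scaling.py) is the eigenvalue-dispersion ratio mean(lambda^2) / mean(lambda)^2 of the per-group feature covariance, which equals 1.0 for perfectly white features and grows with anisotropy:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # Anisotropy of the feature covariance: mean(eig^2) / mean(eig)^2,
        # averaged over channel groups. 1.0 means white/isotropic features.
        # Treat the exact normalisation as an assumption about the log above.
        num_frames, num_channels = x.shape
        g = num_channels // num_groups
        x = x.reshape(num_frames, num_groups, g).transpose(0, 1)  # (groups, T, g)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / num_frames                  # (groups, g, g)
        eig_mean = cov.diagonal(dim1=1, dim2=2).mean(dim=1)       # trace / g
        eig_sq_mean = (cov ** 2).sum(dim=(1, 2)) / g              # trace(cov^2) / g
        return (eig_sq_mean / eig_mean ** 2).mean().item()

    x = torch.randn(1000, 384)        # near-white features
    print(whitening_metric(x))        # close to 1.0, far below limits like 15.0
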
], batch size: 100, lr: 4.74e-03, grad_scale: 32.0 2023-12-22 17:47:30,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=706346.6666666666, ans=0.125 2023-12-22 17:47:36,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=706413.3333333334, ans=0.1 2023-12-22 17:47:57,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=706546.6666666666, ans=0.125 2023-12-22 17:47:59,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.31 vs. limit=15.0 2023-12-22 17:48:05,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=706613.3333333334, ans=0.125 2023-12-22 17:48:07,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=706613.3333333334, ans=0.125 2023-12-22 17:48:18,184 INFO [train.py:886] (3/4) Epoch 23, batch 1150, loss[loss=0.01015, audio_tagging_loss=0.01015, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4938983.79 frames. ], batch size: 100, lr: 4.74e-03, grad_scale: 32.0 2023-12-22 17:48:38,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=706813.3333333334, ans=0.1 2023-12-22 17:48:39,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=706813.3333333334, ans=0.125 2023-12-22 17:48:41,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=706813.3333333334, ans=0.125 2023-12-22 17:48:41,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=706813.3333333334, ans=0.1 2023-12-22 17:48:45,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=706813.3333333334, ans=0.125 2023-12-22 17:48:49,576 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0 2023-12-22 17:49:05,124 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.619e+01 2.995e+01 3.117e+01 3.266e+01 4.017e+01, threshold=6.234e+01, percent-clipped=0.0 2023-12-22 17:49:06,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=706946.6666666666, ans=0.1 2023-12-22 17:49:08,950 INFO [train.py:886] (3/4) Epoch 23, batch 1200, loss[loss=0.01285, audio_tagging_loss=0.01285, over 25000.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4944662.57 frames. ], batch size: 100, lr: 4.74e-03, grad_scale: 32.0 2023-12-22 17:49:18,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=707080.0, ans=0.0 2023-12-22 17:49:23,082 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.66 vs. 
limit=22.5 2023-12-22 17:49:25,092 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.54 vs. limit=15.0 2023-12-22 17:49:57,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=707280.0, ans=0.1 2023-12-22 17:50:01,239 INFO [train.py:886] (3/4) Epoch 23, batch 1250, loss[loss=0.01626, audio_tagging_loss=0.01626, over 24750.00 frames. ], tot_loss[loss=0.0136, audio_tagging_loss=0.0136, over 4937969.17 frames. ], batch size: 99, lr: 4.74e-03, grad_scale: 32.0 2023-12-22 17:50:05,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=707346.6666666666, ans=0.04949747468305833 2023-12-22 17:50:22,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=707480.0, ans=0.05 2023-12-22 17:50:34,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=707546.6666666666, ans=0.125 2023-12-22 17:50:38,993 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2023-12-22 17:50:39,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=707546.6666666666, ans=0.0 2023-12-22 17:50:46,990 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.599e+01 3.099e+01 3.181e+01 3.380e+01 4.641e+01, threshold=6.362e+01, percent-clipped=0.0 2023-12-22 17:50:51,541 INFO [train.py:886] (3/4) Epoch 23, batch 1300, loss[loss=0.01325, audio_tagging_loss=0.01325, over 24750.00 frames. ], tot_loss[loss=0.01377, audio_tagging_loss=0.01377, over 4940323.58 frames. ], batch size: 99, lr: 4.74e-03, grad_scale: 32.0 2023-12-22 17:51:11,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.69 vs. limit=15.0 2023-12-22 17:51:36,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=707946.6666666666, ans=0.125 2023-12-22 17:51:36,242 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:51:40,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=707946.6666666666, ans=0.125 2023-12-22 17:51:43,549 INFO [train.py:886] (3/4) Epoch 23, batch 1350, loss[loss=0.0116, audio_tagging_loss=0.0116, over 25000.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 4939126.04 frames. ], batch size: 100, lr: 4.73e-03, grad_scale: 32.0 2023-12-22 17:51:43,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=708013.3333333334, ans=0.125 2023-12-22 17:51:45,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=708013.3333333334, ans=0.2 2023-12-22 17:51:59,568 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.49 vs. 
limit=22.5 2023-12-22 17:52:12,590 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0 2023-12-22 17:52:31,179 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+01 2.956e+01 3.060e+01 3.186e+01 3.861e+01, threshold=6.119e+01, percent-clipped=0.0 2023-12-22 17:52:34,909 INFO [train.py:886] (3/4) Epoch 23, batch 1400, loss[loss=0.01063, audio_tagging_loss=0.01063, over 25000.00 frames. ], tot_loss[loss=0.01357, audio_tagging_loss=0.01357, over 4934552.32 frames. ], batch size: 100, lr: 4.73e-03, grad_scale: 32.0 2023-12-22 17:52:36,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=708346.6666666666, ans=0.125 2023-12-22 17:52:41,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=708346.6666666666, ans=0.125 2023-12-22 17:52:47,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=708413.3333333334, ans=0.125 2023-12-22 17:52:54,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=708480.0, ans=0.125 2023-12-22 17:53:18,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=708613.3333333334, ans=0.1 2023-12-22 17:53:25,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=708680.0, ans=0.125 2023-12-22 17:53:25,965 INFO [train.py:886] (3/4) Epoch 23, batch 1450, loss[loss=0.01646, audio_tagging_loss=0.01646, over 25000.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4944716.06 frames. ], batch size: 100, lr: 4.73e-03, grad_scale: 32.0 2023-12-22 17:53:30,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=708680.0, ans=0.0 2023-12-22 17:53:35,638 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.07 vs. limit=22.5 2023-12-22 17:53:52,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=708813.3333333334, ans=0.125 2023-12-22 17:54:02,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=708880.0, ans=0.125 2023-12-22 17:54:13,002 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.727e+01 3.011e+01 3.145e+01 3.302e+01 3.909e+01, threshold=6.290e+01, percent-clipped=0.0 2023-12-22 17:54:16,893 INFO [train.py:886] (3/4) Epoch 23, batch 1500, loss[loss=0.01534, audio_tagging_loss=0.01534, over 25000.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4945699.28 frames. ], batch size: 100, lr: 4.73e-03, grad_scale: 32.0 2023-12-22 17:54:29,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=709080.0, ans=0.1 2023-12-22 17:54:43,334 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.57 vs. 
limit=15.0 2023-12-22 17:54:44,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=709146.6666666666, ans=0.0 2023-12-22 17:54:47,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=709213.3333333334, ans=0.125 2023-12-22 17:54:54,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=709213.3333333334, ans=0.0 2023-12-22 17:54:54,351 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 17:54:55,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=709213.3333333334, ans=0.125 2023-12-22 17:54:55,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=709213.3333333334, ans=0.04949747468305833 2023-12-22 17:55:08,678 INFO [train.py:886] (3/4) Epoch 23, batch 1550, loss[loss=0.01875, audio_tagging_loss=0.01875, over 24948.00 frames. ], tot_loss[loss=0.01352, audio_tagging_loss=0.01352, over 4943910.21 frames. ], batch size: 100, lr: 4.73e-03, grad_scale: 32.0 2023-12-22 17:55:18,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=709413.3333333334, ans=0.125 2023-12-22 17:55:18,693 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.32 vs. limit=15.0 2023-12-22 17:55:47,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=709546.6666666666, ans=0.125 2023-12-22 17:55:51,853 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.32 vs. limit=15.0 2023-12-22 17:55:55,132 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.654e+01 3.064e+01 3.162e+01 3.305e+01 3.702e+01, threshold=6.324e+01, percent-clipped=0.0 2023-12-22 17:55:59,694 INFO [train.py:886] (3/4) Epoch 23, batch 1600, loss[loss=0.01476, audio_tagging_loss=0.01476, over 24750.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4940532.47 frames. ], batch size: 99, lr: 4.73e-03, grad_scale: 32.0 2023-12-22 17:56:03,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=709680.0, ans=0.1 2023-12-22 17:56:03,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=709680.0, ans=0.0 2023-12-22 17:56:09,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=709746.6666666666, ans=0.125 2023-12-22 17:56:13,040 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.30 vs. 
limit=15.0 2023-12-22 17:56:18,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=709746.6666666666, ans=0.125 2023-12-22 17:56:18,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=709746.6666666666, ans=0.125 2023-12-22 17:56:29,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=709880.0, ans=0.1 2023-12-22 17:56:36,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=709880.0, ans=0.125 2023-12-22 17:56:51,513 INFO [train.py:886] (3/4) Epoch 23, batch 1650, loss[loss=0.01191, audio_tagging_loss=0.01191, over 24750.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4937848.68 frames. ], batch size: 99, lr: 4.73e-03, grad_scale: 32.0 2023-12-22 17:56:57,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=710013.3333333334, ans=0.125 2023-12-22 17:57:38,704 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.643e+01 2.998e+01 3.094e+01 3.262e+01 4.064e+01, threshold=6.189e+01, percent-clipped=0.0 2023-12-22 17:57:43,148 INFO [train.py:886] (3/4) Epoch 23, batch 1700, loss[loss=0.01238, audio_tagging_loss=0.01238, over 24750.00 frames. ], tot_loss[loss=0.01327, audio_tagging_loss=0.01327, over 4943311.34 frames. ], batch size: 99, lr: 4.73e-03, grad_scale: 32.0 2023-12-22 17:57:47,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=710346.6666666666, ans=0.0 2023-12-22 17:58:00,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=710413.3333333334, ans=0.125 2023-12-22 17:58:00,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=710413.3333333334, ans=0.0 2023-12-22 17:58:06,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=710480.0, ans=0.025 2023-12-22 17:58:15,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=710546.6666666666, ans=0.125 2023-12-22 17:58:22,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=710546.6666666666, ans=0.125 2023-12-22 17:58:35,031 INFO [train.py:886] (3/4) Epoch 23, batch 1750, loss[loss=0.01377, audio_tagging_loss=0.01377, over 25000.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4950174.54 frames. ], batch size: 100, lr: 4.73e-03, grad_scale: 32.0 2023-12-22 17:58:39,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=710680.0, ans=0.0 2023-12-22 17:58:51,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=710746.6666666666, ans=0.125 2023-12-22 17:58:53,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=710746.6666666666, ans=0.0 2023-12-22 17:58:59,237 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.19 vs. 
limit=15.0 2023-12-22 17:59:06,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=710880.0, ans=0.04949747468305833 2023-12-22 17:59:13,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=710880.0, ans=0.2 2023-12-22 17:59:23,060 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.669e+01 3.006e+01 3.113e+01 3.269e+01 3.526e+01, threshold=6.226e+01, percent-clipped=0.0 2023-12-22 17:59:24,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=710946.6666666666, ans=0.125 2023-12-22 17:59:28,206 INFO [train.py:886] (3/4) Epoch 23, batch 1800, loss[loss=0.01238, audio_tagging_loss=0.01238, over 25000.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4955009.95 frames. ], batch size: 100, lr: 4.72e-03, grad_scale: 32.0 2023-12-22 17:59:33,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=711013.3333333334, ans=0.125 2023-12-22 17:59:33,396 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.84 vs. limit=15.0 2023-12-22 17:59:34,367 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.34 vs. limit=22.5 2023-12-22 18:00:00,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=711213.3333333334, ans=0.0 2023-12-22 18:00:07,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=711213.3333333334, ans=0.025 2023-12-22 18:00:18,267 INFO [train.py:886] (3/4) Epoch 23, batch 1850, loss[loss=0.01306, audio_tagging_loss=0.01306, over 21999.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4952848.34 frames. ], batch size: 107, lr: 4.72e-03, grad_scale: 32.0 2023-12-22 18:00:34,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=711413.3333333334, ans=0.09899494936611666 2023-12-22 18:00:42,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=711480.0, ans=0.125 2023-12-22 18:00:44,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=711480.0, ans=0.1 2023-12-22 18:01:05,839 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.716e+01 3.028e+01 3.200e+01 3.336e+01 4.130e+01, threshold=6.400e+01, percent-clipped=0.0 2023-12-22 18:01:09,750 INFO [train.py:886] (3/4) Epoch 23, batch 1900, loss[loss=0.01475, audio_tagging_loss=0.01475, over 25000.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4944871.07 frames. 
], batch size: 100, lr: 4.72e-03, grad_scale: 64.0 2023-12-22 18:01:24,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=711746.6666666666, ans=0.125 2023-12-22 18:01:48,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=711880.0, ans=0.0 2023-12-22 18:01:55,948 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.69 vs. limit=10.0 2023-12-22 18:02:00,779 INFO [train.py:886] (3/4) Epoch 23, batch 1950, loss[loss=0.01221, audio_tagging_loss=0.01221, over 24750.00 frames. ], tot_loss[loss=0.01331, audio_tagging_loss=0.01331, over 4941983.06 frames. ], batch size: 99, lr: 4.72e-03, grad_scale: 64.0 2023-12-22 18:02:02,923 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 18:02:06,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=712013.3333333334, ans=0.125 2023-12-22 18:02:14,601 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. limit=15.0 2023-12-22 18:02:18,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0 2023-12-22 18:02:22,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=712146.6666666666, ans=0.0 2023-12-22 18:02:27,600 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.25 vs. limit=22.5 2023-12-22 18:02:43,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=712280.0, ans=0.1 2023-12-22 18:02:45,744 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.588e+01 2.985e+01 3.120e+01 3.302e+01 3.747e+01, threshold=6.240e+01, percent-clipped=0.0 2023-12-22 18:02:49,580 INFO [train.py:886] (3/4) Epoch 23, batch 2000, loss[loss=0.01296, audio_tagging_loss=0.01296, over 25000.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 4944398.52 frames. ], batch size: 100, lr: 4.72e-03, grad_scale: 64.0 2023-12-22 18:02:49,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=712346.6666666666, ans=0.0 2023-12-22 18:02:57,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=712346.6666666666, ans=0.05 2023-12-22 18:02:59,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=712413.3333333334, ans=0.125 2023-12-22 18:03:12,951 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=22.5 2023-12-22 18:03:31,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.39 vs. 
limit=15.0 2023-12-22 18:03:41,265 INFO [train.py:886] (3/4) Epoch 23, batch 2050, loss[loss=0.01156, audio_tagging_loss=0.01156, over 25000.00 frames. ], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4950694.53 frames. ], batch size: 100, lr: 4.72e-03, grad_scale: 64.0 2023-12-22 18:03:50,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=712746.6666666666, ans=0.125 2023-12-22 18:03:54,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0 2023-12-22 18:04:02,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=712813.3333333334, ans=0.2 2023-12-22 18:04:12,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=712880.0, ans=22.5 2023-12-22 18:04:22,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=712946.6666666666, ans=0.125 2023-12-22 18:04:27,760 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.661e+01 2.995e+01 3.150e+01 3.287e+01 3.794e+01, threshold=6.300e+01, percent-clipped=0.0 2023-12-22 18:04:31,583 INFO [train.py:886] (3/4) Epoch 23, batch 2100, loss[loss=0.01167, audio_tagging_loss=0.01167, over 25000.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 4949326.80 frames. ], batch size: 100, lr: 4.72e-03, grad_scale: 64.0 2023-12-22 18:04:31,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=713013.3333333334, ans=0.125 2023-12-22 18:05:03,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=713213.3333333334, ans=0.0 2023-12-22 18:05:13,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=713280.0, ans=0.2 2023-12-22 18:05:24,614 INFO [train.py:886] (3/4) Epoch 23, batch 2150, loss[loss=0.01163, audio_tagging_loss=0.01163, over 24069.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4954494.53 frames. ], batch size: 100, lr: 4.72e-03, grad_scale: 64.0 2023-12-22 18:05:36,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=713413.3333333334, ans=0.0 2023-12-22 18:05:41,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=713413.3333333334, ans=0.125 2023-12-22 18:05:49,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.49 vs. limit=12.0 2023-12-22 18:05:52,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=713480.0, ans=15.0 2023-12-22 18:05:53,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.21 vs. 
limit=22.5 2023-12-22 18:05:55,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=713546.6666666666, ans=0.0 2023-12-22 18:06:02,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=713546.6666666666, ans=0.1 2023-12-22 18:06:11,595 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.649e+01 3.004e+01 3.147e+01 3.263e+01 3.799e+01, threshold=6.294e+01, percent-clipped=0.0 2023-12-22 18:06:16,120 INFO [train.py:886] (3/4) Epoch 23, batch 2200, loss[loss=0.01452, audio_tagging_loss=0.01452, over 24750.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4954546.66 frames. ], batch size: 99, lr: 4.72e-03, grad_scale: 64.0 2023-12-22 18:06:33,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=713746.6666666666, ans=0.0 2023-12-22 18:06:47,596 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.66 vs. limit=22.5 2023-12-22 18:07:06,813 INFO [train.py:886] (3/4) Epoch 23, batch 2250, loss[loss=0.01265, audio_tagging_loss=0.01265, over 24750.00 frames. ], tot_loss[loss=0.01342, audio_tagging_loss=0.01342, over 4950283.50 frames. ], batch size: 99, lr: 4.71e-03, grad_scale: 64.0 2023-12-22 18:07:09,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=714013.3333333334, ans=0.2 2023-12-22 18:07:10,250 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.18 vs. limit=22.5 2023-12-22 18:07:17,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=714080.0, ans=0.2 2023-12-22 18:07:36,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=714146.6666666666, ans=0.0 2023-12-22 18:07:38,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=714213.3333333334, ans=0.2 2023-12-22 18:07:38,660 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.92 vs. limit=15.0 2023-12-22 18:07:55,157 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.733e+01 2.950e+01 3.106e+01 3.281e+01 3.764e+01, threshold=6.212e+01, percent-clipped=0.0 2023-12-22 18:07:58,974 INFO [train.py:886] (3/4) Epoch 23, batch 2300, loss[loss=0.01241, audio_tagging_loss=0.01241, over 25000.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4950725.31 frames. 
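grad_scale in the progress lines doubled from 32.0 to 64.0 around batch 1900 of this epoch: that is the dynamic loss scale of mixed-precision training growing after a long run of overflow-free steps. The same mechanism with PyTorch's stock scaler (growth_interval=2000 is an assumed setting, not read from this log, and the recipe may manage its scale differently):

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,       # matches the grad_scale reported earlier in the epoch
        growth_factor=2.0,     # 32.0 -> 64.0 when growth triggers
        backoff_factor=0.5,    # halved instead if a step produces inf/nan grads
        growth_interval=2000,  # assumed number of clean steps before growing
    )
    # A typical training step under this scaler:
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()        # grows or backs off the scale
    print(scaler.get_scale())  # 32.0 before any update (with CUDA available)
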
], batch size: 100, lr: 4.71e-03, grad_scale: 64.0 2023-12-22 18:08:12,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=714413.3333333334, ans=0.0 2023-12-22 18:08:14,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=714413.3333333334, ans=0.125 2023-12-22 18:08:15,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=714413.3333333334, ans=0.0 2023-12-22 18:08:39,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=714613.3333333334, ans=0.0 2023-12-22 18:08:47,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=714613.3333333334, ans=0.0 2023-12-22 18:08:51,195 INFO [train.py:886] (3/4) Epoch 23, batch 2350, loss[loss=0.01469, audio_tagging_loss=0.01469, over 25000.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4948811.89 frames. ], batch size: 100, lr: 4.71e-03, grad_scale: 64.0 2023-12-22 18:08:53,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=714680.0, ans=0.0 2023-12-22 18:08:58,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=714680.0, ans=0.125 2023-12-22 18:09:16,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.34 vs. limit=10.0 2023-12-22 18:09:18,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=714813.3333333334, ans=0.125 2023-12-22 18:09:36,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.68 vs. limit=12.0 2023-12-22 18:09:38,597 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+01 2.969e+01 3.079e+01 3.242e+01 3.705e+01, threshold=6.159e+01, percent-clipped=0.0 2023-12-22 18:09:43,075 INFO [train.py:886] (3/4) Epoch 23, batch 2400, loss[loss=0.01322, audio_tagging_loss=0.01322, over 25000.00 frames. ], tot_loss[loss=0.01327, audio_tagging_loss=0.01327, over 4957563.48 frames. ], batch size: 100, lr: 4.71e-03, grad_scale: 64.0 2023-12-22 18:09:53,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=715080.0, ans=0.0 2023-12-22 18:10:05,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=715146.6666666666, ans=0.125 2023-12-22 18:10:17,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=715213.3333333334, ans=0.05 2023-12-22 18:10:21,546 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.78 vs. 
limit=22.5 2023-12-22 18:10:21,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=715213.3333333334, ans=0.125 2023-12-22 18:10:33,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=715280.0, ans=0.125 2023-12-22 18:10:35,279 INFO [train.py:886] (3/4) Epoch 23, batch 2450, loss[loss=0.01267, audio_tagging_loss=0.01267, over 25000.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4957812.88 frames. ], batch size: 100, lr: 4.71e-03, grad_scale: 64.0 2023-12-22 18:10:47,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=715413.3333333334, ans=0.125 2023-12-22 18:10:50,546 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 18:11:11,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=715546.6666666666, ans=0.125 2023-12-22 18:11:12,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=715546.6666666666, ans=0.0 2023-12-22 18:11:22,434 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.810e+01 2.984e+01 3.127e+01 3.304e+01 3.935e+01, threshold=6.253e+01, percent-clipped=0.0 2023-12-22 18:11:24,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=715613.3333333334, ans=0.2 2023-12-22 18:11:26,261 INFO [train.py:886] (3/4) Epoch 23, batch 2500, loss[loss=0.01511, audio_tagging_loss=0.01511, over 25000.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4956129.50 frames. ], batch size: 100, lr: 4.71e-03, grad_scale: 64.0 2023-12-22 18:11:37,349 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.580e-02 2023-12-22 18:11:44,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=715746.6666666666, ans=0.0 2023-12-22 18:11:44,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=715746.6666666666, ans=0.125 2023-12-22 18:11:46,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=715813.3333333334, ans=0.125 2023-12-22 18:11:47,297 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=15.0 2023-12-22 18:11:48,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=715813.3333333334, ans=0.125 2023-12-22 18:11:59,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=715880.0, ans=0.125 2023-12-22 18:12:12,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=715946.6666666666, ans=0.2 2023-12-22 18:12:17,168 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.95 vs. 
limit=15.0 2023-12-22 18:12:18,519 INFO [train.py:886] (3/4) Epoch 23, batch 2550, loss[loss=0.01274, audio_tagging_loss=0.01274, over 23990.00 frames. ], tot_loss[loss=0.01347, audio_tagging_loss=0.01347, over 4951979.56 frames. ], batch size: 100, lr: 4.71e-03, grad_scale: 64.0 2023-12-22 18:12:21,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=716013.3333333334, ans=0.0 2023-12-22 18:12:50,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=716213.3333333334, ans=0.0 2023-12-22 18:13:05,418 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.696e+01 2.995e+01 3.127e+01 3.270e+01 4.281e+01, threshold=6.254e+01, percent-clipped=0.0 2023-12-22 18:13:05,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=716280.0, ans=0.125 2023-12-22 18:13:10,609 INFO [train.py:886] (3/4) Epoch 23, batch 2600, loss[loss=0.01199, audio_tagging_loss=0.01199, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4947968.31 frames. ], batch size: 100, lr: 4.71e-03, grad_scale: 64.0 2023-12-22 18:13:10,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=716346.6666666666, ans=0.125 2023-12-22 18:13:23,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=716413.3333333334, ans=0.125 2023-12-22 18:13:24,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=716413.3333333334, ans=0.2 2023-12-22 18:13:37,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=716480.0, ans=0.125 2023-12-22 18:13:44,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=716546.6666666666, ans=0.0 2023-12-22 18:13:55,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=716613.3333333334, ans=0.125 2023-12-22 18:14:00,211 INFO [train.py:886] (3/4) Epoch 23, batch 2650, loss[loss=0.01566, audio_tagging_loss=0.01566, over 25000.00 frames. ], tot_loss[loss=0.01331, audio_tagging_loss=0.01331, over 4955554.10 frames. ], batch size: 100, lr: 4.71e-03, grad_scale: 64.0 2023-12-22 18:14:32,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=716880.0, ans=0.125 2023-12-22 18:14:35,358 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.55 vs. limit=15.0 2023-12-22 18:14:40,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=716946.6666666666, ans=0.0 2023-12-22 18:14:48,295 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.721e+01 3.007e+01 3.105e+01 3.259e+01 4.059e+01, threshold=6.210e+01, percent-clipped=0.0 2023-12-22 18:14:52,118 INFO [train.py:886] (3/4) Epoch 23, batch 2700, loss[loss=0.01382, audio_tagging_loss=0.01382, over 25000.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4958663.58 frames. 
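Across this section the printed learning rate drifts only from 4.76e-03 to 4.70e-03 over roughly 2700 batches, i.e. the scheduler is deep into its decay. A sketch in the shape of icefall's Eden rule, decaying in both a step counter and the epoch index; the constants (base_lr=0.045, lr_batches=7500, lr_epochs=3.5) and the exact step counter are assumptions here:

    def eden_lr(base_lr: float, step: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Inverse-power decay in both step and epoch; the -0.25 exponents are
        # part of the assumed Eden form, not read from this log.
        batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # With plausible values the result lands in the 4.7e-03 range seen above,
    # and thousands of extra steps move it only in the third significant digit.
    print(eden_lr(0.045, step=100_000, epoch=23.0))   # ~4.8e-03
    print(eden_lr(0.045, step=110_000, epoch=23.0))   # slightly lower
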
2023-12-22 18:15:19,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=717146.6666666666, ans=0.125
2023-12-22 18:15:26,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=717213.3333333334, ans=0.015
2023-12-22 18:15:29,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=717213.3333333334, ans=0.125
2023-12-22 18:15:32,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=717280.0, ans=0.2
2023-12-22 18:15:37,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=717280.0, ans=0.0
2023-12-22 18:15:42,643 INFO [train.py:886] (3/4) Epoch 23, batch 2750, loss[loss=0.01373, audio_tagging_loss=0.01373, over 25000.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4958979.13 frames. ], batch size: 100, lr: 4.70e-03, grad_scale: 64.0
2023-12-22 18:16:08,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=717480.0, ans=0.2
2023-12-22 18:16:09,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=717480.0, ans=0.125
2023-12-22 18:16:30,040 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.651e+01 3.005e+01 3.171e+01 3.321e+01 3.773e+01, threshold=6.342e+01, percent-clipped=0.0
2023-12-22 18:16:33,801 INFO [train.py:886] (3/4) Epoch 23, batch 2800, loss[loss=0.0145, audio_tagging_loss=0.0145, over 24750.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4959045.07 frames. ], batch size: 99, lr: 4.70e-03, grad_scale: 64.0
2023-12-22 18:16:34,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=717680.0, ans=0.05
2023-12-22 18:16:40,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=717680.0, ans=0.0
2023-12-22 18:17:19,702 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=15.0
2023-12-22 18:17:20,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=717946.6666666666, ans=0.125
2023-12-22 18:17:21,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=717946.6666666666, ans=0.125
2023-12-22 18:17:25,551 INFO [train.py:886] (3/4) Epoch 23, batch 2850, loss[loss=0.01432, audio_tagging_loss=0.01432, over 24750.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4951622.92 frames. ], batch size: 99, lr: 4.70e-03, grad_scale: 64.0
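In the train.py:886 entries, loss[...] is the current batch and tot_loss[...] a frame-weighted running average; the fractional frame counts (e.g. 4951622.92) suggest an exponentially-decayed accumulator rather than a plain sum. A sketch under that assumption (the decay constant is illustrative):

    class RunningLoss:
        """Frame-weighted running average with exponential forgetting."""

        def __init__(self, decay: float = 0.999):
            self.decay = decay
            self.weighted_loss = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: int) -> float:
            self.weighted_loss = self.decay * self.weighted_loss + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames
            return self.weighted_loss / self.frames  # the logged tot_loss

The epoch-24 entries further down behave consistently with this: tot_loss restarts from the batch-0 value and the frame count climbs again from 25000.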
2023-12-22 18:17:27,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=718013.3333333334, ans=0.04949747468305833
2023-12-22 18:17:32,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=718013.3333333334, ans=0.1
2023-12-22 18:17:57,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=718213.3333333334, ans=0.125
2023-12-22 18:18:00,871 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0
2023-12-22 18:18:10,878 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.718e+01 3.010e+01 3.133e+01 3.291e+01 3.864e+01, threshold=6.266e+01, percent-clipped=0.0
2023-12-22 18:18:14,650 INFO [train.py:886] (3/4) Epoch 23, batch 2900, loss[loss=0.01392, audio_tagging_loss=0.01392, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4949846.12 frames. ], batch size: 100, lr: 4.70e-03, grad_scale: 64.0
2023-12-22 18:18:20,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=718346.6666666666, ans=0.025
2023-12-22 18:18:27,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=718413.3333333334, ans=0.025
2023-12-22 18:18:32,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=718413.3333333334, ans=0.0
2023-12-22 18:18:34,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=718480.0, ans=0.0
2023-12-22 18:18:36,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=718480.0, ans=0.125
2023-12-22 18:18:41,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=718480.0, ans=0.0
2023-12-22 18:18:46,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.38 vs. limit=15.0
2023-12-22 18:18:56,575 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.97 vs. limit=15.0
2023-12-22 18:19:01,439 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=15.0
2023-12-22 18:19:01,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=718613.3333333334, ans=0.125
2023-12-22 18:19:03,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=718613.3333333334, ans=0.2
2023-12-22 18:19:06,757 INFO [train.py:886] (3/4) Epoch 23, batch 2950, loss[loss=0.01411, audio_tagging_loss=0.01411, over 25000.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 4949499.81 frames. ], batch size: 100, lr: 4.70e-03, grad_scale: 64.0
2023-12-22 18:19:24,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=718746.6666666666, ans=0.125
2023-12-22 18:19:32,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=718813.3333333334, ans=15.0
2023-12-22 18:19:40,301 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0
2023-12-22 18:19:45,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=718880.0, ans=0.0
2023-12-22 18:19:49,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=718946.6666666666, ans=0.125
2023-12-22 18:19:52,810 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.684e+01 2.910e+01 3.043e+01 3.179e+01 4.091e+01, threshold=6.086e+01, percent-clipped=0.0
2023-12-22 18:19:54,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=718946.6666666666, ans=0.0
2023-12-22 18:19:58,623 INFO [train.py:886] (3/4) Epoch 23, batch 3000, loss[loss=0.01231, audio_tagging_loss=0.01231, over 25000.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4951124.92 frames. ], batch size: 100, lr: 4.70e-03, grad_scale: 64.0
2023-12-22 18:19:58,623 INFO [train.py:909] (3/4) Computing validation loss
2023-12-22 18:20:19,160 INFO [train.py:917] (3/4) Epoch 23, validation: loss=0.03349, audio_tagging_loss=0.03349, over 3737520.00 frames.
2023-12-22 18:20:19,161 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-22 18:20:21,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=719013.3333333334, ans=0.0
2023-12-22 18:20:38,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=719080.0, ans=0.1
2023-12-22 18:20:39,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=719080.0, ans=0.125
2023-12-22 18:20:43,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=719146.6666666666, ans=0.2
2023-12-22 18:20:58,433 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0
2023-12-22 18:20:58,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=719213.3333333334, ans=0.035
2023-12-22 18:21:01,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=719280.0, ans=0.0
2023-12-22 18:21:11,889 INFO [train.py:886] (3/4) Epoch 23, batch 3050, loss[loss=0.01176, audio_tagging_loss=0.01176, over 25000.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4952851.44 frames. ], batch size: 100, lr: 4.70e-03, grad_scale: 64.0
2023-12-22 18:21:38,933 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.15 vs. limit=22.5
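The Whitening entries compare an anisotropy statistic of a module's output covariance against a limit: values near 1 mean the covariance is close to isotropic ("white"), and the hooks in scaling.py log the comparison and push activations back toward whiteness when the metric drifts above the limit. One plausible formulation of such a metric (illustrative; not necessarily the exact expression used):

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations for one whitening group.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)  # real, ascending eigenvalues
        # 1.0 iff all eigenvalues are equal; grows as variance concentrates
        # in a few directions.
        return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()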
2023-12-22 18:21:58,932 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.742e+01 3.000e+01 3.103e+01 3.222e+01 3.770e+01, threshold=6.206e+01, percent-clipped=0.0
2023-12-22 18:22:04,179 INFO [train.py:886] (3/4) Epoch 23, batch 3100, loss[loss=0.01284, audio_tagging_loss=0.01284, over 25000.00 frames. ], tot_loss[loss=0.01319, audio_tagging_loss=0.01319, over 4951737.84 frames. ], batch size: 100, lr: 4.70e-03, grad_scale: 64.0
2023-12-22 18:22:12,228 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.92 vs. limit=15.0
2023-12-22 18:22:14,670 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.50 vs. limit=15.0
2023-12-22 18:22:26,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.14 vs. limit=6.0
2023-12-22 18:22:31,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=719813.3333333334, ans=0.125
2023-12-22 18:22:36,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=719880.0, ans=0.125
2023-12-22 18:22:43,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=719880.0, ans=0.125
2023-12-22 18:22:56,622 INFO [train.py:886] (3/4) Epoch 23, batch 3150, loss[loss=0.01449, audio_tagging_loss=0.01449, over 24750.00 frames. ], tot_loss[loss=0.01331, audio_tagging_loss=0.01331, over 4949990.44 frames. ], batch size: 99, lr: 4.69e-03, grad_scale: 64.0
2023-12-22 18:22:57,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=720013.3333333334, ans=0.125
2023-12-22 18:22:59,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=720013.3333333334, ans=0.0
2023-12-22 18:23:09,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=720080.0, ans=0.1
2023-12-22 18:23:18,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=720146.6666666666, ans=0.125
2023-12-22 18:23:23,540 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=12.0
2023-12-22 18:23:44,193 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.698e+01 3.010e+01 3.145e+01 3.312e+01 3.855e+01, threshold=6.290e+01, percent-clipped=0.0
2023-12-22 18:23:44,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=720280.0, ans=0.125
2023-12-22 18:23:48,793 INFO [train.py:886] (3/4) Epoch 23, batch 3200, loss[loss=0.01404, audio_tagging_loss=0.01404, over 25000.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4939538.32 frames. ], batch size: 100, lr: 4.69e-03, grad_scale: 64.0
2023-12-22 18:24:04,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=720413.3333333334, ans=0.125
2023-12-22 18:24:08,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=720480.0, ans=0.125
2023-12-22 18:24:09,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=720480.0, ans=0.0
2023-12-22 18:24:11,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=720480.0, ans=0.125
2023-12-22 18:24:12,219 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.08 vs. limit=22.5
2023-12-22 18:24:29,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=720613.3333333334, ans=0.0
2023-12-22 18:24:39,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=720680.0, ans=0.2
2023-12-22 18:24:40,436 INFO [train.py:886] (3/4) Epoch 23, batch 3250, loss[loss=0.01229, audio_tagging_loss=0.01229, over 24750.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4939604.21 frames. ], batch size: 99, lr: 4.69e-03, grad_scale: 64.0
2023-12-22 18:24:46,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=720680.0, ans=0.125
2023-12-22 18:24:57,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=720746.6666666666, ans=0.125
2023-12-22 18:25:02,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=720813.3333333334, ans=0.0
2023-12-22 18:25:20,238 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.61 vs. limit=12.0
2023-12-22 18:25:28,112 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.689e+01 2.895e+01 3.079e+01 3.188e+01 4.978e+01, threshold=6.159e+01, percent-clipped=0.0
2023-12-22 18:25:32,137 INFO [train.py:886] (3/4) Epoch 23, batch 3300, loss[loss=0.01082, audio_tagging_loss=0.01082, over 24750.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4943675.69 frames. ], batch size: 99, lr: 4.69e-03, grad_scale: 64.0
2023-12-22 18:25:33,363 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 18:25:42,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=721080.0, ans=0.07
2023-12-22 18:25:49,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=721080.0, ans=0.025
2023-12-22 18:25:49,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=721080.0, ans=0.0
2023-12-22 18:26:03,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=721213.3333333334, ans=0.2
2023-12-22 18:26:25,030 INFO [train.py:886] (3/4) Epoch 23, batch 3350, loss[loss=0.01394, audio_tagging_loss=0.01394, over 25000.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4947443.98 frames. ], batch size: 100, lr: 4.69e-03, grad_scale: 64.0
2023-12-22 18:26:46,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=721480.0, ans=0.05
2023-12-22 18:26:46,060 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-12-22 18:26:55,180 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0
2023-12-22 18:27:00,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=721546.6666666666, ans=0.125
2023-12-22 18:27:02,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=721546.6666666666, ans=0.0
2023-12-22 18:27:12,422 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.691e+01 3.001e+01 3.141e+01 3.291e+01 3.725e+01, threshold=6.283e+01, percent-clipped=0.0
2023-12-22 18:27:16,206 INFO [train.py:886] (3/4) Epoch 23, batch 3400, loss[loss=0.01247, audio_tagging_loss=0.01247, over 24750.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4949747.93 frames. ], batch size: 99, lr: 4.69e-03, grad_scale: 64.0
2023-12-22 18:27:16,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=721680.0, ans=0.125
2023-12-22 18:27:26,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=721746.6666666666, ans=0.0
2023-12-22 18:27:27,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=721746.6666666666, ans=0.125
2023-12-22 18:27:31,508 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.67 vs. limit=15.0
2023-12-22 18:27:42,569 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.27 vs. limit=15.0
2023-12-22 18:27:57,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=721946.6666666666, ans=0.1
2023-12-22 18:28:04,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=721946.6666666666, ans=0.125
2023-12-22 18:28:06,898 INFO [train.py:886] (3/4) Epoch 23, batch 3450, loss[loss=0.01398, audio_tagging_loss=0.01398, over 24750.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4947857.50 frames. ], batch size: 99, lr: 4.69e-03, grad_scale: 64.0
2023-12-22 18:28:27,497 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0
2023-12-22 18:28:45,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=722213.3333333334, ans=0.125
2023-12-22 18:28:46,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=722280.0, ans=0.125
2023-12-22 18:28:47,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=722280.0, ans=0.125
2023-12-22 18:28:53,917 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+01 3.061e+01 3.208e+01 3.330e+01 3.695e+01, threshold=6.416e+01, percent-clipped=0.0
2023-12-22 18:28:58,436 INFO [train.py:886] (3/4) Epoch 23, batch 3500, loss[loss=0.01258, audio_tagging_loss=0.01258, over 24750.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 4940114.54 frames. ], batch size: 99, lr: 4.69e-03, grad_scale: 64.0
2023-12-22 18:28:58,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=722346.6666666666, ans=0.05
2023-12-22 18:29:00,814 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.02 vs. limit=15.0
2023-12-22 18:29:05,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=722346.6666666666, ans=0.125
2023-12-22 18:29:07,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=722413.3333333334, ans=0.1
2023-12-22 18:29:16,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=722413.3333333334, ans=0.035
2023-12-22 18:29:21,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=722480.0, ans=0.125
2023-12-22 18:29:28,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.24 vs. limit=22.5
2023-12-22 18:29:34,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=722546.6666666666, ans=0.05
2023-12-22 18:29:35,110 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.49 vs. limit=22.5
2023-12-22 18:29:39,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=722613.3333333334, ans=0.125
2023-12-22 18:29:41,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=722613.3333333334, ans=0.0
2023-12-22 18:29:48,879 INFO [train.py:886] (3/4) Epoch 23, batch 3550, loss[loss=0.01384, audio_tagging_loss=0.01384, over 25000.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4936220.00 frames. ], batch size: 100, lr: 4.69e-03, grad_scale: 64.0
2023-12-22 18:29:55,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=722680.0, ans=0.125
2023-12-22 18:29:57,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=722680.0, ans=0.1
2023-12-22 18:30:03,506 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.64 vs. limit=22.5
2023-12-22 18:30:14,559 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 18:30:33,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=15.0
2023-12-22 18:30:36,386 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.627e+01 2.997e+01 3.123e+01 3.295e+01 3.646e+01, threshold=6.246e+01, percent-clipped=0.0
2023-12-22 18:30:40,162 INFO [train.py:886] (3/4) Epoch 23, batch 3600, loss[loss=0.01189, audio_tagging_loss=0.01189, over 25000.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4940212.99 frames. ], batch size: 100, lr: 4.69e-03, grad_scale: 64.0
2023-12-22 18:30:43,185 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 18:30:43,238 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 18:30:50,176 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.97 vs. limit=15.0
2023-12-22 18:31:07,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=723146.6666666666, ans=0.125
2023-12-22 18:31:21,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=723280.0, ans=0.0
2023-12-22 18:31:31,388 INFO [train.py:886] (3/4) Epoch 23, batch 3650, loss[loss=0.01178, audio_tagging_loss=0.01178, over 25000.00 frames. ], tot_loss[loss=0.01331, audio_tagging_loss=0.01331, over 4944524.51 frames. ], batch size: 100, lr: 4.68e-03, grad_scale: 64.0
2023-12-22 18:31:31,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=723346.6666666666, ans=0.0
2023-12-22 18:31:33,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=723346.6666666666, ans=0.0
2023-12-22 18:31:45,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=723413.3333333334, ans=0.0
2023-12-22 18:32:10,976 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 18:32:20,213 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.539e+01 2.967e+01 3.120e+01 3.208e+01 3.587e+01, threshold=6.240e+01, percent-clipped=0.0
2023-12-22 18:32:23,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=723680.0, ans=0.125
2023-12-22 18:32:24,016 INFO [train.py:886] (3/4) Epoch 23, batch 3700, loss[loss=0.01331, audio_tagging_loss=0.01331, over 25000.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4953308.44 frames. ], batch size: 100, lr: 4.68e-03, grad_scale: 64.0
2023-12-22 18:32:30,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=723680.0, ans=0.125
2023-12-22 18:32:35,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=723746.6666666666, ans=0.1
2023-12-22 18:32:48,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=723813.3333333334, ans=0.0
2023-12-22 18:32:51,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=723813.3333333334, ans=0.125
2023-12-22 18:32:51,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=723813.3333333334, ans=0.0
2023-12-22 18:33:05,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=723946.6666666666, ans=0.125
2023-12-22 18:33:15,979 INFO [train.py:886] (3/4) Epoch 23, batch 3750, loss[loss=0.01391, audio_tagging_loss=0.01391, over 21985.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4943718.33 frames. ], batch size: 107, lr: 4.68e-03, grad_scale: 64.0
2023-12-22 18:33:18,478 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.30 vs. limit=15.0
2023-12-22 18:33:21,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=724013.3333333334, ans=0.125
2023-12-22 18:33:22,097 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=12.0
2023-12-22 18:33:24,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=12.0
2023-12-22 18:33:31,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=724080.0, ans=0.125
2023-12-22 18:33:32,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=724080.0, ans=0.0
2023-12-22 18:33:39,635 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 18:33:52,891 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.52 vs. limit=22.5
2023-12-22 18:33:53,765 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=15.0
2023-12-22 18:34:01,962 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.690e+01 3.057e+01 3.159e+01 3.344e+01 3.975e+01, threshold=6.319e+01, percent-clipped=0.0
2023-12-22 18:34:04,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=724280.0, ans=0.125
2023-12-22 18:34:05,795 INFO [train.py:886] (3/4) Epoch 23, batch 3800, loss[loss=0.01527, audio_tagging_loss=0.01527, over 24750.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4939699.72 frames. ], batch size: 99, lr: 4.68e-03, grad_scale: 64.0
2023-12-22 18:34:22,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=724413.3333333334, ans=0.02
2023-12-22 18:34:32,705 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.621e-03
2023-12-22 18:34:38,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=724546.6666666666, ans=0.125
2023-12-22 18:34:43,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.50 vs. limit=15.0
2023-12-22 18:34:57,462 INFO [train.py:886] (3/4) Epoch 23, batch 3850, loss[loss=0.01156, audio_tagging_loss=0.01156, over 24060.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4941166.49 frames. ], batch size: 100, lr: 4.68e-03, grad_scale: 64.0
2023-12-22 18:35:24,773 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.92 vs. limit=12.0
2023-12-22 18:35:43,188 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.690e+01 3.009e+01 3.160e+01 3.363e+01 3.949e+01, threshold=6.320e+01, percent-clipped=0.0
2023-12-22 18:35:48,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=725013.3333333334, ans=0.0
2023-12-22 18:35:49,155 INFO [train.py:886] (3/4) Epoch 23, batch 3900, loss[loss=0.01041, audio_tagging_loss=0.01041, over 24750.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4945471.45 frames. ], batch size: 99, lr: 4.68e-03, grad_scale: 128.0
2023-12-22 18:35:59,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=725080.0, ans=0.125
2023-12-22 18:36:00,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=725080.0, ans=0.0
2023-12-22 18:36:01,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=725080.0, ans=0.0
2023-12-22 18:36:11,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=725146.6666666666, ans=0.1
2023-12-22 18:36:20,029 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 18:36:27,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=725213.3333333334, ans=0.025
2023-12-22 18:36:31,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=725280.0, ans=0.125
2023-12-22 18:36:32,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=725280.0, ans=0.1
2023-12-22 18:36:40,541 INFO [train.py:886] (3/4) Epoch 23, batch 3950, loss[loss=0.01318, audio_tagging_loss=0.01318, over 25000.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4943969.27 frames. ], batch size: 100, lr: 4.68e-03, grad_scale: 128.0
2023-12-22 18:37:07,121 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 18:37:14,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=725546.6666666666, ans=0.0
2023-12-22 18:37:20,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=725613.3333333334, ans=0.2
2023-12-22 18:37:21,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=725613.3333333334, ans=0.0
2023-12-22 18:37:25,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=725613.3333333334, ans=0.125
2023-12-22 18:37:26,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=725613.3333333334, ans=0.2
2023-12-22 18:37:30,101 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.629e+01 2.967e+01 3.138e+01 3.271e+01 3.711e+01, threshold=6.275e+01, percent-clipped=0.0
2023-12-22 18:37:33,022 INFO [train.py:886] (3/4) Epoch 23, batch 4000, loss[loss=0.01213, audio_tagging_loss=0.01213, over 25000.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4949048.22 frames. ], batch size: 100, lr: 4.68e-03, grad_scale: 64.0
2023-12-22 18:37:35,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=725680.0, ans=0.2
2023-12-22 18:37:40,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=725680.0, ans=0.125
2023-12-22 18:38:03,831 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=12.0
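The grad_scale field tracks fp16 dynamic loss scaling: it sits at 64.0 for most of the epoch, doubles to 128.0 around batches 3900-3950, and is back at 64.0 by batch 4000, the signature of a scaler that grows the scale after a run of overflow-free steps and halves it on an inf/nan gradient. The same dynamics can be reproduced with PyTorch's stock GradScaler; the constructor arguments below are chosen to illustrate, not taken from train.py:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=64.0,       # matches the scale seen in the log
        growth_factor=2.0,     # 64.0 -> 128.0 after growth_interval clean steps
        backoff_factor=0.5,    # 128.0 -> 64.0 on the first overflowing batch
        growth_interval=2000,
    )
    # Inside the training loop one would call scaler.scale(loss).backward(),
    # scaler.step(optimizer), then scaler.update(); the logged value
    # corresponds to scaler.get_scale().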
2023-12-22 18:38:14,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=725946.6666666666, ans=0.125
2023-12-22 18:38:23,318 INFO [train.py:886] (3/4) Epoch 23, batch 4050, loss[loss=0.0135, audio_tagging_loss=0.0135, over 25000.00 frames. ], tot_loss[loss=0.01327, audio_tagging_loss=0.01327, over 4950894.01 frames. ], batch size: 100, lr: 4.68e-03, grad_scale: 64.0
2023-12-22 18:38:23,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.48 vs. limit=22.5
2023-12-22 18:38:28,384 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.51 vs. limit=22.5
2023-12-22 18:38:32,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=726013.3333333334, ans=0.125
2023-12-22 18:38:37,078 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.06 vs. limit=15.0
2023-12-22 18:38:38,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=726080.0, ans=0.125
2023-12-22 18:38:55,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=726213.3333333334, ans=0.125
2023-12-22 18:39:13,489 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.753e+01 3.032e+01 3.162e+01 3.330e+01 3.866e+01, threshold=6.323e+01, percent-clipped=0.0
2023-12-22 18:39:16,368 INFO [train.py:886] (3/4) Epoch 23, batch 4100, loss[loss=0.01225, audio_tagging_loss=0.01225, over 24750.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4940031.88 frames. ], batch size: 99, lr: 4.67e-03, grad_scale: 64.0
2023-12-22 18:39:16,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=726346.6666666666, ans=0.0
2023-12-22 18:39:24,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=726346.6666666666, ans=0.1
2023-12-22 18:39:32,678 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.57 vs. limit=22.5
2023-12-22 18:39:48,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=726546.6666666666, ans=0.125
2023-12-22 18:39:58,391 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.21 vs. limit=12.0
2023-12-22 18:40:04,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=726613.3333333334, ans=0.025
2023-12-22 18:40:08,616 INFO [train.py:886] (3/4) Epoch 23, batch 4150, loss[loss=0.0132, audio_tagging_loss=0.0132, over 24750.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4939891.22 frames. ], batch size: 99, lr: 4.67e-03, grad_scale: 64.0
2023-12-22 18:40:14,217 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 18:40:16,352 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.74 vs. limit=15.0
2023-12-22 18:40:20,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=726746.6666666666, ans=15.0
2023-12-22 18:40:22,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0
2023-12-22 18:40:25,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=726746.6666666666, ans=0.125
2023-12-22 18:40:41,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=726880.0, ans=0.0
2023-12-22 18:40:46,537 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0
2023-12-22 18:40:56,283 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.726e+01 3.004e+01 3.122e+01 3.283e+01 3.797e+01, threshold=6.243e+01, percent-clipped=0.0
2023-12-22 18:40:59,165 INFO [train.py:886] (3/4) Epoch 23, batch 4200, loss[loss=0.01396, audio_tagging_loss=0.01396, over 25000.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4942987.82 frames. ], batch size: 100, lr: 4.67e-03, grad_scale: 64.0
2023-12-22 18:41:05,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=727013.3333333334, ans=0.0
2023-12-22 18:41:25,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=727146.6666666666, ans=0.1
2023-12-22 18:41:25,925 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0
2023-12-22 18:41:43,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=727280.0, ans=0.0
2023-12-22 18:41:52,128 INFO [train.py:886] (3/4) Epoch 23, batch 4250, loss[loss=0.01522, audio_tagging_loss=0.01522, over 24750.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4943371.40 frames. ], batch size: 99, lr: 4.67e-03, grad_scale: 64.0
2023-12-22 18:41:58,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=727346.6666666666, ans=0.125
2023-12-22 18:42:08,125 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.29 vs. limit=22.5
2023-12-22 18:42:08,366 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.87 vs. limit=15.0
2023-12-22 18:42:39,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=727613.3333333334, ans=0.0
2023-12-22 18:42:39,986 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.744e+01 3.028e+01 3.146e+01 3.307e+01 3.959e+01, threshold=6.292e+01, percent-clipped=0.0
2023-12-22 18:42:41,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=727613.3333333334, ans=0.125
2023-12-22 18:42:43,633 INFO [train.py:886] (3/4) Epoch 23, batch 4300, loss[loss=0.01225, audio_tagging_loss=0.01225, over 24750.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4941778.26 frames. ], batch size: 99, lr: 4.67e-03, grad_scale: 64.0
2023-12-22 18:42:43,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=727680.0, ans=0.2
2023-12-22 18:42:50,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=727680.0, ans=0.025
2023-12-22 18:42:51,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=727680.0, ans=0.125
2023-12-22 18:43:01,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=727746.6666666666, ans=0.125
2023-12-22 18:43:06,909 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 18:43:25,213 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.40 vs. limit=22.5
2023-12-22 18:43:25,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=727946.6666666666, ans=0.0
2023-12-22 18:43:28,579 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 18:43:35,868 INFO [train.py:886] (3/4) Epoch 23, batch 4350, loss[loss=0.01555, audio_tagging_loss=0.01555, over 25000.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4945182.40 frames. ], batch size: 100, lr: 4.67e-03, grad_scale: 64.0
2023-12-22 18:43:39,157 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.04 vs. limit=22.5
2023-12-22 18:43:39,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=728013.3333333334, ans=0.0
2023-12-22 18:43:43,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=728013.3333333334, ans=0.125
2023-12-22 18:43:55,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=728146.6666666666, ans=0.0
2023-12-22 18:44:24,888 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.796e+01 3.084e+01 3.223e+01 3.361e+01 3.747e+01, threshold=6.446e+01, percent-clipped=0.0
2023-12-22 18:44:27,820 INFO [train.py:886] (3/4) Epoch 23, batch 4400, loss[loss=0.013, audio_tagging_loss=0.013, over 24002.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4944483.38 frames. ], batch size: 100, lr: 4.67e-03, grad_scale: 64.0
2023-12-22 18:44:36,702 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=15.0
2023-12-22 18:44:58,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=728546.6666666666, ans=0.05
2023-12-22 18:45:08,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=728613.3333333334, ans=0.2
2023-12-22 18:45:12,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=728613.3333333334, ans=0.125
2023-12-22 18:45:18,738 INFO [train.py:886] (3/4) Epoch 23, batch 4450, loss[loss=0.01432, audio_tagging_loss=0.01432, over 25000.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4948602.28 frames. ], batch size: 100, lr: 4.67e-03, grad_scale: 64.0
2023-12-22 18:45:56,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=728880.0, ans=0.0
2023-12-22 18:45:58,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=728880.0, ans=0.1
2023-12-22 18:46:04,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=728946.6666666666, ans=0.035
2023-12-22 18:46:08,133 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.762e+01 3.007e+01 3.166e+01 3.319e+01 4.167e+01, threshold=6.332e+01, percent-clipped=0.0
2023-12-22 18:46:10,959 INFO [train.py:886] (3/4) Epoch 23, batch 4500, loss[loss=0.01428, audio_tagging_loss=0.01428, over 24750.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4950261.55 frames. ], batch size: 99, lr: 4.67e-03, grad_scale: 64.0
2023-12-22 18:46:23,641 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.23 vs. limit=15.0
2023-12-22 18:46:25,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=729080.0, ans=0.2
2023-12-22 18:46:27,000 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.08 vs. limit=10.0
2023-12-22 18:46:27,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=729080.0, ans=0.125
2023-12-22 18:46:28,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=729080.0, ans=0.07
2023-12-22 18:46:33,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.57 vs. limit=15.0
2023-12-22 18:46:40,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=729146.6666666666, ans=0.025
2023-12-22 18:46:47,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=729213.3333333334, ans=0.125
2023-12-22 18:46:55,547 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.16 vs. limit=6.0
2023-12-22 18:47:03,345 INFO [train.py:886] (3/4) Epoch 23, batch 4550, loss[loss=0.01308, audio_tagging_loss=0.01308, over 25000.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4951032.09 frames. ], batch size: 100, lr: 4.66e-03, grad_scale: 64.0
2023-12-22 18:47:11,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=729346.6666666666, ans=0.125
2023-12-22 18:47:27,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=729480.0, ans=0.125
2023-12-22 18:47:33,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=729546.6666666666, ans=0.125
2023-12-22 18:47:39,504 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.04 vs. limit=15.0
2023-12-22 18:47:47,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=729613.3333333334, ans=0.125
2023-12-22 18:47:51,141 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.681e+01 2.974e+01 3.118e+01 3.247e+01 3.951e+01, threshold=6.237e+01, percent-clipped=0.0
2023-12-22 18:47:54,734 INFO [train.py:886] (3/4) Epoch 23, batch 4600, loss[loss=0.01211, audio_tagging_loss=0.01211, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4957410.13 frames. ], batch size: 100, lr: 4.66e-03, grad_scale: 64.0
2023-12-22 18:47:58,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=729680.0, ans=0.0
2023-12-22 18:48:05,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=729746.6666666666, ans=0.125
2023-12-22 18:48:14,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=729813.3333333334, ans=0.125
2023-12-22 18:48:15,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=729813.3333333334, ans=0.1
2023-12-22 18:48:20,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=729813.3333333334, ans=0.1
2023-12-22 18:48:46,816 INFO [train.py:886] (3/4) Epoch 23, batch 4650, loss[loss=0.01509, audio_tagging_loss=0.01509, over 25000.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4962995.45 frames. ], batch size: 100, lr: 4.66e-03, grad_scale: 64.0
2023-12-22 18:48:51,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=730013.3333333334, ans=0.125
2023-12-22 18:49:03,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=730080.0, ans=0.125
2023-12-22 18:49:27,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=730280.0, ans=0.1
2023-12-22 18:49:31,416 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.49 vs. limit=15.0
2023-12-22 18:49:33,716 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.719e+01 3.050e+01 3.178e+01 3.300e+01 3.735e+01, threshold=6.356e+01, percent-clipped=0.0
2023-12-22 18:49:33,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=730280.0, ans=0.0
2023-12-22 18:49:36,484 INFO [train.py:886] (3/4) Epoch 23, batch 4700, loss[loss=0.01374, audio_tagging_loss=0.01374, over 24750.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4955006.28 frames. ], batch size: 99, lr: 4.66e-03, grad_scale: 64.0
2023-12-22 18:49:45,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=730413.3333333334, ans=0.125
2023-12-22 18:49:58,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=730480.0, ans=0.1
2023-12-22 18:50:03,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=730480.0, ans=0.125
2023-12-22 18:50:03,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=730480.0, ans=0.0
2023-12-22 18:50:09,551 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.21 vs. limit=6.0
2023-12-22 18:50:10,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=730546.6666666666, ans=0.125
2023-12-22 18:50:12,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=730546.6666666666, ans=0.1
2023-12-22 18:50:14,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=730613.3333333334, ans=0.1
2023-12-22 18:50:20,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=730613.3333333334, ans=0.05
2023-12-22 18:50:24,710 INFO [train.py:886] (3/4) Epoch 23, batch 4750, loss[loss=0.01472, audio_tagging_loss=0.01472, over 24750.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4955157.83 frames. ], batch size: 99, lr: 4.66e-03, grad_scale: 64.0
2023-12-22 18:50:58,839 INFO [train.py:886] (3/4) Epoch 24, batch 0, loss[loss=0.02604, audio_tagging_loss=0.02604, over 25000.00 frames. ], tot_loss[loss=0.02604, audio_tagging_loss=0.02604, over 25000.00 frames. ], batch size: 100, lr: 4.56e-03, grad_scale: 64.0
2023-12-22 18:50:58,840 INFO [train.py:909] (3/4) Computing validation loss
2023-12-22 18:51:19,321 INFO [train.py:917] (3/4) Epoch 24, validation: loss=0.03237, audio_tagging_loss=0.03237, over 3737520.00 frames.
2023-12-22 18:51:19,321 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-22 18:51:35,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=730853.3333333334, ans=0.125
2023-12-22 18:51:37,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=730853.3333333334, ans=0.125
2023-12-22 18:51:51,195 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.764e+01 3.132e+01 3.325e+01 4.669e+01 9.691e+01, threshold=6.651e+01, percent-clipped=7.0
2023-12-22 18:51:57,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=730986.6666666666, ans=0.1
2023-12-22 18:51:58,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=731053.3333333334, ans=0.0
2023-12-22 18:52:10,593 INFO [train.py:886] (3/4) Epoch 24, batch 50, loss[loss=0.01591, audio_tagging_loss=0.01591, over 25000.00 frames. ], tot_loss[loss=0.02074, audio_tagging_loss=0.02074, over 1118267.29 frames. ], batch size: 100, lr: 4.56e-03, grad_scale: 64.0
2023-12-22 18:52:14,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=731120.0, ans=0.0
2023-12-22 18:52:26,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=731186.6666666666, ans=0.1
2023-12-22 18:52:26,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=731186.6666666666, ans=0.0
2023-12-22 18:52:51,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=731386.6666666666, ans=0.05
2023-12-22 18:53:02,005 INFO [train.py:886] (3/4) Epoch 24, batch 100, loss[loss=0.01493, audio_tagging_loss=0.01493, over 25000.00 frames. ], tot_loss[loss=0.01793, audio_tagging_loss=0.01793, over 1969651.55 frames. ], batch size: 100, lr: 4.56e-03, grad_scale: 64.0
2023-12-22 18:53:16,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=731520.0, ans=0.1
2023-12-22 18:53:34,160 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.856e+01 3.361e+01 3.546e+01 3.763e+01 4.764e+01, threshold=7.093e+01, percent-clipped=0.0
2023-12-22 18:53:37,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=731653.3333333334, ans=10.0
2023-12-22 18:53:39,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=731653.3333333334, ans=0.125
2023-12-22 18:53:53,539 INFO [train.py:886] (3/4) Epoch 24, batch 150, loss[loss=0.01534, audio_tagging_loss=0.01534, over 25000.00 frames. ], tot_loss[loss=0.01639, audio_tagging_loss=0.01639, over 2638869.43 frames. ], batch size: 100, lr: 4.56e-03, grad_scale: 64.0
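Validation runs at batch 0 of each epoch and then at a fixed batch interval (batch 3000 earlier in this log), always over the same 3737520 frames, so the numbers are directly comparable between checks; here it improves from 0.03349 at the epoch-23 check to 0.03237 at the start of epoch 24. A sketch of that cadence for a multi-label tagger; the batch layout and helper below are assumptions, not icefall's actual signatures:

    import torch

    def maybe_validate(model, valid_dl, batch_idx: int, valid_interval: int = 3000):
        # Run validation at batch 0 and every valid_interval batches thereafter.
        if batch_idx % valid_interval != 0:
            return None
        model.eval()
        loss_sum, n = 0.0, 0
        with torch.no_grad():
            for feats, targets in valid_dl:  # assumed batch layout
                logits = model(feats)        # (N, num_events) scores
                loss = torch.nn.functional.binary_cross_entropy_with_logits(
                    logits, targets, reduction="sum")
                loss_sum += loss.item()
                n += feats.shape[0]
        model.train()
        return loss_sum / max(n, 1)  # per-example; the log weights by frames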
2023-12-22 18:54:31,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.36 vs. limit=22.5
2023-12-22 18:54:32,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=731986.6666666666, ans=0.125
2023-12-22 18:54:42,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=732053.3333333334, ans=12.0
2023-12-22 18:54:44,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=732120.0, ans=0.1
2023-12-22 18:54:45,370 INFO [train.py:886] (3/4) Epoch 24, batch 200, loss[loss=0.01367, audio_tagging_loss=0.01367, over 25000.00 frames. ], tot_loss[loss=0.01537, audio_tagging_loss=0.01537, over 3154035.14 frames. ], batch size: 100, lr: 4.56e-03, grad_scale: 64.0
2023-12-22 18:54:47,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=732120.0, ans=0.125
2023-12-22 18:54:56,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=732186.6666666666, ans=0.1
2023-12-22 18:54:58,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=732186.6666666666, ans=0.125
2023-12-22 18:55:03,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0
2023-12-22 18:55:17,472 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.730e+01 3.030e+01 3.161e+01 3.286e+01 3.848e+01, threshold=6.323e+01, percent-clipped=0.0
2023-12-22 18:55:30,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=732386.6666666666, ans=0.125
2023-12-22 18:55:38,565 INFO [train.py:886] (3/4) Epoch 24, batch 250, loss[loss=0.01204, audio_tagging_loss=0.01204, over 25000.00 frames. ], tot_loss[loss=0.01482, audio_tagging_loss=0.01482, over 3554483.33 frames. ], batch size: 100, lr: 4.55e-03, grad_scale: 64.0
2023-12-22 18:55:49,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=732520.0, ans=0.125
2023-12-22 18:56:00,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=732586.6666666666, ans=0.0
2023-12-22 18:56:01,753 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.48 vs. limit=5.0
2023-12-22 18:56:09,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=732653.3333333334, ans=0.125
2023-12-22 18:56:30,534 INFO [train.py:886] (3/4) Epoch 24, batch 300, loss[loss=0.01727, audio_tagging_loss=0.01727, over 24950.00 frames. ], tot_loss[loss=0.01449, audio_tagging_loss=0.01449, over 3865822.62 frames. ], batch size: 100, lr: 4.55e-03, grad_scale: 64.0
2023-12-22 18:56:30,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=732786.6666666666, ans=0.125
2023-12-22 18:56:47,487 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 18:56:54,407 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.82 vs. limit=15.0
2023-12-22 18:57:02,323 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.786e+01 3.035e+01 3.194e+01 3.337e+01 3.823e+01, threshold=6.389e+01, percent-clipped=0.0
2023-12-22 18:57:02,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=732986.6666666666, ans=0.125
2023-12-22 18:57:06,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=732986.6666666666, ans=0.125
2023-12-22 18:57:15,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=733053.3333333334, ans=0.1
2023-12-22 18:57:20,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=733053.3333333334, ans=0.125
2023-12-22 18:57:21,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=733120.0, ans=0.1
2023-12-22 18:57:21,942 INFO [train.py:886] (3/4) Epoch 24, batch 350, loss[loss=0.01461, audio_tagging_loss=0.01461, over 24750.00 frames. ], tot_loss[loss=0.01433, audio_tagging_loss=0.01433, over 4104494.97 frames. ], batch size: 99, lr: 4.55e-03, grad_scale: 64.0
2023-12-22 18:57:41,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=733186.6666666666, ans=0.1
2023-12-22 18:57:48,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=733253.3333333334, ans=0.125
2023-12-22 18:57:51,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=733253.3333333334, ans=0.125
2023-12-22 18:57:55,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=733320.0, ans=0.07
2023-12-22 18:58:15,024 INFO [train.py:886] (3/4) Epoch 24, batch 400, loss[loss=0.009917, audio_tagging_loss=0.009917, over 25000.00 frames. ], tot_loss[loss=0.01396, audio_tagging_loss=0.01396, over 4293802.04 frames. ], batch size: 100, lr: 4.55e-03, grad_scale: 64.0
2023-12-22 18:58:25,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=733520.0, ans=0.125
2023-12-22 18:58:27,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=733520.0, ans=0.0
2023-12-22 18:58:29,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=733520.0, ans=0.125
2023-12-22 18:58:34,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=733586.6666666666, ans=0.2
2023-12-22 18:58:42,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=733586.6666666666, ans=0.125
2023-12-22 18:58:47,566 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.554e+01 2.966e+01 3.157e+01 3.316e+01 3.738e+01, threshold=6.314e+01, percent-clipped=0.0
2023-12-22 18:59:01,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=733720.0, ans=0.2
2023-12-22 18:59:05,908 INFO [train.py:886] (3/4) Epoch 24, batch 450, loss[loss=0.01517, audio_tagging_loss=0.01517, over 25000.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 4439289.79 frames. ], batch size: 100, lr: 4.55e-03, grad_scale: 64.0
2023-12-22 18:59:54,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.76 vs. limit=8.0
2023-12-22 18:59:59,014 INFO [train.py:886] (3/4) Epoch 24, batch 500, loss[loss=0.01338, audio_tagging_loss=0.01338, over 25000.00 frames. ], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 4553719.99 frames. ], batch size: 100, lr: 4.55e-03, grad_scale: 64.0
2023-12-22 19:00:27,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=734253.3333333334, ans=0.125
2023-12-22 19:00:29,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=734320.0, ans=0.2
2023-12-22 19:00:31,463 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.566e+01 2.996e+01 3.110e+01 3.243e+01 4.536e+01, threshold=6.220e+01, percent-clipped=0.0
2023-12-22 19:00:33,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=734320.0, ans=0.0
2023-12-22 19:00:40,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=734386.6666666666, ans=0.2
2023-12-22 19:00:40,335 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=15.0
2023-12-22 19:00:50,157 INFO [train.py:886] (3/4) Epoch 24, batch 550, loss[loss=0.01206, audio_tagging_loss=0.01206, over 25000.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4635394.57 frames. ], batch size: 100, lr: 4.55e-03, grad_scale: 64.0
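Note on the train.py:886 entries: loss[...] is the current batch while tot_loss[...] is a smoothed average, and its frame count climbs from 25000 at batch 0 toward roughly 5e6 and then plateaus (4439289.79, 4553719.99, 4635394.57 above). That behaviour is consistent with an exponentially decayed running sum in which each batch adds its statistics and the accumulator is scaled by (1 - 1/reset_interval): with reset_interval=200, as in the training config, and about 25000 frames per batch, the steady-state denominator is 200 * 25000 = 5e6 frames. A sketch under those assumptions:

    class RunningLoss:
        """Exponentially decayed sum of (loss * frames, frames)."""

        def __init__(self, reset_interval: int = 200):
            self.decay = 1.0 - 1.0 / reset_interval
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
            self.frames = self.frames * self.decay + batch_frames

        @property
        def tot_loss(self) -> float:
            return self.loss_sum / self.frames

    avg = RunningLoss()
    for _ in range(2000):
        avg.update(0.0134, 25000.0)
    # frames approaches 200 * 25000 = 5e6, the plateau seen in the log
    print(f"{avg.tot_loss:.4f} over {avg.frames:.2f} frames")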
2023-12-22 19:00:59,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=734453.3333333334, ans=0.125
2023-12-22 19:01:00,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734520.0, ans=0.1
2023-12-22 19:01:03,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=734520.0, ans=0.125
2023-12-22 19:01:13,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=734586.6666666666, ans=0.2
2023-12-22 19:01:34,580 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.12 vs. limit=15.0
2023-12-22 19:01:42,432 INFO [train.py:886] (3/4) Epoch 24, batch 600, loss[loss=0.01129, audio_tagging_loss=0.01129, over 24750.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4701737.24 frames. ], batch size: 99, lr: 4.55e-03, grad_scale: 64.0
2023-12-22 19:01:51,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=734853.3333333334, ans=0.125
2023-12-22 19:01:53,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=734853.3333333334, ans=0.05
2023-12-22 19:02:00,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=734853.3333333334, ans=0.125
2023-12-22 19:02:13,847 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.711e+01 3.028e+01 3.193e+01 3.328e+01 3.819e+01, threshold=6.386e+01, percent-clipped=0.0
2023-12-22 19:02:14,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=734986.6666666666, ans=0.125
2023-12-22 19:02:24,429 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.07 vs. limit=15.0
2023-12-22 19:02:26,422 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.18 vs. limit=15.0
2023-12-22 19:02:33,923 INFO [train.py:886] (3/4) Epoch 24, batch 650, loss[loss=0.01407, audio_tagging_loss=0.01407, over 24750.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4750359.98 frames. ], batch size: 99, lr: 4.55e-03, grad_scale: 64.0
2023-12-22 19:02:34,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=735120.0, ans=0.1
2023-12-22 19:02:38,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=735120.0, ans=0.0
2023-12-22 19:02:47,589 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0
2023-12-22 19:02:55,016 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.95 vs. limit=22.5
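Note on the Whitening entries: each compares a module-specific metric against a scheduled limit (e.g. metric=14.18 vs. limit=15.0 above). The metric measures how far the covariance of the module's output is from a multiple of the identity; it is 1.0 for perfectly white features and grows with eigenvalue spread, and a corrective gradient is applied only when the limit is exceeded. Below is a sketch of one such metric, assuming the eigenvalue-spread definition; details of the real scaling.py computation may differ.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """Covariance anisotropy of features x, shape (num_frames, num_channels).

        Per group of d channels, returns d * sum(lambda_i^2) / (sum lambda_i)^2,
        where lambda_i are covariance eigenvalues; this equals 1.0 iff the
        covariance is c * I. Uses the Frobenius norm so that no explicit
        eigendecomposition is needed.
        """
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / n   # (num_groups, d, d)
        d = cov.shape[-1]
        num = (cov * cov).sum(dim=(1, 2))              # sum of squared eigenvalues
        den = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2 / d
        return (num / den).mean().item()

    white = torch.randn(10000, 384)
    print(whitening_metric(white))  # close to 1.0 for white features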
2023-12-22 19:03:21,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=735386.6666666666, ans=0.04949747468305833
2023-12-22 19:03:24,384 INFO [train.py:886] (3/4) Epoch 24, batch 700, loss[loss=0.01214, audio_tagging_loss=0.01214, over 25000.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4795313.80 frames. ], batch size: 100, lr: 4.55e-03, grad_scale: 64.0
2023-12-22 19:03:27,806 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=15.0
2023-12-22 19:03:28,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=735453.3333333334, ans=0.1
2023-12-22 19:03:38,753 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.25 vs. limit=15.0
2023-12-22 19:03:50,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=735586.6666666666, ans=0.125
2023-12-22 19:03:55,821 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.626e+01 3.024e+01 3.156e+01 3.315e+01 3.612e+01, threshold=6.313e+01, percent-clipped=0.0
2023-12-22 19:04:14,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=735786.6666666666, ans=0.2
2023-12-22 19:04:15,531 INFO [train.py:886] (3/4) Epoch 24, batch 750, loss[loss=0.01071, audio_tagging_loss=0.01071, over 22117.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4829429.04 frames. ], batch size: 107, lr: 4.54e-03, grad_scale: 64.0
2023-12-22 19:04:22,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=735786.6666666666, ans=0.125
2023-12-22 19:04:28,184 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.49 vs. limit=15.0
2023-12-22 19:04:51,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=735986.6666666666, ans=0.125
2023-12-22 19:04:57,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=736053.3333333334, ans=0.125
2023-12-22 19:05:02,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=736053.3333333334, ans=0.2
2023-12-22 19:05:05,927 INFO [train.py:886] (3/4) Epoch 24, batch 800, loss[loss=0.01301, audio_tagging_loss=0.01301, over 25000.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4859296.81 frames. ], batch size: 100, lr: 4.54e-03, grad_scale: 64.0
2023-12-22 19:05:06,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=736120.0, ans=0.07
2023-12-22 19:05:18,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=736186.6666666666, ans=0.0
2023-12-22 19:05:31,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=736253.3333333334, ans=0.0
2023-12-22 19:05:36,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=736320.0, ans=0.2
2023-12-22 19:05:38,412 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.566e+01 2.955e+01 3.119e+01 3.236e+01 3.787e+01, threshold=6.239e+01, percent-clipped=0.0
2023-12-22 19:05:42,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=736320.0, ans=0.2
2023-12-22 19:05:51,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=736386.6666666666, ans=0.0
2023-12-22 19:05:53,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=736386.6666666666, ans=0.125
2023-12-22 19:05:58,644 INFO [train.py:886] (3/4) Epoch 24, batch 850, loss[loss=0.01222, audio_tagging_loss=0.01222, over 25000.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 4882755.51 frames. ], batch size: 100, lr: 4.54e-03, grad_scale: 64.0
2023-12-22 19:05:58,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=736453.3333333334, ans=0.0
2023-12-22 19:06:01,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=736453.3333333334, ans=0.125
2023-12-22 19:06:18,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=736520.0, ans=0.2
2023-12-22 19:06:20,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=736586.6666666666, ans=0.0
2023-12-22 19:06:25,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=736586.6666666666, ans=0.125
2023-12-22 19:06:32,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=736653.3333333334, ans=0.0
2023-12-22 19:06:37,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.16 vs. limit=15.0
2023-12-22 19:06:41,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=736720.0, ans=0.1
2023-12-22 19:06:42,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=736720.0, ans=0.125
2023-12-22 19:06:50,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=736786.6666666666, ans=0.125
2023-12-22 19:06:50,819 INFO [train.py:886] (3/4) Epoch 24, batch 900, loss[loss=0.01244, audio_tagging_loss=0.01244, over 24750.00 frames. ], tot_loss[loss=0.01319, audio_tagging_loss=0.01319, over 4890275.29 frames. ], batch size: 99, lr: 4.54e-03, grad_scale: 64.0
2023-12-22 19:06:59,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.55 vs. limit=15.0
2023-12-22 19:07:12,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=736920.0, ans=0.125
2023-12-22 19:07:22,869 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.749e+01 3.064e+01 3.189e+01 3.308e+01 4.129e+01, threshold=6.377e+01, percent-clipped=0.0
2023-12-22 19:07:25,202 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.64 vs. limit=15.0
2023-12-22 19:07:30,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=736986.6666666666, ans=0.125
2023-12-22 19:07:37,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=737053.3333333334, ans=0.0
2023-12-22 19:07:42,178 INFO [train.py:886] (3/4) Epoch 24, batch 950, loss[loss=0.01281, audio_tagging_loss=0.01281, over 25000.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4900235.74 frames. ], batch size: 100, lr: 4.54e-03, grad_scale: 64.0
2023-12-22 19:07:43,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.53 vs. limit=15.0
2023-12-22 19:07:44,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=737120.0, ans=0.125
2023-12-22 19:08:17,984 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0
2023-12-22 19:08:33,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=737386.6666666666, ans=0.0
2023-12-22 19:08:34,112 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.363e-02
2023-12-22 19:08:34,718 INFO [train.py:886] (3/4) Epoch 24, batch 1000, loss[loss=0.01587, audio_tagging_loss=0.01587, over 24750.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4910475.91 frames. ], batch size: 99, lr: 4.54e-03, grad_scale: 64.0
2023-12-22 19:08:39,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=737453.3333333334, ans=0.125
2023-12-22 19:09:06,969 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.839e+01 3.078e+01 3.149e+01 3.357e+01 3.682e+01, threshold=6.299e+01, percent-clipped=0.0
2023-12-22 19:09:13,007 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=12.0
2023-12-22 19:09:27,361 INFO [train.py:886] (3/4) Epoch 24, batch 1050, loss[loss=0.01079, audio_tagging_loss=0.01079, over 25000.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4916570.28 frames. ], batch size: 100, lr: 4.54e-03, grad_scale: 64.0
2023-12-22 19:09:45,413 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-12-22 19:10:01,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737986.6666666666, ans=0.1
2023-12-22 19:10:03,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=737986.6666666666, ans=0.2
2023-12-22 19:10:09,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=738053.3333333334, ans=0.125
2023-12-22 19:10:10,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=738053.3333333334, ans=10.0
2023-12-22 19:10:12,071 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0
2023-12-22 19:10:18,274 INFO [train.py:886] (3/4) Epoch 24, batch 1100, loss[loss=0.0109, audio_tagging_loss=0.0109, over 25000.00 frames. ], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4923222.64 frames. ], batch size: 100, lr: 4.54e-03, grad_scale: 64.0
2023-12-22 19:10:19,427 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 19:10:23,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=738120.0, ans=0.2
2023-12-22 19:10:24,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=738120.0, ans=0.125
2023-12-22 19:10:50,308 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.735e+01 2.982e+01 3.129e+01 3.253e+01 3.544e+01, threshold=6.257e+01, percent-clipped=0.0
2023-12-22 19:10:53,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=738320.0, ans=0.0
2023-12-22 19:11:09,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=738453.3333333334, ans=0.125
2023-12-22 19:11:10,318 INFO [train.py:886] (3/4) Epoch 24, batch 1150, loss[loss=0.01397, audio_tagging_loss=0.01397, over 25000.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4929034.85 frames. ], batch size: 100, lr: 4.54e-03, grad_scale: 64.0
2023-12-22 19:11:19,853 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 19:11:32,384 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.73 vs. limit=15.0
2023-12-22 19:11:42,992 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0
2023-12-22 19:11:44,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=738653.3333333334, ans=0.125
2023-12-22 19:11:45,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=738653.3333333334, ans=0.125
2023-12-22 19:12:02,048 INFO [train.py:886] (3/4) Epoch 24, batch 1200, loss[loss=0.01233, audio_tagging_loss=0.01233, over 25000.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4940333.33 frames. ], batch size: 100, lr: 4.54e-03, grad_scale: 128.0
2023-12-22 19:12:20,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=738853.3333333334, ans=0.125
2023-12-22 19:12:25,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=738920.0, ans=0.1
2023-12-22 19:12:34,314 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.793e+01 3.042e+01 3.211e+01 3.340e+01 3.947e+01, threshold=6.423e+01, percent-clipped=0.0
2023-12-22 19:12:47,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=739053.3333333334, ans=0.1
2023-12-22 19:12:48,674 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.33 vs. limit=15.0
2023-12-22 19:12:54,868 INFO [train.py:886] (3/4) Epoch 24, batch 1250, loss[loss=0.01174, audio_tagging_loss=0.01174, over 22254.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4939317.64 frames. ], batch size: 107, lr: 4.53e-03, grad_scale: 128.0
2023-12-22 19:12:58,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=739120.0, ans=0.125
2023-12-22 19:13:05,587 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.96 vs. limit=22.5
2023-12-22 19:13:12,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=739186.6666666666, ans=0.125
2023-12-22 19:13:17,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=739253.3333333334, ans=0.125
2023-12-22 19:13:28,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=739320.0, ans=0.0
2023-12-22 19:13:35,180 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0
2023-12-22 19:13:47,181 INFO [train.py:886] (3/4) Epoch 24, batch 1300, loss[loss=0.01223, audio_tagging_loss=0.01223, over 24750.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4940314.17 frames. ], batch size: 99, lr: 4.53e-03, grad_scale: 64.0
2023-12-22 19:13:49,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=739453.3333333334, ans=0.125
2023-12-22 19:13:56,238 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.35 vs. limit=15.0
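Note on grad_scale: with fp16 training the loss is multiplied by a dynamic scale before backprop, and the train.py entries show it doubling from 64.0 to 128.0 above and dropping to 32.0 later in this section, the usual grow-on-success, halve-on-overflow pattern of an AMP gradient scaler. The training objective itself is multi-label audio tagging, consistent with loss equalling audio_tagging_loss in every entry. A minimal sketch of such a scaling loop using PyTorch's GradScaler; the model, optimizer, and growth interval are placeholders, not this recipe's actual values.

    import torch
    from torch.cuda.amp import GradScaler, autocast

    model = torch.nn.Linear(80, 527).cuda()  # placeholder; assumes a GPU
    optimizer = torch.optim.SGD(model.parameters(), lr=0.045)
    # Halve the scale on overflow; double it after growth_interval
    # consecutive non-overflowing steps (interval here is illustrative).
    scaler = GradScaler(init_scale=64.0, growth_factor=2.0,
                        backoff_factor=0.5, growth_interval=1000)

    for _ in range(10):
        x = torch.randn(100, 80, device="cuda")
        y = torch.randint(0, 2, (100, 527), device="cuda").float()
        optimizer.zero_grad()
        with autocast():
            # Multi-label binary cross-entropy over event classes.
            loss = torch.nn.functional.binary_cross_entropy_with_logits(
                model(x), y)
        scaler.scale(loss).backward()  # backprop in the scaled domain
        scaler.step(optimizer)         # unscales; skips the step on overflow
        scaler.update()                # adjusts the scale (logged grad_scale)
        print(scaler.get_scale())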
2023-12-22 19:14:20,276 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.767e+01 3.066e+01 3.219e+01 3.358e+01 3.789e+01, threshold=6.439e+01, percent-clipped=0.0
2023-12-22 19:14:38,046 INFO [train.py:886] (3/4) Epoch 24, batch 1350, loss[loss=0.01383, audio_tagging_loss=0.01383, over 24750.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4945225.51 frames. ], batch size: 99, lr: 4.53e-03, grad_scale: 64.0
2023-12-22 19:14:59,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=739920.0, ans=0.125
2023-12-22 19:15:08,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=739986.6666666666, ans=0.125
2023-12-22 19:15:12,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=739986.6666666666, ans=0.0
2023-12-22 19:15:30,559 INFO [train.py:886] (3/4) Epoch 24, batch 1400, loss[loss=0.01357, audio_tagging_loss=0.01357, over 24750.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4944425.79 frames. ], batch size: 99, lr: 4.53e-03, grad_scale: 64.0
2023-12-22 19:15:33,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=740120.0, ans=0.1
2023-12-22 19:15:33,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=740120.0, ans=0.1
2023-12-22 19:15:36,960 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.27 vs. limit=10.0
2023-12-22 19:15:44,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=740186.6666666666, ans=0.0
2023-12-22 19:15:53,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=740253.3333333334, ans=0.025
2023-12-22 19:15:59,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=740253.3333333334, ans=0.125
2023-12-22 19:16:03,679 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.772e+01 2.993e+01 3.155e+01 3.299e+01 3.866e+01, threshold=6.310e+01, percent-clipped=0.0
2023-12-22 19:16:14,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=740386.6666666666, ans=0.2
2023-12-22 19:16:20,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=740386.6666666666, ans=0.0
2023-12-22 19:16:22,192 INFO [train.py:886] (3/4) Epoch 24, batch 1450, loss[loss=0.01592, audio_tagging_loss=0.01592, over 25000.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4947201.38 frames. ], batch size: 100, lr: 4.53e-03, grad_scale: 64.0
2023-12-22 19:16:43,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0
2023-12-22 19:16:46,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=740586.6666666666, ans=0.2
2023-12-22 19:16:55,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.26 vs. limit=10.0
2023-12-22 19:17:13,101 INFO [train.py:886] (3/4) Epoch 24, batch 1500, loss[loss=0.01067, audio_tagging_loss=0.01067, over 25000.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4956167.60 frames. ], batch size: 100, lr: 4.53e-03, grad_scale: 64.0
2023-12-22 19:17:19,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=740786.6666666666, ans=0.125
2023-12-22 19:17:19,957 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.91 vs. limit=6.0
2023-12-22 19:17:26,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=740853.3333333334, ans=0.0
2023-12-22 19:17:37,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=740920.0, ans=0.125
2023-12-22 19:17:46,190 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.671e+01 3.063e+01 3.189e+01 3.318e+01 3.904e+01, threshold=6.378e+01, percent-clipped=0.0
2023-12-22 19:17:56,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=741053.3333333334, ans=0.1
2023-12-22 19:18:02,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=741053.3333333334, ans=0.05
2023-12-22 19:18:05,231 INFO [train.py:886] (3/4) Epoch 24, batch 1550, loss[loss=0.01446, audio_tagging_loss=0.01446, over 24750.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4949824.32 frames. ], batch size: 99, lr: 4.53e-03, grad_scale: 64.0
2023-12-22 19:18:09,731 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.90 vs. limit=15.0
2023-12-22 19:18:10,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=741120.0, ans=0.2
2023-12-22 19:18:15,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=741186.6666666666, ans=0.125
2023-12-22 19:18:16,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.51 vs. limit=15.0
2023-12-22 19:18:37,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=741320.0, ans=0.125
2023-12-22 19:18:44,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=741320.0, ans=0.125
2023-12-22 19:18:45,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0
2023-12-22 19:18:56,365 INFO [train.py:886] (3/4) Epoch 24, batch 1600, loss[loss=0.0148, audio_tagging_loss=0.0148, over 24750.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 4951602.83 frames. ], batch size: 99, lr: 4.53e-03, grad_scale: 64.0
2023-12-22 19:19:02,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=741453.3333333334, ans=0.125
2023-12-22 19:19:04,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.34 vs. limit=22.5
2023-12-22 19:19:28,584 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.712e+01 3.060e+01 3.214e+01 3.363e+01 3.973e+01, threshold=6.429e+01, percent-clipped=0.0
2023-12-22 19:19:46,319 INFO [train.py:886] (3/4) Epoch 24, batch 1650, loss[loss=0.01399, audio_tagging_loss=0.01399, over 25000.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4945564.88 frames. ], batch size: 100, lr: 4.53e-03, grad_scale: 64.0
2023-12-22 19:20:15,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=741920.0, ans=0.2
2023-12-22 19:20:21,118 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=12.0
2023-12-22 19:20:27,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=742053.3333333334, ans=0.2
2023-12-22 19:20:34,094 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.08 vs. limit=15.0
2023-12-22 19:20:36,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=742053.3333333334, ans=0.025
2023-12-22 19:20:36,874 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.06 vs. limit=15.0
2023-12-22 19:20:39,646 INFO [train.py:886] (3/4) Epoch 24, batch 1700, loss[loss=0.01437, audio_tagging_loss=0.01437, over 25000.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4949565.13 frames. ], batch size: 100, lr: 4.53e-03, grad_scale: 64.0
2023-12-22 19:20:49,736 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.04 vs. limit=10.0
2023-12-22 19:21:12,332 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.769e+01 2.988e+01 3.129e+01 3.264e+01 3.998e+01, threshold=6.258e+01, percent-clipped=0.0
2023-12-22 19:21:15,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=742320.0, ans=0.125
2023-12-22 19:21:19,556 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.36 vs. limit=22.5
2023-12-22 19:21:29,581 INFO [train.py:886] (3/4) Epoch 24, batch 1750, loss[loss=0.01324, audio_tagging_loss=0.01324, over 25000.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4953595.96 frames. ], batch size: 100, lr: 4.52e-03, grad_scale: 64.0
2023-12-22 19:21:38,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=742453.3333333334, ans=0.125
2023-12-22 19:21:54,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=742586.6666666666, ans=15.0
2023-12-22 19:22:13,208 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.15 vs. limit=22.5
2023-12-22 19:22:22,566 INFO [train.py:886] (3/4) Epoch 24, batch 1800, loss[loss=0.01025, audio_tagging_loss=0.01025, over 24069.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4959187.51 frames. ], batch size: 100, lr: 4.52e-03, grad_scale: 64.0
2023-12-22 19:22:31,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=742853.3333333334, ans=0.125
2023-12-22 19:22:35,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=742853.3333333334, ans=0.0
2023-12-22 19:22:35,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=742853.3333333334, ans=0.0
2023-12-22 19:22:45,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=742920.0, ans=0.0
2023-12-22 19:22:55,019 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.809e+01 3.072e+01 3.188e+01 3.328e+01 3.715e+01, threshold=6.376e+01, percent-clipped=0.0
2023-12-22 19:22:57,389 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.08 vs. limit=15.0
2023-12-22 19:23:00,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=742986.6666666666, ans=0.5
2023-12-22 19:23:01,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=742986.6666666666, ans=0.1
2023-12-22 19:23:10,895 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 19:23:12,661 INFO [train.py:886] (3/4) Epoch 24, batch 1850, loss[loss=0.0172, audio_tagging_loss=0.0172, over 24956.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4960022.12 frames. ], batch size: 100, lr: 4.52e-03, grad_scale: 64.0
2023-12-22 19:23:21,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=743120.0, ans=0.95
2023-12-22 19:23:37,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=743253.3333333334, ans=0.05
2023-12-22 19:23:38,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=743253.3333333334, ans=0.125
2023-12-22 19:23:45,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=743320.0, ans=0.125
2023-12-22 19:23:53,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=743386.6666666666, ans=0.0
2023-12-22 19:24:03,144 INFO [train.py:886] (3/4) Epoch 24, batch 1900, loss[loss=0.01524, audio_tagging_loss=0.01524, over 24927.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4954275.02 frames. ], batch size: 100, lr: 4.52e-03, grad_scale: 64.0
2023-12-22 19:24:08,003 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 19:24:29,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.66 vs. limit=12.0
2023-12-22 19:24:35,409 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.696e+01 3.079e+01 3.221e+01 3.350e+01 3.840e+01, threshold=6.442e+01, percent-clipped=0.0
2023-12-22 19:24:43,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743720.0, ans=0.1
2023-12-22 19:24:54,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=743786.6666666666, ans=0.125
2023-12-22 19:24:55,067 INFO [train.py:886] (3/4) Epoch 24, batch 1950, loss[loss=0.01357, audio_tagging_loss=0.01357, over 25000.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 4951077.11 frames. ], batch size: 100, lr: 4.52e-03, grad_scale: 64.0
2023-12-22 19:24:59,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=743786.6666666666, ans=0.125
2023-12-22 19:25:13,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=743920.0, ans=0.125
2023-12-22 19:25:14,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=743920.0, ans=0.125
2023-12-22 19:25:30,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=743986.6666666666, ans=0.0
2023-12-22 19:25:44,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.17 vs. limit=15.0
2023-12-22 19:25:45,863 INFO [train.py:886] (3/4) Epoch 24, batch 2000, loss[loss=0.01357, audio_tagging_loss=0.01357, over 25000.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4951583.00 frames. ], batch size: 100, lr: 4.52e-03, grad_scale: 64.0
2023-12-22 19:25:47,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.07 vs. limit=10.0
2023-12-22 19:26:06,938 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 19:26:08,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=744253.3333333334, ans=0.07
2023-12-22 19:26:09,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.46 vs. limit=15.0
2023-12-22 19:26:13,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.75 vs. limit=15.0
2023-12-22 19:26:14,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744253.3333333334, ans=0.1
2023-12-22 19:26:16,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=744320.0, ans=0.0
2023-12-22 19:26:18,953 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.621e+01 2.994e+01 3.122e+01 3.277e+01 4.126e+01, threshold=6.244e+01, percent-clipped=0.0
2023-12-22 19:26:19,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=744320.0, ans=0.07
2023-12-22 19:26:26,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=744386.6666666666, ans=0.0
2023-12-22 19:26:33,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=744386.6666666666, ans=0.1
2023-12-22 19:26:38,097 INFO [train.py:886] (3/4) Epoch 24, batch 2050, loss[loss=0.01163, audio_tagging_loss=0.01163, over 25000.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4951058.77 frames. ], batch size: 100, lr: 4.52e-03, grad_scale: 64.0
2023-12-22 19:27:11,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744653.3333333334, ans=0.1
2023-12-22 19:27:20,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=744720.0, ans=0.125
2023-12-22 19:27:26,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=744720.0, ans=0.0
2023-12-22 19:27:28,902 INFO [train.py:886] (3/4) Epoch 24, batch 2100, loss[loss=0.01203, audio_tagging_loss=0.01203, over 25000.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4954552.81 frames. ], batch size: 100, lr: 4.52e-03, grad_scale: 64.0
2023-12-22 19:27:34,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=744786.6666666666, ans=0.0
2023-12-22 19:27:41,199 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.51 vs. limit=5.0
2023-12-22 19:27:45,754 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.65 vs. limit=22.5
2023-12-22 19:27:52,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.99 vs. limit=15.0
2023-12-22 19:27:53,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0
2023-12-22 19:27:55,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744920.0, ans=0.1
2023-12-22 19:27:58,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=744920.0, ans=0.125
2023-12-22 19:28:02,089 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.817e+01 3.004e+01 3.137e+01 3.300e+01 3.808e+01, threshold=6.274e+01, percent-clipped=0.0
2023-12-22 19:28:09,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=744986.6666666666, ans=0.125
2023-12-22 19:28:21,335 INFO [train.py:886] (3/4) Epoch 24, batch 2150, loss[loss=0.01235, audio_tagging_loss=0.01235, over 24750.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4956369.84 frames. ], batch size: 99, lr: 4.52e-03, grad_scale: 64.0
2023-12-22 19:28:45,604 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.71 vs. limit=22.5
2023-12-22 19:28:52,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=745320.0, ans=0.125
2023-12-22 19:28:56,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=745320.0, ans=0.0
2023-12-22 19:29:03,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=745386.6666666666, ans=0.2
2023-12-22 19:29:13,598 INFO [train.py:886] (3/4) Epoch 24, batch 2200, loss[loss=0.01217, audio_tagging_loss=0.01217, over 24750.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4953051.16 frames. ], batch size: 99, lr: 4.52e-03, grad_scale: 64.0
2023-12-22 19:29:17,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=745453.3333333334, ans=0.0
2023-12-22 19:29:36,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=745586.6666666666, ans=0.2
2023-12-22 19:29:38,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=745586.6666666666, ans=0.125
2023-12-22 19:29:46,459 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.589e+01 3.066e+01 3.183e+01 3.365e+01 3.931e+01, threshold=6.367e+01, percent-clipped=0.0
2023-12-22 19:29:46,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=745653.3333333334, ans=0.0
2023-12-22 19:30:04,872 INFO [train.py:886] (3/4) Epoch 24, batch 2250, loss[loss=0.01251, audio_tagging_loss=0.01251, over 25000.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4946588.43 frames. ], batch size: 100, lr: 4.51e-03, grad_scale: 64.0
2023-12-22 19:30:20,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=745853.3333333334, ans=0.125
2023-12-22 19:30:26,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=745920.0, ans=0.1
2023-12-22 19:30:27,188 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.45 vs. limit=12.0
2023-12-22 19:30:45,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=745986.6666666666, ans=0.125
2023-12-22 19:30:56,858 INFO [train.py:886] (3/4) Epoch 24, batch 2300, loss[loss=0.01466, audio_tagging_loss=0.01466, over 25000.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4944333.53 frames. ], batch size: 100, lr: 4.51e-03, grad_scale: 64.0
2023-12-22 19:31:06,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=746186.6666666666, ans=0.1
2023-12-22 19:31:07,734 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.54 vs. limit=5.0
2023-12-22 19:31:14,782 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.09 vs. limit=15.0
2023-12-22 19:31:29,087 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.760e+01 2.999e+01 3.171e+01 3.296e+01 4.791e+01, threshold=6.342e+01, percent-clipped=0.0
2023-12-22 19:31:37,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=746386.6666666666, ans=0.1
2023-12-22 19:31:46,914 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.40 vs. limit=15.0
2023-12-22 19:31:48,238 INFO [train.py:886] (3/4) Epoch 24, batch 2350, loss[loss=0.01212, audio_tagging_loss=0.01212, over 25000.00 frames. ], tot_loss[loss=0.01327, audio_tagging_loss=0.01327, over 4945096.85 frames. ], batch size: 100, lr: 4.51e-03, grad_scale: 64.0
2023-12-22 19:32:29,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=746653.3333333334, ans=0.2
2023-12-22 19:32:30,887 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.051e-02
2023-12-22 19:32:41,818 INFO [train.py:886] (3/4) Epoch 24, batch 2400, loss[loss=0.01354, audio_tagging_loss=0.01354, over 25000.00 frames. ], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4946374.56 frames. ], batch size: 100, lr: 4.51e-03, grad_scale: 64.0
2023-12-22 19:32:42,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. limit=6.0
2023-12-22 19:32:55,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=746853.3333333334, ans=10.0
2023-12-22 19:33:03,037 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 19:33:04,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=746920.0, ans=0.2
2023-12-22 19:33:14,371 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.05 vs. limit=15.0
2023-12-22 19:33:14,824 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.682e+01 2.999e+01 3.142e+01 3.302e+01 4.831e+01, threshold=6.284e+01, percent-clipped=0.0
2023-12-22 19:33:15,465 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.15 vs. limit=22.5
2023-12-22 19:33:27,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=747053.3333333334, ans=0.2
2023-12-22 19:33:34,075 INFO [train.py:886] (3/4) Epoch 24, batch 2450, loss[loss=0.01197, audio_tagging_loss=0.01197, over 25000.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4955465.22 frames. ], batch size: 100, lr: 4.51e-03, grad_scale: 64.0
2023-12-22 19:33:37,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=747120.0, ans=0.125
2023-12-22 19:34:00,281 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.31 vs. limit=22.5
2023-12-22 19:34:00,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=747253.3333333334, ans=0.1
2023-12-22 19:34:19,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=747386.6666666666, ans=0.0
2023-12-22 19:34:25,707 INFO [train.py:886] (3/4) Epoch 24, batch 2500, loss[loss=0.01443, audio_tagging_loss=0.01443, over 24750.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4951015.15 frames. ], batch size: 99, lr: 4.51e-03, grad_scale: 64.0
2023-12-22 19:34:30,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747453.3333333334, ans=0.1
2023-12-22 19:34:58,726 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.722e+01 3.071e+01 3.224e+01 3.383e+01 4.585e+01, threshold=6.447e+01, percent-clipped=0.0
2023-12-22 19:35:04,295 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=22.5
2023-12-22 19:35:09,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=747720.0, ans=0.2
2023-12-22 19:35:17,089 INFO [train.py:886] (3/4) Epoch 24, batch 2550, loss[loss=0.01194, audio_tagging_loss=0.01194, over 24750.00 frames. ], tot_loss[loss=0.0134, audio_tagging_loss=0.0134, over 4948393.77 frames. ], batch size: 99, lr: 4.51e-03, grad_scale: 64.0
], batch size: 99, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:35:38,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=747920.0, ans=0.125 2023-12-22 19:35:44,228 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. limit=6.0 2023-12-22 19:35:51,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=747986.6666666666, ans=0.2 2023-12-22 19:35:55,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=747986.6666666666, ans=0.0 2023-12-22 19:36:09,581 INFO [train.py:886] (3/4) Epoch 24, batch 2600, loss[loss=0.01337, audio_tagging_loss=0.01337, over 24750.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4946082.04 frames. ], batch size: 99, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:36:18,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=748186.6666666666, ans=0.125 2023-12-22 19:36:20,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=748186.6666666666, ans=0.1 2023-12-22 19:36:31,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=748253.3333333334, ans=0.0 2023-12-22 19:36:42,263 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.710e+01 3.007e+01 3.151e+01 3.323e+01 3.960e+01, threshold=6.303e+01, percent-clipped=0.0 2023-12-22 19:37:00,800 INFO [train.py:886] (3/4) Epoch 24, batch 2650, loss[loss=0.009757, audio_tagging_loss=0.009757, over 24750.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4947426.36 frames. ], batch size: 99, lr: 4.51e-03, grad_scale: 64.0 2023-12-22 19:37:05,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=748453.3333333334, ans=0.04949747468305833 2023-12-22 19:37:07,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=748453.3333333334, ans=0.125 2023-12-22 19:37:11,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=748520.0, ans=0.1 2023-12-22 19:37:14,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=748520.0, ans=0.125 2023-12-22 19:37:27,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=748586.6666666666, ans=0.125 2023-12-22 19:37:32,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=748653.3333333334, ans=0.125 2023-12-22 19:37:39,960 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:37:52,401 INFO [train.py:886] (3/4) Epoch 24, batch 2700, loss[loss=0.01315, audio_tagging_loss=0.01315, over 24076.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4953416.77 frames. 
], batch size: 100, lr: 4.51e-03, grad_scale: 32.0 2023-12-22 19:38:01,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748853.3333333334, ans=0.1 2023-12-22 19:38:03,740 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:38:04,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=748853.3333333334, ans=0.0 2023-12-22 19:38:12,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=748920.0, ans=0.125 2023-12-22 19:38:22,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=748986.6666666666, ans=0.1 2023-12-22 19:38:26,495 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.684e+01 3.070e+01 3.195e+01 3.314e+01 3.793e+01, threshold=6.390e+01, percent-clipped=0.0 2023-12-22 19:38:28,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=748986.6666666666, ans=0.125 2023-12-22 19:38:44,134 INFO [train.py:886] (3/4) Epoch 24, batch 2750, loss[loss=0.01099, audio_tagging_loss=0.01099, over 25000.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4957687.53 frames. ], batch size: 100, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:38:47,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=749120.0, ans=0.0 2023-12-22 19:38:52,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=749120.0, ans=0.125 2023-12-22 19:39:02,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0 2023-12-22 19:39:22,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=749320.0, ans=0.0 2023-12-22 19:39:28,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=749386.6666666666, ans=0.0 2023-12-22 19:39:35,695 INFO [train.py:886] (3/4) Epoch 24, batch 2800, loss[loss=0.01301, audio_tagging_loss=0.01301, over 24750.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4952677.60 frames. ], batch size: 99, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:40:05,380 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.46 vs. limit=22.5 2023-12-22 19:40:09,218 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=12.0 2023-12-22 19:40:09,522 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.619e+01 3.066e+01 3.201e+01 3.379e+01 3.899e+01, threshold=6.402e+01, percent-clipped=0.0 2023-12-22 19:40:14,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=749653.3333333334, ans=0.2 2023-12-22 19:40:28,420 INFO [train.py:886] (3/4) Epoch 24, batch 2850, loss[loss=0.01293, audio_tagging_loss=0.01293, over 24750.00 frames. 
], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4943508.66 frames. ], batch size: 99, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:40:29,781 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.74 vs. limit=10.0 2023-12-22 19:40:48,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=749920.0, ans=0.125 2023-12-22 19:41:19,669 INFO [train.py:886] (3/4) Epoch 24, batch 2900, loss[loss=0.01281, audio_tagging_loss=0.01281, over 24750.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4941286.80 frames. ], batch size: 99, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:41:33,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=750186.6666666666, ans=0.0 2023-12-22 19:41:52,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=750320.0, ans=0.125 2023-12-22 19:41:53,958 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.695e+01 3.002e+01 3.182e+01 3.351e+01 5.287e+01, threshold=6.364e+01, percent-clipped=0.0 2023-12-22 19:41:54,564 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.47 vs. limit=10.0 2023-12-22 19:41:59,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=750320.0, ans=0.125 2023-12-22 19:42:12,210 INFO [train.py:886] (3/4) Epoch 24, batch 2950, loss[loss=0.01255, audio_tagging_loss=0.01255, over 23984.00 frames. ], tot_loss[loss=0.01314, audio_tagging_loss=0.01314, over 4947130.02 frames. ], batch size: 100, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:42:21,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=750520.0, ans=0.125 2023-12-22 19:42:28,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=750520.0, ans=0.0 2023-12-22 19:42:39,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=750586.6666666666, ans=0.0 2023-12-22 19:42:49,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=750653.3333333334, ans=0.09899494936611666 2023-12-22 19:42:56,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=750720.0, ans=0.125 2023-12-22 19:43:01,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=750720.0, ans=0.125 2023-12-22 19:43:03,924 INFO [train.py:886] (3/4) Epoch 24, batch 3000, loss[loss=0.01252, audio_tagging_loss=0.01252, over 24750.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4953313.84 frames. 
], batch size: 99, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:43:03,925 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 19:43:13,025 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.7414, 2.1492, 3.2458, 2.2570, 3.6763, 2.6256, 1.0324, 1.8785], device='cuda:3') 2023-12-22 19:43:24,948 INFO [train.py:917] (3/4) Epoch 24, validation: loss=0.03301, audio_tagging_loss=0.03301, over 3737520.00 frames. 2023-12-22 19:43:24,949 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 19:43:33,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=750786.6666666666, ans=0.125 2023-12-22 19:43:54,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=750920.0, ans=0.0 2023-12-22 19:43:59,580 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.765e+01 2.995e+01 3.133e+01 3.253e+01 3.784e+01, threshold=6.265e+01, percent-clipped=0.0 2023-12-22 19:44:00,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=750986.6666666666, ans=0.0 2023-12-22 19:44:02,092 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.46 vs. limit=22.5 2023-12-22 19:44:12,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=751053.3333333334, ans=0.0 2023-12-22 19:44:13,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=751053.3333333334, ans=0.125 2023-12-22 19:44:17,111 INFO [train.py:886] (3/4) Epoch 24, batch 3050, loss[loss=0.01391, audio_tagging_loss=0.01391, over 25000.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4957081.01 frames. ], batch size: 100, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:44:18,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=751120.0, ans=0.125 2023-12-22 19:44:22,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=751120.0, ans=0.125 2023-12-22 19:44:41,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=751253.3333333334, ans=0.0 2023-12-22 19:45:04,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=751386.6666666666, ans=0.0 2023-12-22 19:45:08,377 INFO [train.py:886] (3/4) Epoch 24, batch 3100, loss[loss=0.0114, audio_tagging_loss=0.0114, over 25000.00 frames. ], tot_loss[loss=0.01301, audio_tagging_loss=0.01301, over 4955492.81 frames. ], batch size: 100, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:45:16,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=751453.3333333334, ans=0.1 2023-12-22 19:45:22,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.29 vs. 
limit=15.0 2023-12-22 19:45:24,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=751520.0, ans=0.0 2023-12-22 19:45:36,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=751586.6666666666, ans=0.125 2023-12-22 19:45:41,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=751653.3333333334, ans=0.2 2023-12-22 19:45:43,074 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.717e+01 3.050e+01 3.194e+01 3.363e+01 4.178e+01, threshold=6.387e+01, percent-clipped=0.0 2023-12-22 19:45:44,607 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.06 vs. limit=15.0 2023-12-22 19:45:46,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=751653.3333333334, ans=0.1 2023-12-22 19:45:47,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=751653.3333333334, ans=0.0 2023-12-22 19:45:59,788 INFO [train.py:886] (3/4) Epoch 24, batch 3150, loss[loss=0.01562, audio_tagging_loss=0.01562, over 25000.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4947425.30 frames. ], batch size: 100, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:46:05,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=751786.6666666666, ans=0.0 2023-12-22 19:46:45,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=752053.3333333334, ans=0.1 2023-12-22 19:46:52,400 INFO [train.py:886] (3/4) Epoch 24, batch 3200, loss[loss=0.01026, audio_tagging_loss=0.01026, over 24750.00 frames. ], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4944742.88 frames. ], batch size: 99, lr: 4.50e-03, grad_scale: 32.0 2023-12-22 19:47:26,254 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.746e+01 3.034e+01 3.141e+01 3.276e+01 3.703e+01, threshold=6.281e+01, percent-clipped=0.0 2023-12-22 19:47:32,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=752386.6666666666, ans=0.1 2023-12-22 19:47:36,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=752386.6666666666, ans=0.0 2023-12-22 19:47:41,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=752386.6666666666, ans=0.0 2023-12-22 19:47:42,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=752453.3333333334, ans=0.125 2023-12-22 19:47:43,746 INFO [train.py:886] (3/4) Epoch 24, batch 3250, loss[loss=0.01426, audio_tagging_loss=0.01426, over 25000.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4947639.10 frames. 
], batch size: 100, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:48:33,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=752720.0, ans=0.2 2023-12-22 19:48:35,257 INFO [train.py:886] (3/4) Epoch 24, batch 3300, loss[loss=0.01375, audio_tagging_loss=0.01375, over 25000.00 frames. ], tot_loss[loss=0.01314, audio_tagging_loss=0.01314, over 4951003.88 frames. ], batch size: 100, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:48:42,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.61 vs. limit=15.0 2023-12-22 19:48:54,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=752853.3333333334, ans=0.0 2023-12-22 19:48:57,715 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.31 vs. limit=12.0 2023-12-22 19:49:07,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=752986.6666666666, ans=0.0 2023-12-22 19:49:09,400 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.696e+01 3.005e+01 3.126e+01 3.283e+01 3.763e+01, threshold=6.252e+01, percent-clipped=0.0 2023-12-22 19:49:26,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=753120.0, ans=0.125 2023-12-22 19:49:27,685 INFO [train.py:886] (3/4) Epoch 24, batch 3350, loss[loss=0.01309, audio_tagging_loss=0.01309, over 25000.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4953155.99 frames. ], batch size: 100, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:49:31,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=753120.0, ans=0.125 2023-12-22 19:49:33,817 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.00 vs. limit=6.0 2023-12-22 19:49:59,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=753320.0, ans=0.125 2023-12-22 19:50:03,411 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.55 vs. limit=15.0 2023-12-22 19:50:07,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=753320.0, ans=0.125 2023-12-22 19:50:17,779 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.49 vs. limit=15.0 2023-12-22 19:50:19,714 INFO [train.py:886] (3/4) Epoch 24, batch 3400, loss[loss=0.0118, audio_tagging_loss=0.0118, over 25000.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4961758.68 frames. 
], batch size: 100, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:50:41,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=753586.6666666666, ans=0.04949747468305833 2023-12-22 19:50:54,454 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.741e+01 3.015e+01 3.148e+01 3.318e+01 3.787e+01, threshold=6.295e+01, percent-clipped=0.0 2023-12-22 19:51:11,215 INFO [train.py:886] (3/4) Epoch 24, batch 3450, loss[loss=0.01376, audio_tagging_loss=0.01376, over 24750.00 frames. ], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 4957039.61 frames. ], batch size: 99, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:51:11,847 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.22 vs. limit=22.5 2023-12-22 19:51:15,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=753786.6666666666, ans=0.0 2023-12-22 19:51:17,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=753786.6666666666, ans=0.0 2023-12-22 19:51:26,626 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:51:32,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=753920.0, ans=15.0 2023-12-22 19:51:46,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=753986.6666666666, ans=0.125 2023-12-22 19:52:03,980 INFO [train.py:886] (3/4) Epoch 24, batch 3500, loss[loss=0.01169, audio_tagging_loss=0.01169, over 24750.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4945716.10 frames. 
], batch size: 99, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:52:06,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=754120.0, ans=0.125 2023-12-22 19:52:07,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=754120.0, ans=0.0 2023-12-22 19:52:08,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=754120.0, ans=0.125 2023-12-22 19:52:12,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=754186.6666666666, ans=0.125 2023-12-22 19:52:13,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=754186.6666666666, ans=0.125 2023-12-22 19:52:17,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=754186.6666666666, ans=0.125 2023-12-22 19:52:31,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=754253.3333333334, ans=0.2 2023-12-22 19:52:38,071 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.772e+01 3.028e+01 3.212e+01 3.392e+01 3.862e+01, threshold=6.425e+01, percent-clipped=0.0 2023-12-22 19:52:44,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=754386.6666666666, ans=0.07 2023-12-22 19:52:46,927 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.46 vs. limit=12.0 2023-12-22 19:52:54,938 INFO [train.py:886] (3/4) Epoch 24, batch 3550, loss[loss=0.01266, audio_tagging_loss=0.01266, over 24750.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4948099.47 frames. ], batch size: 99, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:53:19,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=754586.6666666666, ans=0.1 2023-12-22 19:53:32,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=754653.3333333334, ans=0.125 2023-12-22 19:53:38,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=754720.0, ans=0.09899494936611666 2023-12-22 19:53:47,384 INFO [train.py:886] (3/4) Epoch 24, batch 3600, loss[loss=0.01407, audio_tagging_loss=0.01407, over 25000.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4947269.49 frames. 
], batch size: 100, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:54:05,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=754853.3333333334, ans=0.125 2023-12-22 19:54:06,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=754920.0, ans=0.0 2023-12-22 19:54:07,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=754920.0, ans=0.125 2023-12-22 19:54:16,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=754920.0, ans=0.2 2023-12-22 19:54:20,696 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.714e+01 2.974e+01 3.085e+01 3.211e+01 3.722e+01, threshold=6.169e+01, percent-clipped=0.0 2023-12-22 19:54:25,063 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0 2023-12-22 19:54:38,347 INFO [train.py:886] (3/4) Epoch 24, batch 3650, loss[loss=0.01244, audio_tagging_loss=0.01244, over 24913.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4953182.58 frames. ], batch size: 100, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:54:48,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=755186.6666666666, ans=0.125 2023-12-22 19:54:51,067 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.27 vs. limit=15.0 2023-12-22 19:54:56,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=755186.6666666666, ans=0.0 2023-12-22 19:55:03,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=755253.3333333334, ans=0.2 2023-12-22 19:55:03,050 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 19:55:29,372 INFO [train.py:886] (3/4) Epoch 24, batch 3700, loss[loss=0.009355, audio_tagging_loss=0.009355, over 21848.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4952547.70 frames. ], batch size: 107, lr: 4.49e-03, grad_scale: 32.0 2023-12-22 19:55:52,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=755586.6666666666, ans=0.0 2023-12-22 19:56:01,049 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.46 vs. limit=15.0 2023-12-22 19:56:03,430 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.793e+01 3.064e+01 3.214e+01 3.340e+01 3.763e+01, threshold=6.428e+01, percent-clipped=0.0 2023-12-22 19:56:04,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=755653.3333333334, ans=0.125 2023-12-22 19:56:20,935 INFO [train.py:886] (3/4) Epoch 24, batch 3750, loss[loss=0.01312, audio_tagging_loss=0.01312, over 24750.00 frames. ], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 4952035.72 frames. 
], batch size: 99, lr: 4.48e-03, grad_scale: 32.0 2023-12-22 19:56:31,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=755853.3333333334, ans=0.0 2023-12-22 19:56:43,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=755920.0, ans=0.0 2023-12-22 19:57:02,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=756053.3333333334, ans=0.0 2023-12-22 19:57:02,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=756053.3333333334, ans=0.2 2023-12-22 19:57:03,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=756053.3333333334, ans=0.125 2023-12-22 19:57:12,849 INFO [train.py:886] (3/4) Epoch 24, batch 3800, loss[loss=0.01177, audio_tagging_loss=0.01177, over 24750.00 frames. ], tot_loss[loss=0.01313, audio_tagging_loss=0.01313, over 4945006.34 frames. ], batch size: 99, lr: 4.48e-03, grad_scale: 32.0 2023-12-22 19:57:42,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=756320.0, ans=0.0 2023-12-22 19:57:47,019 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.557e+01 3.075e+01 3.225e+01 3.353e+01 3.871e+01, threshold=6.449e+01, percent-clipped=0.0 2023-12-22 19:57:55,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=756386.6666666666, ans=0.2 2023-12-22 19:58:02,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=756386.6666666666, ans=0.125 2023-12-22 19:58:04,512 INFO [train.py:886] (3/4) Epoch 24, batch 3850, loss[loss=0.0123, audio_tagging_loss=0.0123, over 25000.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4944347.57 frames. ], batch size: 100, lr: 4.48e-03, grad_scale: 32.0 2023-12-22 19:58:12,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=756453.3333333334, ans=0.2 2023-12-22 19:58:26,002 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.58 vs. limit=10.0 2023-12-22 19:58:31,464 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.15 vs. limit=10.0 2023-12-22 19:58:35,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=756653.3333333334, ans=0.1 2023-12-22 19:58:42,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=756653.3333333334, ans=0.125 2023-12-22 19:58:45,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=756720.0, ans=0.125 2023-12-22 19:58:51,969 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.78 vs. limit=15.0 2023-12-22 19:58:56,721 INFO [train.py:886] (3/4) Epoch 24, batch 3900, loss[loss=0.0135, audio_tagging_loss=0.0135, over 25000.00 frames. 
], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4951509.84 frames. ], batch size: 100, lr: 4.48e-03, grad_scale: 32.0 2023-12-22 19:58:56,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=756786.6666666666, ans=0.0 2023-12-22 19:58:59,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=756786.6666666666, ans=0.0 2023-12-22 19:58:59,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=756786.6666666666, ans=0.125 2023-12-22 19:59:03,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=756786.6666666666, ans=0.125 2023-12-22 19:59:05,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.06 vs. limit=6.0 2023-12-22 19:59:11,034 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.81 vs. limit=15.0 2023-12-22 19:59:11,040 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.04 vs. limit=6.0 2023-12-22 19:59:13,084 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0 2023-12-22 19:59:16,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=756920.0, ans=0.2 2023-12-22 19:59:29,994 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.786e+01 3.006e+01 3.171e+01 3.357e+01 4.139e+01, threshold=6.342e+01, percent-clipped=0.0 2023-12-22 19:59:46,973 INFO [train.py:886] (3/4) Epoch 24, batch 3950, loss[loss=0.01142, audio_tagging_loss=0.01142, over 25000.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4949882.46 frames. ], batch size: 100, lr: 4.48e-03, grad_scale: 32.0 2023-12-22 19:59:48,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=757120.0, ans=0.0 2023-12-22 19:59:55,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=757120.0, ans=0.2 2023-12-22 20:00:29,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=757386.6666666666, ans=0.0 2023-12-22 20:00:39,000 INFO [train.py:886] (3/4) Epoch 24, batch 4000, loss[loss=0.01248, audio_tagging_loss=0.01248, over 25000.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4954594.75 frames. ], batch size: 100, lr: 4.48e-03, grad_scale: 32.0 2023-12-22 20:00:42,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=757453.3333333334, ans=0.125 2023-12-22 20:00:51,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=757520.0, ans=0.0 2023-12-22 20:00:53,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.27 vs. 
limit=12.0 2023-12-22 20:00:59,317 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2023-12-22 20:01:01,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=757586.6666666666, ans=0.0 2023-12-22 20:01:11,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=757653.3333333334, ans=0.125 2023-12-22 20:01:11,812 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.788e+01 3.071e+01 3.185e+01 3.335e+01 3.976e+01, threshold=6.370e+01, percent-clipped=0.0 2023-12-22 20:01:22,045 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=15.0 2023-12-22 20:01:29,357 INFO [train.py:886] (3/4) Epoch 24, batch 4050, loss[loss=0.01446, audio_tagging_loss=0.01446, over 25000.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4959129.97 frames. ], batch size: 100, lr: 4.48e-03, grad_scale: 32.0 2023-12-22 20:01:31,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=757786.6666666666, ans=0.125 2023-12-22 20:01:32,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=757786.6666666666, ans=0.0 2023-12-22 20:01:34,729 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.28 vs. limit=15.0 2023-12-22 20:01:42,752 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2023-12-22 20:01:47,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=757853.3333333334, ans=0.125 2023-12-22 20:02:20,026 INFO [train.py:886] (3/4) Epoch 24, batch 4100, loss[loss=0.01615, audio_tagging_loss=0.01615, over 24750.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4956980.80 frames. ], batch size: 99, lr: 4.48e-03, grad_scale: 32.0 2023-12-22 20:02:24,343 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.31 vs. limit=22.5 2023-12-22 20:02:25,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=758120.0, ans=0.015 2023-12-22 20:02:37,139 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0 2023-12-22 20:02:48,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=758253.3333333334, ans=0.0 2023-12-22 20:02:54,133 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.829e+01 3.126e+01 3.267e+01 3.430e+01 4.193e+01, threshold=6.535e+01, percent-clipped=0.0 2023-12-22 20:03:11,635 INFO [train.py:886] (3/4) Epoch 24, batch 4150, loss[loss=0.01171, audio_tagging_loss=0.01171, over 25000.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4953932.58 frames. 
], batch size: 100, lr: 4.48e-03, grad_scale: 32.0 2023-12-22 20:03:12,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=758453.3333333334, ans=0.125 2023-12-22 20:03:18,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=758453.3333333334, ans=0.2 2023-12-22 20:03:21,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=758520.0, ans=0.1 2023-12-22 20:03:22,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=758520.0, ans=0.125 2023-12-22 20:03:24,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=758520.0, ans=0.1 2023-12-22 20:03:25,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=758520.0, ans=0.0 2023-12-22 20:03:31,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.86 vs. limit=15.0 2023-12-22 20:03:35,569 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0 2023-12-22 20:03:35,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=758586.6666666666, ans=22.5 2023-12-22 20:03:39,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=758586.6666666666, ans=0.125 2023-12-22 20:03:42,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=758653.3333333334, ans=0.125 2023-12-22 20:04:03,476 INFO [train.py:886] (3/4) Epoch 24, batch 4200, loss[loss=0.01639, audio_tagging_loss=0.01639, over 25000.00 frames. ], tot_loss[loss=0.01314, audio_tagging_loss=0.01314, over 4945879.15 frames. ], batch size: 100, lr: 4.48e-03, grad_scale: 32.0 2023-12-22 20:04:15,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=758853.3333333334, ans=22.5 2023-12-22 20:04:38,266 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.640e+01 3.033e+01 3.184e+01 3.387e+01 4.147e+01, threshold=6.368e+01, percent-clipped=0.0 2023-12-22 20:04:45,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=759053.3333333334, ans=0.035 2023-12-22 20:04:50,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=759053.3333333334, ans=0.1 2023-12-22 20:04:55,905 INFO [train.py:886] (3/4) Epoch 24, batch 4250, loss[loss=0.01716, audio_tagging_loss=0.01716, over 25000.00 frames. ], tot_loss[loss=0.01319, audio_tagging_loss=0.01319, over 4947092.22 frames. 
], batch size: 100, lr: 4.47e-03, grad_scale: 32.0 2023-12-22 20:05:02,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=759120.0, ans=0.1 2023-12-22 20:05:09,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=759186.6666666666, ans=0.0 2023-12-22 20:05:09,612 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.68 vs. limit=15.0 2023-12-22 20:05:15,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=759186.6666666666, ans=0.0 2023-12-22 20:05:20,326 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=15.0 2023-12-22 20:05:47,499 INFO [train.py:886] (3/4) Epoch 24, batch 4300, loss[loss=0.01613, audio_tagging_loss=0.01613, over 25000.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4954489.16 frames. ], batch size: 100, lr: 4.47e-03, grad_scale: 32.0 2023-12-22 20:05:52,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.74 vs. limit=15.0 2023-12-22 20:05:59,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=759520.0, ans=0.0 2023-12-22 20:06:02,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=759520.0, ans=0.125 2023-12-22 20:06:06,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=759520.0, ans=0.125 2023-12-22 20:06:09,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=759586.6666666666, ans=0.1 2023-12-22 20:06:21,638 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.739e+01 3.083e+01 3.257e+01 3.408e+01 3.899e+01, threshold=6.514e+01, percent-clipped=0.0 2023-12-22 20:06:39,223 INFO [train.py:886] (3/4) Epoch 24, batch 4350, loss[loss=0.01252, audio_tagging_loss=0.01252, over 25000.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4958629.85 frames. ], batch size: 100, lr: 4.47e-03, grad_scale: 32.0 2023-12-22 20:06:39,692 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0 2023-12-22 20:06:48,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=759853.3333333334, ans=0.2 2023-12-22 20:06:59,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=759920.0, ans=0.1 2023-12-22 20:07:07,399 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.66 vs. limit=22.5 2023-12-22 20:07:07,524 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.03 vs. 
limit=15.0 2023-12-22 20:07:13,234 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.32 vs. limit=15.0 2023-12-22 20:07:15,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=759986.6666666666, ans=0.125 2023-12-22 20:07:29,310 INFO [train.py:886] (3/4) Epoch 24, batch 4400, loss[loss=0.01478, audio_tagging_loss=0.01478, over 24750.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4953927.46 frames. ], batch size: 99, lr: 4.47e-03, grad_scale: 32.0 2023-12-22 20:07:30,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=760120.0, ans=0.1 2023-12-22 20:07:43,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=760186.6666666666, ans=0.125 2023-12-22 20:07:58,272 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.88 vs. limit=6.0 2023-12-22 20:08:03,220 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.725e+01 3.094e+01 3.285e+01 3.414e+01 4.025e+01, threshold=6.569e+01, percent-clipped=0.0 2023-12-22 20:08:06,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=760320.0, ans=0.0 2023-12-22 20:08:14,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=760386.6666666666, ans=0.2 2023-12-22 20:08:20,515 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.50 vs. limit=15.0 2023-12-22 20:08:20,888 INFO [train.py:886] (3/4) Epoch 24, batch 4450, loss[loss=0.01523, audio_tagging_loss=0.01523, over 24750.00 frames. ], tot_loss[loss=0.01337, audio_tagging_loss=0.01337, over 4952872.99 frames. ], batch size: 99, lr: 4.47e-03, grad_scale: 32.0 2023-12-22 20:08:58,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=760653.3333333334, ans=0.07 2023-12-22 20:09:01,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=760720.0, ans=0.125 2023-12-22 20:09:11,063 INFO [train.py:886] (3/4) Epoch 24, batch 4500, loss[loss=0.01152, audio_tagging_loss=0.01152, over 21779.00 frames. ], tot_loss[loss=0.01331, audio_tagging_loss=0.01331, over 4954364.49 frames. ], batch size: 107, lr: 4.47e-03, grad_scale: 32.0 2023-12-22 20:09:12,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=760786.6666666666, ans=0.1 2023-12-22 20:09:18,808 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2023-12-22 20:09:31,240 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.28 vs. 
limit=10.0 2023-12-22 20:09:33,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=760920.0, ans=0.125 2023-12-22 20:09:43,685 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 20:09:44,442 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.800e+01 3.026e+01 3.195e+01 3.356e+01 3.785e+01, threshold=6.390e+01, percent-clipped=0.0 2023-12-22 20:09:55,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=761053.3333333334, ans=0.0 2023-12-22 20:10:02,165 INFO [train.py:886] (3/4) Epoch 24, batch 4550, loss[loss=0.013, audio_tagging_loss=0.013, over 25000.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 4952254.35 frames. ], batch size: 100, lr: 4.47e-03, grad_scale: 32.0 2023-12-22 20:10:14,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=761186.6666666666, ans=0.07 2023-12-22 20:10:18,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=761186.6666666666, ans=0.2 2023-12-22 20:10:38,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=761320.0, ans=0.09899494936611666 2023-12-22 20:10:50,253 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.546e-03 2023-12-22 20:10:52,335 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.90 vs. limit=6.0 2023-12-22 20:10:52,869 INFO [train.py:886] (3/4) Epoch 24, batch 4600, loss[loss=0.01147, audio_tagging_loss=0.01147, over 25000.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4948744.00 frames. ], batch size: 100, lr: 4.47e-03, grad_scale: 32.0 2023-12-22 20:11:02,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.09 vs. limit=6.0 2023-12-22 20:11:02,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=761520.0, ans=0.125 2023-12-22 20:11:22,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=761653.3333333334, ans=0.2 2023-12-22 20:11:24,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=761653.3333333334, ans=0.1 2023-12-22 20:11:26,352 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.788e+01 3.049e+01 3.188e+01 3.293e+01 3.934e+01, threshold=6.377e+01, percent-clipped=0.0 2023-12-22 20:11:34,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=761720.0, ans=0.1 2023-12-22 20:11:43,932 INFO [train.py:886] (3/4) Epoch 24, batch 4650, loss[loss=0.01364, audio_tagging_loss=0.01364, over 25000.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4950178.52 frames. ], batch size: 100, lr: 4.47e-03, grad_scale: 32.0 2023-12-22 20:11:46,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.41 vs. 
limit=15.0 2023-12-22 20:12:00,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=761853.3333333334, ans=0.2 2023-12-22 20:12:14,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=761986.6666666666, ans=0.0 2023-12-22 20:12:26,203 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.03 vs. limit=22.5 2023-12-22 20:12:29,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=762053.3333333334, ans=0.0 2023-12-22 20:12:34,690 INFO [train.py:886] (3/4) Epoch 24, batch 4700, loss[loss=0.01207, audio_tagging_loss=0.01207, over 25000.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4950906.75 frames. ], batch size: 100, lr: 4.47e-03, grad_scale: 64.0 2023-12-22 20:12:37,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0 2023-12-22 20:12:39,361 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 20:12:47,907 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.04 vs. limit=6.0 2023-12-22 20:13:06,270 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.692e+01 3.057e+01 3.259e+01 3.422e+01 3.879e+01, threshold=6.518e+01, percent-clipped=0.0 2023-12-22 20:13:11,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=762320.0, ans=0.125 2023-12-22 20:13:11,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=762320.0, ans=0.0 2023-12-22 20:13:22,087 INFO [train.py:886] (3/4) Epoch 24, batch 4750, loss[loss=0.0133, audio_tagging_loss=0.0133, over 24750.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4948025.20 frames. ], batch size: 99, lr: 4.46e-03, grad_scale: 64.0 2023-12-22 20:13:22,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=762453.3333333334, ans=0.1 2023-12-22 20:13:24,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=762453.3333333334, ans=0.0 2023-12-22 20:13:57,231 INFO [train.py:886] (3/4) Epoch 25, batch 0, loss[loss=0.03216, audio_tagging_loss=0.03216, over 20788.00 frames. ], tot_loss[loss=0.03216, audio_tagging_loss=0.03216, over 20788.00 frames. ], batch size: 107, lr: 4.37e-03, grad_scale: 32.0 2023-12-22 20:13:57,231 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 20:14:18,052 INFO [train.py:917] (3/4) Epoch 25, validation: loss=0.03205, audio_tagging_loss=0.03205, over 3737520.00 frames. 2023-12-22 20:14:18,052 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 20:14:18,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.32 vs. 
limit=15.0 2023-12-22 20:14:20,402 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.06 vs. limit=15.0 2023-12-22 20:14:23,187 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.11 vs. limit=22.5 2023-12-22 20:14:28,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.29 vs. limit=12.0 2023-12-22 20:15:01,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=762826.6666666666, ans=0.0 2023-12-22 20:15:09,647 INFO [train.py:886] (3/4) Epoch 25, batch 50, loss[loss=0.01845, audio_tagging_loss=0.01845, over 25000.00 frames. ], tot_loss[loss=0.02084, audio_tagging_loss=0.02084, over 1110445.50 frames. ], batch size: 100, lr: 4.37e-03, grad_scale: 32.0 2023-12-22 20:15:21,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=762960.0, ans=0.2 2023-12-22 20:15:27,233 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.949e+01 3.385e+01 3.837e+01 4.351e+01 9.829e+01, threshold=7.674e+01, percent-clipped=6.0 2023-12-22 20:16:00,597 INFO [train.py:886] (3/4) Epoch 25, batch 100, loss[loss=0.01465, audio_tagging_loss=0.01465, over 25000.00 frames. ], tot_loss[loss=0.01807, audio_tagging_loss=0.01807, over 1963034.59 frames. ], batch size: 100, lr: 4.37e-03, grad_scale: 32.0 2023-12-22 20:16:35,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=763426.6666666666, ans=0.1 2023-12-22 20:16:52,786 INFO [train.py:886] (3/4) Epoch 25, batch 150, loss[loss=0.01466, audio_tagging_loss=0.01466, over 24750.00 frames. ], tot_loss[loss=0.01658, audio_tagging_loss=0.01658, over 2626471.89 frames. ], batch size: 99, lr: 4.37e-03, grad_scale: 32.0 2023-12-22 20:17:10,526 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.911e+01 3.174e+01 3.367e+01 3.565e+01 4.203e+01, threshold=6.734e+01, percent-clipped=0.0 2023-12-22 20:17:17,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=763693.3333333334, ans=0.125 2023-12-22 20:17:23,636 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.36 vs. limit=10.0 2023-12-22 20:17:25,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=763760.0, ans=0.1 2023-12-22 20:17:42,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=763893.3333333334, ans=0.0 2023-12-22 20:17:44,277 INFO [train.py:886] (3/4) Epoch 25, batch 200, loss[loss=0.0111, audio_tagging_loss=0.0111, over 25000.00 frames. ], tot_loss[loss=0.0155, audio_tagging_loss=0.0155, over 3144165.88 frames. 
2023-12-22 20:17:47,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=763893.3333333334, ans=0.125
2023-12-22 20:18:02,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=763960.0, ans=10.0
2023-12-22 20:18:13,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=764026.6666666666, ans=0.125
2023-12-22 20:18:24,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=764093.3333333334, ans=15.0
2023-12-22 20:18:30,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=764160.0, ans=0.125
2023-12-22 20:18:34,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=764160.0, ans=0.125
2023-12-22 20:18:37,142 INFO [train.py:886] (3/4) Epoch 25, batch 250, loss[loss=0.01502, audio_tagging_loss=0.01502, over 25000.00 frames. ], tot_loss[loss=0.0149, audio_tagging_loss=0.0149, over 3547111.09 frames. ], batch size: 100, lr: 4.37e-03, grad_scale: 32.0
2023-12-22 20:18:43,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=764226.6666666666, ans=0.125
2023-12-22 20:18:54,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=764293.3333333334, ans=0.125
2023-12-22 20:18:55,656 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.850e+01 3.053e+01 3.207e+01 3.358e+01 3.968e+01, threshold=6.413e+01, percent-clipped=0.0
2023-12-22 20:19:03,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=764360.0, ans=0.125
2023-12-22 20:19:28,800 INFO [train.py:886] (3/4) Epoch 25, batch 300, loss[loss=0.01273, audio_tagging_loss=0.01273, over 24750.00 frames. ], tot_loss[loss=0.01449, audio_tagging_loss=0.01449, over 3855837.39 frames. ], batch size: 99, lr: 4.37e-03, grad_scale: 32.0
2023-12-22 20:19:28,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=764560.0, ans=0.125
2023-12-22 20:19:31,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. limit=6.0
2023-12-22 20:19:33,114 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.23 vs. limit=15.0
2023-12-22 20:19:44,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.70 vs. limit=15.0
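The Whitening entries compare a per-module statistic against a limit: as long as the metric stays below the limit nothing happens, and when it exceeds the limit the module nudges the activation covariance back toward isotropy. The exact metric is defined in scaling.py's Whiten module; the function below is only a plausible toy version (eigenvalue spread of the feature covariance) and should be read as an assumption, not the real formula.

```python
# Toy "whitening" statistic: how far the feature covariance is from
# isotropic. An illustration of the idea, NOT the exact scaling.py metric.
import torch

def toy_whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one module
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)   # ascending eigenvalues
    return (eigs.max() / eigs.mean().clamp(min=1e-20)).item()

x = torch.randn(1000, 384)              # well-conditioned input -> small metric
print(f"metric={toy_whitening_metric(x):.2f} vs. limit=15.0")
```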
2023-12-22 20:20:11,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=764826.6666666666, ans=0.0
2023-12-22 20:20:15,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=764826.6666666666, ans=0.05
2023-12-22 20:20:19,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=764893.3333333334, ans=0.1
2023-12-22 20:20:20,612 INFO [train.py:886] (3/4) Epoch 25, batch 350, loss[loss=0.01447, audio_tagging_loss=0.01447, over 24750.00 frames. ], tot_loss[loss=0.01417, audio_tagging_loss=0.01417, over 4097314.72 frames. ], batch size: 99, lr: 4.37e-03, grad_scale: 32.0
2023-12-22 20:20:21,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=764893.3333333334, ans=0.2
2023-12-22 20:20:27,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=764893.3333333334, ans=0.1
2023-12-22 20:20:28,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=764893.3333333334, ans=0.125
2023-12-22 20:20:33,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=764960.0, ans=0.125
2023-12-22 20:20:40,133 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.757e+01 3.026e+01 3.207e+01 3.330e+01 3.805e+01, threshold=6.415e+01, percent-clipped=0.0
2023-12-22 20:20:48,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=765026.6666666666, ans=0.125
2023-12-22 20:21:10,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=765160.0, ans=0.2
2023-12-22 20:21:10,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=765160.0, ans=0.125
2023-12-22 20:21:13,463 INFO [train.py:886] (3/4) Epoch 25, batch 400, loss[loss=0.01113, audio_tagging_loss=0.01113, over 25000.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 4289526.93 frames. ], batch size: 100, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:21:28,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=765293.3333333334, ans=0.125
2023-12-22 20:21:28,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=765293.3333333334, ans=0.1
2023-12-22 20:21:37,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.15 vs. limit=10.0
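The optim.py:484 WARNING lines summarize recent gradient norms as quartiles (min, 25%, median, 75%, max), and in every entry in this section the logged threshold equals Clipping_scale times the median (for example 2.0 x 3.207e+01 = 6.415e+01 just above). The sketch below reproduces that bookkeeping; how ScaledAdam collects the norms internally is not shown, and the helper name is an assumption.

```python
# Reproduce the quartile/threshold arithmetic visible in the WARNING lines:
# threshold == Clipping_scale * median(grad-norm).
import torch

def summarize_grad_norms(recent_norms, clipping_scale=2.0):
    norms = torch.tensor(recent_norms)
    q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()      # 2.0 * median, as logged
    clipped = (norms > threshold).float().mean().item() * 100.0
    print("grad-norm quartiles", " ".join(f"{v:.3e}" for v in q.tolist()),
          f"threshold={threshold:.3e}, percent-clipped={clipped:.1f}")

# the five quartile values from the WARNING entry above
summarize_grad_norms([27.57, 30.26, 32.07, 33.30, 38.05])
```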
2023-12-22 20:21:48,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=765426.6666666666, ans=0.125
2023-12-22 20:21:49,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=765426.6666666666, ans=0.2
2023-12-22 20:21:51,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=765426.6666666666, ans=0.0
2023-12-22 20:21:51,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=765426.6666666666, ans=0.0
2023-12-22 20:21:54,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=765493.3333333334, ans=0.125
2023-12-22 20:21:59,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=765493.3333333334, ans=0.1
2023-12-22 20:22:00,264 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=15.0
2023-12-22 20:22:02,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=765493.3333333334, ans=0.125
2023-12-22 20:22:03,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=765560.0, ans=0.125
2023-12-22 20:22:04,091 INFO [train.py:886] (3/4) Epoch 25, batch 450, loss[loss=0.01497, audio_tagging_loss=0.01497, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 4438677.19 frames. ], batch size: 100, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:22:17,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=765626.6666666666, ans=0.125
2023-12-22 20:22:17,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=765626.6666666666, ans=0.09899494936611666
2023-12-22 20:22:18,195 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.97 vs. limit=15.0
2023-12-22 20:22:23,126 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.697e+01 3.006e+01 3.168e+01 3.345e+01 4.036e+01, threshold=6.336e+01, percent-clipped=0.0
2023-12-22 20:22:52,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=765826.6666666666, ans=0.0
2023-12-22 20:22:56,176 INFO [train.py:886] (3/4) Epoch 25, batch 500, loss[loss=0.01354, audio_tagging_loss=0.01354, over 25000.00 frames. ], tot_loss[loss=0.01345, audio_tagging_loss=0.01345, over 4554103.61 frames. ], batch size: 100, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:22:56,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=765893.3333333334, ans=0.125
2023-12-22 20:23:06,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=765960.0, ans=0.0
2023-12-22 20:23:09,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=765960.0, ans=0.04949747468305833
2023-12-22 20:23:26,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.65 vs. limit=22.5
2023-12-22 20:23:39,966 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0
2023-12-22 20:23:47,577 INFO [train.py:886] (3/4) Epoch 25, batch 550, loss[loss=0.01426, audio_tagging_loss=0.01426, over 24750.00 frames. ], tot_loss[loss=0.01339, audio_tagging_loss=0.01339, over 4650308.95 frames. ], batch size: 99, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:24:05,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=766293.3333333334, ans=0.1
2023-12-22 20:24:05,479 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.48 vs. limit=22.5
2023-12-22 20:24:05,993 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.753e+01 3.053e+01 3.176e+01 3.328e+01 4.174e+01, threshold=6.352e+01, percent-clipped=0.0
2023-12-22 20:24:16,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=766360.0, ans=0.0
2023-12-22 20:24:23,282 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.30 vs. limit=10.0
2023-12-22 20:24:37,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=766493.3333333334, ans=0.125
2023-12-22 20:24:39,254 INFO [train.py:886] (3/4) Epoch 25, batch 600, loss[loss=0.0134, audio_tagging_loss=0.0134, over 24750.00 frames. ], tot_loss[loss=0.01349, audio_tagging_loss=0.01349, over 4719042.41 frames. ], batch size: 99, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:24:39,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=766560.0, ans=0.0
2023-12-22 20:24:48,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=766560.0, ans=0.125
2023-12-22 20:24:49,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=766626.6666666666, ans=10.0
2023-12-22 20:25:19,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=766760.0, ans=0.125
2023-12-22 20:25:21,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=766826.6666666666, ans=0.125
2023-12-22 20:25:31,418 INFO [train.py:886] (3/4) Epoch 25, batch 650, loss[loss=0.0141, audio_tagging_loss=0.0141, over 25000.00 frames. ], tot_loss[loss=0.01355, audio_tagging_loss=0.01355, over 4767923.85 frames. ], batch size: 100, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:25:38,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=766893.3333333334, ans=0.2
2023-12-22 20:25:49,888 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.858e+01 3.083e+01 3.219e+01 3.359e+01 3.843e+01, threshold=6.437e+01, percent-clipped=0.0
2023-12-22 20:25:58,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=767026.6666666666, ans=0.125
2023-12-22 20:26:03,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=767093.3333333334, ans=0.0
2023-12-22 20:26:09,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=767093.3333333334, ans=0.0
2023-12-22 20:26:11,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=767093.3333333334, ans=0.125
2023-12-22 20:26:15,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=767160.0, ans=0.2
2023-12-22 20:26:19,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=767160.0, ans=0.125
2023-12-22 20:26:22,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=767226.6666666666, ans=0.2
2023-12-22 20:26:22,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=767226.6666666666, ans=0.1
2023-12-22 20:26:23,594 INFO [train.py:886] (3/4) Epoch 25, batch 700, loss[loss=0.01475, audio_tagging_loss=0.01475, over 25000.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 4806933.36 frames. ], batch size: 100, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:26:43,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=767360.0, ans=0.0
2023-12-22 20:26:59,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=767426.6666666666, ans=0.125
2023-12-22 20:27:11,963 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 20:27:15,233 INFO [train.py:886] (3/4) Epoch 25, batch 750, loss[loss=0.01325, audio_tagging_loss=0.01325, over 24091.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4837375.95 frames. ], batch size: 100, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:27:22,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=767560.0, ans=0.125
2023-12-22 20:27:32,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=767626.6666666666, ans=0.125
2023-12-22 20:27:33,652 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.775e+01 3.045e+01 3.178e+01 3.302e+01 3.824e+01, threshold=6.355e+01, percent-clipped=0.0
2023-12-22 20:27:39,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=767693.3333333334, ans=0.125
2023-12-22 20:27:44,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=767693.3333333334, ans=0.125
2023-12-22 20:27:49,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=767760.0, ans=0.0
2023-12-22 20:27:53,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=767760.0, ans=0.1
2023-12-22 20:28:06,854 INFO [train.py:886] (3/4) Epoch 25, batch 800, loss[loss=0.01329, audio_tagging_loss=0.01329, over 25000.00 frames. ], tot_loss[loss=0.01333, audio_tagging_loss=0.01333, over 4869625.97 frames. ], batch size: 100, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:28:20,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=767960.0, ans=0.0
2023-12-22 20:28:50,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=768160.0, ans=0.125
2023-12-22 20:28:50,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=768160.0, ans=0.125
2023-12-22 20:28:57,916 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.49 vs. limit=15.0
2023-12-22 20:28:58,446 INFO [train.py:886] (3/4) Epoch 25, batch 850, loss[loss=0.01359, audio_tagging_loss=0.01359, over 25000.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 4896213.58 frames. ], batch size: 100, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:29:17,721 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.700e+01 3.008e+01 3.165e+01 3.343e+01 3.656e+01, threshold=6.329e+01, percent-clipped=0.0
2023-12-22 20:29:22,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=768360.0, ans=0.1
2023-12-22 20:29:30,231 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.90 vs. limit=6.0
2023-12-22 20:29:38,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=768426.6666666666, ans=0.125
2023-12-22 20:29:39,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=768426.6666666666, ans=15.0
2023-12-22 20:29:49,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=768493.3333333334, ans=0.2
2023-12-22 20:29:50,954 INFO [train.py:886] (3/4) Epoch 25, batch 900, loss[loss=0.01494, audio_tagging_loss=0.01494, over 25000.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4914645.31 frames. ], batch size: 100, lr: 4.36e-03, grad_scale: 32.0
2023-12-22 20:29:51,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=768560.0, ans=0.1
2023-12-22 20:29:51,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=768560.0, ans=0.1
2023-12-22 20:29:52,068 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-22 20:29:54,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=768560.0, ans=0.0
2023-12-22 20:29:55,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.35 vs. limit=15.0
2023-12-22 20:30:12,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=768693.3333333334, ans=0.125
2023-12-22 20:30:13,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=768693.3333333334, ans=0.04949747468305833
2023-12-22 20:30:15,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=768693.3333333334, ans=0.2
2023-12-22 20:30:34,900 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.62 vs. limit=12.0
2023-12-22 20:30:43,138 INFO [train.py:886] (3/4) Epoch 25, batch 950, loss[loss=0.01314, audio_tagging_loss=0.01314, over 24750.00 frames. ], tot_loss[loss=0.01335, audio_tagging_loss=0.01335, over 4917380.55 frames. ], batch size: 99, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:30:45,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=768893.3333333334, ans=0.1
2023-12-22 20:30:52,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=768960.0, ans=0.0
2023-12-22 20:31:00,898 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.816e+01 3.092e+01 3.233e+01 3.411e+01 4.030e+01, threshold=6.467e+01, percent-clipped=0.0
2023-12-22 20:31:04,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=769026.6666666666, ans=0.0
2023-12-22 20:31:04,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=769026.6666666666, ans=0.125
2023-12-22 20:31:06,645 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 20:31:06,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=769026.6666666666, ans=15.0
2023-12-22 20:31:17,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.69 vs. limit=8.0
2023-12-22 20:31:21,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=769093.3333333334, ans=0.125
2023-12-22 20:31:31,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=769160.0, ans=0.125
2023-12-22 20:31:34,089 INFO [train.py:886] (3/4) Epoch 25, batch 1000, loss[loss=0.01492, audio_tagging_loss=0.01492, over 25000.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 4922948.10 frames. ], batch size: 100, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:31:41,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.30 vs. limit=10.0
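Throughout the log, loss and audio_tagging_loss are identical because audio tagging is the only training objective in this run: multi-label classification over the 527 AudioSet event classes, conventionally trained with binary cross-entropy against multi-hot targets. A minimal sketch follows; the tensor shapes and the assumption of one pooled logit vector per clip are illustrative, not the recipe's exact model code.

```python
# Minimal multi-label audio-tagging loss over 527 AudioSet classes.
import torch
import torch.nn.functional as F

batch, num_events = 16, 527                 # 527 AudioSet event classes
logits = torch.randn(batch, num_events)     # assumed clip-level model output
targets = torch.randint(0, 2, (batch, num_events)).float()  # multi-hot labels

audio_tagging_loss = F.binary_cross_entropy_with_logits(logits, targets)
print(f"audio_tagging_loss={audio_tagging_loss.item():.5f}")
```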
2023-12-22 20:31:43,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=769226.6666666666, ans=0.05
2023-12-22 20:31:52,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=769293.3333333334, ans=0.0
2023-12-22 20:31:53,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=769293.3333333334, ans=0.2
2023-12-22 20:31:58,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=769360.0, ans=0.0
2023-12-22 20:32:05,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=769426.6666666666, ans=0.05
2023-12-22 20:32:10,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=769426.6666666666, ans=0.125
2023-12-22 20:32:17,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=769493.3333333334, ans=0.125
2023-12-22 20:32:20,249 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.59 vs. limit=10.0
2023-12-22 20:32:21,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=769493.3333333334, ans=0.125
2023-12-22 20:32:26,425 INFO [train.py:886] (3/4) Epoch 25, batch 1050, loss[loss=0.01102, audio_tagging_loss=0.01102, over 24750.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4930809.37 frames. ], batch size: 99, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:32:28,657 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.90 vs. limit=15.0
2023-12-22 20:32:30,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=769560.0, ans=0.2
2023-12-22 20:32:44,770 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.666e+01 3.041e+01 3.195e+01 3.319e+01 3.773e+01, threshold=6.390e+01, percent-clipped=0.0
2023-12-22 20:32:49,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=769693.3333333334, ans=0.1
2023-12-22 20:32:55,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=769693.3333333334, ans=0.1
2023-12-22 20:33:11,113 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0
2023-12-22 20:33:16,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=769826.6666666666, ans=0.2
2023-12-22 20:33:16,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=769826.6666666666, ans=0.2
2023-12-22 20:33:18,078 INFO [train.py:886] (3/4) Epoch 25, batch 1100, loss[loss=0.01275, audio_tagging_loss=0.01275, over 22371.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4933706.80 frames. ], batch size: 107, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:33:43,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=770026.6666666666, ans=0.0
2023-12-22 20:33:46,094 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0
2023-12-22 20:33:46,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=770026.6666666666, ans=0.125
2023-12-22 20:34:09,552 INFO [train.py:886] (3/4) Epoch 25, batch 1150, loss[loss=0.01658, audio_tagging_loss=0.01658, over 25000.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4945093.99 frames. ], batch size: 100, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:34:10,220 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.18 vs. limit=22.5
2023-12-22 20:34:20,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=770293.3333333334, ans=0.1
2023-12-22 20:34:28,872 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.746e+01 3.047e+01 3.195e+01 3.340e+01 6.361e+01, threshold=6.391e+01, percent-clipped=0.0
2023-12-22 20:34:31,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=770360.0, ans=0.2
2023-12-22 20:34:31,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=770360.0, ans=0.125
2023-12-22 20:35:02,099 INFO [train.py:886] (3/4) Epoch 25, batch 1200, loss[loss=0.01157, audio_tagging_loss=0.01157, over 25000.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4950396.84 frames. ], batch size: 100, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:35:02,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.19 vs. limit=10.0
2023-12-22 20:35:21,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=770693.3333333334, ans=0.125
2023-12-22 20:35:28,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=770693.3333333334, ans=0.2
2023-12-22 20:35:42,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.99 vs. limit=10.0
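Batch sizes in these entries hover around 99 to 107 cuts rather than being fixed, because lhotse samplers draw batches by total audio duration and bucket clips of similar length together; a batch of mostly ten-second AudioSet clips under a 1000-second duration cap lands near 100 cuts. The snippet below shows schematic lhotse usage; the cuts path is hypothetical and this is not the recipe's exact data-module code.

```python
# Schematic duration-based bucketing with lhotse (not the recipe's exact code).
from lhotse import CutSet
from lhotse.dataset import DynamicBucketingSampler

cuts = CutSet.from_file("data/fbank/cuts_train.jsonl.gz")  # hypothetical path
sampler = DynamicBucketingSampler(
    cuts,
    max_duration=1000.0,  # total seconds of audio per batch
    num_buckets=30,
    shuffle=True,
    drop_last=True,
)
for batch_cuts in sampler:
    pass  # ~100 ten-second clips per batch -> "batch size: 100" in the log
```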
2023-12-22 20:35:43,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=770826.6666666666, ans=15.0
2023-12-22 20:35:46,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=770826.6666666666, ans=0.125
2023-12-22 20:35:48,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=770826.6666666666, ans=0.2
2023-12-22 20:35:49,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=770826.6666666666, ans=0.125
2023-12-22 20:35:53,921 INFO [train.py:886] (3/4) Epoch 25, batch 1250, loss[loss=0.01587, audio_tagging_loss=0.01587, over 24750.00 frames. ], tot_loss[loss=0.01331, audio_tagging_loss=0.01331, over 4947184.71 frames. ], batch size: 99, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:36:03,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=770893.3333333334, ans=0.0
2023-12-22 20:36:12,984 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.822e+01 3.141e+01 3.235e+01 3.393e+01 3.874e+01, threshold=6.470e+01, percent-clipped=0.0
2023-12-22 20:36:13,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=770960.0, ans=0.2
2023-12-22 20:36:20,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=771026.6666666666, ans=0.0
2023-12-22 20:36:32,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=771093.3333333334, ans=0.125
2023-12-22 20:36:44,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=771160.0, ans=0.2
2023-12-22 20:36:45,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=771226.6666666666, ans=0.1
2023-12-22 20:36:46,649 INFO [train.py:886] (3/4) Epoch 25, batch 1300, loss[loss=0.01234, audio_tagging_loss=0.01234, over 24750.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4946185.90 frames. ], batch size: 99, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:37:03,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=15.0
2023-12-22 20:37:04,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=771293.3333333334, ans=0.04949747468305833
2023-12-22 20:37:21,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.90 vs. limit=15.0
2023-12-22 20:37:38,472 INFO [train.py:886] (3/4) Epoch 25, batch 1350, loss[loss=0.01143, audio_tagging_loss=0.01143, over 25000.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4944807.48 frames. ], batch size: 100, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:37:42,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=771560.0, ans=0.1
2023-12-22 20:37:56,907 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.618e+01 3.059e+01 3.211e+01 3.317e+01 3.906e+01, threshold=6.423e+01, percent-clipped=0.0
2023-12-22 20:38:00,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=771693.3333333334, ans=0.125
2023-12-22 20:38:11,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=771760.0, ans=0.125
2023-12-22 20:38:11,997 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0
2023-12-22 20:38:18,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=771826.6666666666, ans=0.07
2023-12-22 20:38:29,828 INFO [train.py:886] (3/4) Epoch 25, batch 1400, loss[loss=0.009827, audio_tagging_loss=0.009827, over 24750.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4939219.86 frames. ], batch size: 99, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:38:40,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0
2023-12-22 20:38:47,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=771960.0, ans=0.2
2023-12-22 20:38:47,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=771960.0, ans=12.0
2023-12-22 20:38:52,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.39 vs. limit=22.5
2023-12-22 20:39:00,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=772093.3333333334, ans=0.125
2023-12-22 20:39:03,933 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=15.0
2023-12-22 20:39:06,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=772093.3333333334, ans=0.0
2023-12-22 20:39:06,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=772093.3333333334, ans=0.125
2023-12-22 20:39:19,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=772160.0, ans=0.1
2023-12-22 20:39:22,110 INFO [train.py:886] (3/4) Epoch 25, batch 1450, loss[loss=0.01589, audio_tagging_loss=0.01589, over 25000.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4947698.83 frames. ], batch size: 100, lr: 4.35e-03, grad_scale: 32.0
2023-12-22 20:39:23,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=772226.6666666666, ans=0.07
2023-12-22 20:39:28,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=772226.6666666666, ans=0.125
2023-12-22 20:39:40,579 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.735e+01 3.046e+01 3.154e+01 3.328e+01 3.789e+01, threshold=6.308e+01, percent-clipped=0.0
2023-12-22 20:39:41,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.52 vs. limit=15.0
2023-12-22 20:39:42,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=772360.0, ans=0.125
2023-12-22 20:39:46,904 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.54 vs. limit=15.0
2023-12-22 20:39:50,938 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 20:39:51,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=772360.0, ans=0.1
2023-12-22 20:39:51,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=772360.0, ans=0.2
2023-12-22 20:40:14,144 INFO [train.py:886] (3/4) Epoch 25, batch 1500, loss[loss=0.01459, audio_tagging_loss=0.01459, over 25000.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4942250.64 frames. ], batch size: 100, lr: 4.34e-03, grad_scale: 32.0
2023-12-22 20:40:15,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=772560.0, ans=0.0
2023-12-22 20:40:28,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=772626.6666666666, ans=0.125
2023-12-22 20:40:35,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.01 vs. limit=15.0
2023-12-22 20:40:52,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.79 vs. limit=15.0
2023-12-22 20:41:05,318 INFO [train.py:886] (3/4) Epoch 25, batch 1550, loss[loss=0.01414, audio_tagging_loss=0.01414, over 24750.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4942988.36 frames. ], batch size: 99, lr: 4.34e-03, grad_scale: 32.0
2023-12-22 20:41:07,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=772893.3333333334, ans=0.09899494936611666
2023-12-22 20:41:08,003 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.76 vs. limit=10.0
2023-12-22 20:41:17,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=772960.0, ans=0.125
2023-12-22 20:41:19,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=772960.0, ans=0.07
2023-12-22 20:41:24,019 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.734e+01 3.051e+01 3.220e+01 3.373e+01 4.348e+01, threshold=6.439e+01, percent-clipped=0.0
2023-12-22 20:41:28,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=773026.6666666666, ans=0.0
2023-12-22 20:41:35,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=773093.3333333334, ans=0.0
2023-12-22 20:41:48,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=773160.0, ans=0.125
2023-12-22 20:41:56,980 INFO [train.py:886] (3/4) Epoch 25, batch 1600, loss[loss=0.01114, audio_tagging_loss=0.01114, over 24750.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4940274.55 frames. ], batch size: 99, lr: 4.34e-03, grad_scale: 32.0
2023-12-22 20:41:58,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.12 vs. limit=12.0
2023-12-22 20:41:59,318 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.03 vs. limit=15.0
2023-12-22 20:42:06,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=773293.3333333334, ans=0.0
2023-12-22 20:42:06,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=773293.3333333334, ans=0.125
2023-12-22 20:42:07,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=773293.3333333334, ans=0.0
2023-12-22 20:42:19,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=773360.0, ans=0.0
2023-12-22 20:42:26,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=773360.0, ans=0.2
2023-12-22 20:42:45,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=773493.3333333334, ans=0.0
2023-12-22 20:42:45,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=773493.3333333334, ans=0.1
2023-12-22 20:42:49,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=773560.0, ans=0.125
2023-12-22 20:42:50,554 INFO [train.py:886] (3/4) Epoch 25, batch 1650, loss[loss=0.0105, audio_tagging_loss=0.0105, over 25000.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4943453.43 frames. ], batch size: 100, lr: 4.34e-03, grad_scale: 32.0
2023-12-22 20:42:57,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=773560.0, ans=0.05
2023-12-22 20:43:01,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=773626.6666666666, ans=0.1
2023-12-22 20:43:08,964 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.690e+01 3.090e+01 3.219e+01 3.390e+01 4.071e+01, threshold=6.439e+01, percent-clipped=0.0
2023-12-22 20:43:09,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=773626.6666666666, ans=0.035
2023-12-22 20:43:12,982 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.47 vs. limit=10.0
2023-12-22 20:43:42,119 INFO [train.py:886] (3/4) Epoch 25, batch 1700, loss[loss=0.0137, audio_tagging_loss=0.0137, over 25000.00 frames. ], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 4942684.06 frames. ], batch size: 100, lr: 4.34e-03, grad_scale: 32.0
2023-12-22 20:43:43,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=773893.3333333334, ans=0.125
2023-12-22 20:43:45,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=773893.3333333334, ans=0.125
2023-12-22 20:43:46,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=773893.3333333334, ans=0.0
2023-12-22 20:43:50,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=773893.3333333334, ans=0.125
2023-12-22 20:43:58,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.07 vs. limit=15.0
2023-12-22 20:44:19,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=774093.3333333334, ans=0.95
2023-12-22 20:44:21,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=774093.3333333334, ans=0.2
2023-12-22 20:44:24,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=774160.0, ans=0.0
2023-12-22 20:44:29,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=774160.0, ans=0.0
2023-12-22 20:44:29,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=774160.0, ans=0.125
2023-12-22 20:44:34,463 INFO [train.py:886] (3/4) Epoch 25, batch 1750, loss[loss=0.01325, audio_tagging_loss=0.01325, over 25000.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4945779.14 frames. ], batch size: 100, lr: 4.34e-03, grad_scale: 32.0
2023-12-22 20:44:36,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=774226.6666666666, ans=0.0
2023-12-22 20:44:38,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=774226.6666666666, ans=0.125
2023-12-22 20:44:46,151 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.50 vs. limit=6.0
2023-12-22 20:44:52,981 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.733e+01 3.014e+01 3.131e+01 3.293e+01 4.047e+01, threshold=6.262e+01, percent-clipped=0.0
2023-12-22 20:44:53,460 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.39 vs. limit=15.0
2023-12-22 20:45:16,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=774493.3333333334, ans=0.2
2023-12-22 20:45:18,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=774493.3333333334, ans=0.05
2023-12-22 20:45:19,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=774493.3333333334, ans=0.125
2023-12-22 20:45:26,032 INFO [train.py:886] (3/4) Epoch 25, batch 1800, loss[loss=0.01418, audio_tagging_loss=0.01418, over 25000.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4956079.53 frames. ], batch size: 100, lr: 4.34e-03, grad_scale: 32.0
2023-12-22 20:45:29,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=774560.0, ans=0.125
2023-12-22 20:45:42,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=774626.6666666666, ans=0.125
2023-12-22 20:45:51,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=774693.3333333334, ans=0.125
2023-12-22 20:46:00,714 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.186e-03
2023-12-22 20:46:00,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=774760.0, ans=0.125
2023-12-22 20:46:10,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=774826.6666666666, ans=0.125
2023-12-22 20:46:17,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=774826.6666666666, ans=0.0
2023-12-22 20:46:18,655 INFO [train.py:886] (3/4) Epoch 25, batch 1850, loss[loss=0.01449, audio_tagging_loss=0.01449, over 24750.00 frames. ], tot_loss[loss=0.01314, audio_tagging_loss=0.01314, over 4953901.96 frames. ], batch size: 99, lr: 4.34e-03, grad_scale: 32.0
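The scaling.py:1118 WithLoss entries report the summed auxiliary penalty currently attached to a module's attention weights; it is 0.000e+00 most of the time and only occasionally small but nonzero (5.186e-03 above), meaning the regularizer fires only when the monitored statistic drifts out of range. A toy version of attaching such a penalty without altering the forward output is sketched below; the class is hypothetical, whereas icefall wires the penalty through a custom autograd function in scaling.py.

```python
# Toy auxiliary penalty attached to an activation: the forward value is
# unchanged, but a small penalty is cached for the training loop to add.
# Hypothetical class, not icefall's implementation.
import torch
import torch.nn as nn

class WithToyPenalty(nn.Module):
    def __init__(self, limit: float, coeff: float = 1e-4):
        super().__init__()
        self.limit, self.coeff = limit, coeff
        self.last_penalty = torch.tensor(0.0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        excess = (x.pow(2).mean() - self.limit).clamp(min=0.0)
        self.last_penalty = self.coeff * excess  # 0 unless statistic drifts up
        return x                                 # forward output is unchanged

mod = WithToyPenalty(limit=1.0)
y = mod(torch.randn(8, 256) * 2.0)
print(f"loss-sum={mod.last_penalty.item():.3e}")
```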
2023-12-22 20:46:27,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=774960.0, ans=0.125
2023-12-22 20:46:29,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=774960.0, ans=15.0
2023-12-22 20:46:37,074 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.812e+01 3.072e+01 3.202e+01 3.378e+01 4.183e+01, threshold=6.404e+01, percent-clipped=0.0
2023-12-22 20:47:10,320 INFO [train.py:886] (3/4) Epoch 25, batch 1900, loss[loss=0.0148, audio_tagging_loss=0.0148, over 24750.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4947145.36 frames. ], batch size: 99, lr: 4.34e-03, grad_scale: 32.0
2023-12-22 20:47:42,092 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=15.0
2023-12-22 20:47:57,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=775493.3333333334, ans=0.05
2023-12-22 20:47:58,533 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0
2023-12-22 20:48:02,431 INFO [train.py:886] (3/4) Epoch 25, batch 1950, loss[loss=0.01259, audio_tagging_loss=0.01259, over 22459.00 frames. ], tot_loss[loss=0.0132, audio_tagging_loss=0.0132, over 4939609.86 frames. ], batch size: 107, lr: 4.34e-03, grad_scale: 32.0
2023-12-22 20:48:06,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=775560.0, ans=0.09899494936611666
2023-12-22 20:48:15,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=775626.6666666666, ans=0.125
2023-12-22 20:48:19,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=775626.6666666666, ans=0.125
2023-12-22 20:48:21,576 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.688e+01 3.045e+01 3.163e+01 3.260e+01 3.752e+01, threshold=6.326e+01, percent-clipped=0.0
2023-12-22 20:48:27,589 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=15.0
2023-12-22 20:48:34,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=775760.0, ans=0.125
2023-12-22 20:48:37,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=775760.0, ans=0.1
2023-12-22 20:48:38,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=775760.0, ans=0.125
2023-12-22 20:48:54,594 INFO [train.py:886] (3/4) Epoch 25, batch 2000, loss[loss=0.01404, audio_tagging_loss=0.01404, over 24750.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4941819.08 frames. ], batch size: 99, lr: 4.33e-03, grad_scale: 64.0
2023-12-22 20:49:11,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=775960.0, ans=0.125
2023-12-22 20:49:19,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=776026.6666666666, ans=0.125
2023-12-22 20:49:34,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=776160.0, ans=10.0
2023-12-22 20:49:45,784 INFO [train.py:886] (3/4) Epoch 25, batch 2050, loss[loss=0.01149, audio_tagging_loss=0.01149, over 24750.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4947028.12 frames. ], batch size: 99, lr: 4.33e-03, grad_scale: 64.0
2023-12-22 20:50:04,274 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.672e+01 3.006e+01 3.133e+01 3.319e+01 3.992e+01, threshold=6.266e+01, percent-clipped=0.0
2023-12-22 20:50:12,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=776360.0, ans=0.125
2023-12-22 20:50:14,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=776360.0, ans=0.125
2023-12-22 20:50:17,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=776426.6666666666, ans=0.2
2023-12-22 20:50:33,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=776493.3333333334, ans=0.125
2023-12-22 20:50:37,282 INFO [train.py:886] (3/4) Epoch 25, batch 2100, loss[loss=0.01178, audio_tagging_loss=0.01178, over 24750.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4953658.05 frames. ], batch size: 99, lr: 4.33e-03, grad_scale: 64.0
2023-12-22 20:50:42,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=776560.0, ans=0.125
2023-12-22 20:51:04,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=776693.3333333334, ans=0.125
2023-12-22 20:51:07,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=776760.0, ans=0.2
2023-12-22 20:51:18,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=776826.6666666666, ans=0.1
2023-12-22 20:51:20,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=776826.6666666666, ans=0.1
2023-12-22 20:51:28,936 INFO [train.py:886] (3/4) Epoch 25, batch 2150, loss[loss=0.01316, audio_tagging_loss=0.01316, over 25000.00 frames. ], tot_loss[loss=0.01319, audio_tagging_loss=0.01319, over 4955090.53 frames. ], batch size: 100, lr: 4.33e-03, grad_scale: 64.0
2023-12-22 20:51:47,384 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.715e+01 3.098e+01 3.255e+01 3.441e+01 3.883e+01, threshold=6.510e+01, percent-clipped=0.0
2023-12-22 20:51:54,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=15.0
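Between batch 1950 and batch 2000 the logged grad_scale doubles from 32.0 to 64.0. That is the usual dynamic fp16 loss-scaling behaviour: the scale grows after a sustained run of overflow-free steps and is cut back when an overflow is detected. The sketch below uses stock PyTorch AMP to show the mechanism; train.py logs its own scale value, and the init_scale and growth_interval here are assumptions for illustration.

```python
# Standard PyTorch dynamic loss scaling; the current scale is what the log
# prints as 'grad_scale'. After 'growth_interval' clean steps it doubles.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(model, optimizer, batch, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales grads, skips step on inf/nan
    scaler.update()                 # grows or shrinks the scale
    return loss.detach(), scaler.get_scale()
```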
2023-12-22 20:51:59,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=777093.3333333334, ans=0.2
2023-12-22 20:52:04,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=777093.3333333334, ans=0.2
2023-12-22 20:52:05,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=777093.3333333334, ans=0.1
2023-12-22 20:52:10,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=777160.0, ans=0.125
2023-12-22 20:52:12,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=777160.0, ans=0.125
2023-12-22 20:52:16,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=777160.0, ans=0.04949747468305833
2023-12-22 20:52:19,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=777160.0, ans=0.125
2023-12-22 20:52:21,123 INFO [train.py:886] (3/4) Epoch 25, batch 2200, loss[loss=0.01335, audio_tagging_loss=0.01335, over 24750.00 frames. ], tot_loss[loss=0.01329, audio_tagging_loss=0.01329, over 4947611.04 frames. ], batch size: 99, lr: 4.33e-03, grad_scale: 64.0
2023-12-22 20:52:53,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=777426.6666666666, ans=0.125
2023-12-22 20:52:55,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=777426.6666666666, ans=0.95
2023-12-22 20:53:11,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=777493.3333333334, ans=0.125
2023-12-22 20:53:13,065 INFO [train.py:886] (3/4) Epoch 25, batch 2250, loss[loss=0.01225, audio_tagging_loss=0.01225, over 25000.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4940417.54 frames. ], batch size: 100, lr: 4.33e-03, grad_scale: 64.0
], batch size: 100, lr: 4.33e-03, grad_scale: 64.0 2023-12-22 20:53:13,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=777560.0, ans=0.125 2023-12-22 20:53:17,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=777560.0, ans=0.125 2023-12-22 20:53:27,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=777626.6666666666, ans=0.1 2023-12-22 20:53:28,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=777626.6666666666, ans=0.125 2023-12-22 20:53:30,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=777626.6666666666, ans=0.125 2023-12-22 20:53:30,762 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.795e+01 3.106e+01 3.234e+01 3.414e+01 3.931e+01, threshold=6.468e+01, percent-clipped=0.0 2023-12-22 20:53:48,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=777760.0, ans=0.125 2023-12-22 20:54:02,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=777893.3333333334, ans=0.0 2023-12-22 20:54:03,755 INFO [train.py:886] (3/4) Epoch 25, batch 2300, loss[loss=0.01015, audio_tagging_loss=0.01015, over 25000.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4943493.46 frames. ], batch size: 100, lr: 4.33e-03, grad_scale: 64.0 2023-12-22 20:54:23,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.45 vs. limit=12.0 2023-12-22 20:54:55,993 INFO [train.py:886] (3/4) Epoch 25, batch 2350, loss[loss=0.01061, audio_tagging_loss=0.01061, over 25000.00 frames. ], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 4944793.91 frames. ], batch size: 100, lr: 4.33e-03, grad_scale: 64.0 2023-12-22 20:55:10,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=778293.3333333334, ans=0.0 2023-12-22 20:55:14,440 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.697e+01 3.039e+01 3.160e+01 3.337e+01 4.563e+01, threshold=6.321e+01, percent-clipped=0.0 2023-12-22 20:55:18,445 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.99 vs. 
limit=10.0 2023-12-22 20:55:20,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=778360.0, ans=0.125 2023-12-22 20:55:23,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=778360.0, ans=0.0 2023-12-22 20:55:35,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=778426.6666666666, ans=0.1 2023-12-22 20:55:42,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=778493.3333333334, ans=0.125 2023-12-22 20:55:43,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=778493.3333333334, ans=0.0 2023-12-22 20:55:46,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=778560.0, ans=0.125 2023-12-22 20:55:47,501 INFO [train.py:886] (3/4) Epoch 25, batch 2400, loss[loss=0.01014, audio_tagging_loss=0.01014, over 25000.00 frames. ], tot_loss[loss=0.01314, audio_tagging_loss=0.01314, over 4950663.65 frames. ], batch size: 100, lr: 4.33e-03, grad_scale: 64.0 2023-12-22 20:56:03,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=778626.6666666666, ans=0.125 2023-12-22 20:56:06,359 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.68 vs. limit=15.0 2023-12-22 20:56:08,284 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.00 vs. limit=6.0 2023-12-22 20:56:25,459 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 20:56:39,827 INFO [train.py:886] (3/4) Epoch 25, batch 2450, loss[loss=0.01121, audio_tagging_loss=0.01121, over 24012.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4955578.69 frames. ], batch size: 100, lr: 4.33e-03, grad_scale: 64.0 2023-12-22 20:56:43,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=778893.3333333334, ans=0.1 2023-12-22 20:56:55,030 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.63 vs. limit=6.0 2023-12-22 20:56:58,315 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.730e+01 3.055e+01 3.201e+01 3.364e+01 3.978e+01, threshold=6.403e+01, percent-clipped=0.0 2023-12-22 20:57:00,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=15.0 2023-12-22 20:57:17,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=779093.3333333334, ans=0.125 2023-12-22 20:57:31,411 INFO [train.py:886] (3/4) Epoch 25, batch 2500, loss[loss=0.01027, audio_tagging_loss=0.01027, over 24750.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4950075.74 frames. 
], batch size: 99, lr: 4.33e-03, grad_scale: 64.0 2023-12-22 20:57:34,440 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.15 vs. limit=22.5 2023-12-22 20:57:48,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=779293.3333333334, ans=0.125 2023-12-22 20:58:02,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=779426.6666666666, ans=0.07 2023-12-22 20:58:09,144 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.58 vs. limit=6.0 2023-12-22 20:58:09,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=779426.6666666666, ans=0.1 2023-12-22 20:58:21,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=779493.3333333334, ans=0.125 2023-12-22 20:58:22,686 INFO [train.py:886] (3/4) Epoch 25, batch 2550, loss[loss=0.01413, audio_tagging_loss=0.01413, over 24043.00 frames. ], tot_loss[loss=0.01332, audio_tagging_loss=0.01332, over 4945766.47 frames. ], batch size: 100, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 20:58:24,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=779560.0, ans=0.125 2023-12-22 20:58:29,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=779560.0, ans=0.0 2023-12-22 20:58:41,774 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.771e+01 3.116e+01 3.261e+01 3.402e+01 3.994e+01, threshold=6.522e+01, percent-clipped=0.0 2023-12-22 20:58:41,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=779626.6666666666, ans=0.125 2023-12-22 20:58:46,816 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.82 vs. limit=15.0 2023-12-22 20:58:46,994 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.22 vs. limit=10.0 2023-12-22 20:59:03,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.53 vs. limit=22.5 2023-12-22 20:59:08,101 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=15.0 2023-12-22 20:59:11,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=779826.6666666666, ans=0.125 2023-12-22 20:59:15,163 INFO [train.py:886] (3/4) Epoch 25, batch 2600, loss[loss=0.01402, audio_tagging_loss=0.01402, over 24750.00 frames. ], tot_loss[loss=0.01327, audio_tagging_loss=0.01327, over 4943261.69 frames. 
], batch size: 99, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 20:59:30,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=779960.0, ans=0.2 2023-12-22 20:59:36,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=780026.6666666666, ans=0.125 2023-12-22 20:59:43,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=780026.6666666666, ans=10.0 2023-12-22 21:00:08,043 INFO [train.py:886] (3/4) Epoch 25, batch 2650, loss[loss=0.01428, audio_tagging_loss=0.01428, over 25000.00 frames. ], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 4950328.14 frames. ], batch size: 100, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 21:00:26,449 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.757e+01 3.067e+01 3.194e+01 3.330e+01 3.753e+01, threshold=6.389e+01, percent-clipped=0.0 2023-12-22 21:00:33,670 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.50 vs. limit=6.0 2023-12-22 21:00:38,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=780426.6666666666, ans=0.1 2023-12-22 21:00:42,601 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. limit=6.0 2023-12-22 21:00:46,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=780426.6666666666, ans=0.125 2023-12-22 21:00:56,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=780493.3333333334, ans=0.125 2023-12-22 21:00:59,656 INFO [train.py:886] (3/4) Epoch 25, batch 2700, loss[loss=0.01445, audio_tagging_loss=0.01445, over 25000.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4955217.80 frames. ], batch size: 100, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 21:01:08,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=780560.0, ans=0.2 2023-12-22 21:01:08,357 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0 2023-12-22 21:01:14,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=780626.6666666666, ans=0.0 2023-12-22 21:01:37,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=780760.0, ans=0.04949747468305833 2023-12-22 21:01:40,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=780826.6666666666, ans=0.2 2023-12-22 21:01:42,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=780826.6666666666, ans=0.125 2023-12-22 21:01:51,165 INFO [train.py:886] (3/4) Epoch 25, batch 2750, loss[loss=0.01118, audio_tagging_loss=0.01118, over 25000.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4954841.00 frames. 
], batch size: 100, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 21:01:57,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=780893.3333333334, ans=0.0 2023-12-22 21:02:03,410 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.22 vs. limit=22.5 2023-12-22 21:02:09,601 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.725e+01 3.037e+01 3.216e+01 3.345e+01 3.821e+01, threshold=6.432e+01, percent-clipped=0.0 2023-12-22 21:02:10,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=781026.6666666666, ans=0.2 2023-12-22 21:02:42,978 INFO [train.py:886] (3/4) Epoch 25, batch 2800, loss[loss=0.0152, audio_tagging_loss=0.0152, over 24750.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4952114.59 frames. ], batch size: 99, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 21:02:58,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=781293.3333333334, ans=0.125 2023-12-22 21:03:05,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=12.0 2023-12-22 21:03:31,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=781493.3333333334, ans=0.125 2023-12-22 21:03:33,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=781493.3333333334, ans=0.0 2023-12-22 21:03:36,269 INFO [train.py:886] (3/4) Epoch 25, batch 2850, loss[loss=0.01402, audio_tagging_loss=0.01402, over 24750.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4947223.20 frames. ], batch size: 99, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 21:03:54,791 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.736e+01 3.071e+01 3.202e+01 3.384e+01 4.019e+01, threshold=6.405e+01, percent-clipped=0.0 2023-12-22 21:03:58,405 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.80 vs. limit=15.0 2023-12-22 21:04:02,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=781693.3333333334, ans=0.125 2023-12-22 21:04:02,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=781693.3333333334, ans=0.125 2023-12-22 21:04:13,980 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.25 vs. limit=10.0 2023-12-22 21:04:14,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=781760.0, ans=0.2 2023-12-22 21:04:18,969 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 21:04:19,230 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.71 vs. limit=10.0 2023-12-22 21:04:19,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.64 vs. 
limit=8.0 2023-12-22 21:04:23,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=781826.6666666666, ans=0.0 2023-12-22 21:04:28,002 INFO [train.py:886] (3/4) Epoch 25, batch 2900, loss[loss=0.0112, audio_tagging_loss=0.0112, over 24750.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4941444.29 frames. ], batch size: 99, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 21:04:31,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.68 vs. limit=6.0 2023-12-22 21:05:08,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=782160.0, ans=0.125 2023-12-22 21:05:16,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=782160.0, ans=0.125 2023-12-22 21:05:18,745 INFO [train.py:886] (3/4) Epoch 25, batch 2950, loss[loss=0.01284, audio_tagging_loss=0.01284, over 24750.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4944979.92 frames. ], batch size: 99, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 21:05:19,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=782226.6666666666, ans=0.1 2023-12-22 21:05:28,173 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0 2023-12-22 21:05:35,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=782293.3333333334, ans=0.125 2023-12-22 21:05:37,121 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.793e+01 3.031e+01 3.168e+01 3.321e+01 3.698e+01, threshold=6.337e+01, percent-clipped=0.0 2023-12-22 21:05:53,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=782426.6666666666, ans=0.125 2023-12-22 21:06:03,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=782493.3333333334, ans=22.5 2023-12-22 21:06:09,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=782560.0, ans=0.125 2023-12-22 21:06:10,413 INFO [train.py:886] (3/4) Epoch 25, batch 3000, loss[loss=0.01227, audio_tagging_loss=0.01227, over 25000.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4951525.61 frames. ], batch size: 100, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 21:06:10,413 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 21:06:31,861 INFO [train.py:917] (3/4) Epoch 25, validation: loss=0.0331, audio_tagging_loss=0.0331, over 3737520.00 frames. 
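The train.py records in this log follow a fixed shape: a per-batch "loss[...]" value, a running "tot_loss[...]" average, and, every valid_interval batches (3000 per the configuration at the top of the log), a "validation: loss=..." record like the one just above. Below is a minimal parsing sketch for pulling those loss curves out of a log like this one; the regexes and the parse_log helper are illustrative assumptions based only on the record shapes visible here, not part of icefall:

import re

# Assumed record shapes, copied from the log lines above:
#   "Epoch 25, batch 3000, loss[...], tot_loss[loss=0.01296, ...]"
#   "Epoch 25, validation: loss=0.0331, ..."
TRAIN_RE = re.compile(r"Epoch (\d+), batch (\d+), .*?tot_loss\[loss=([0-9.e+-]+)")
VALID_RE = re.compile(r"Epoch (\d+), validation: loss=([0-9.e+-]+)")

def parse_log(text):
    """Return ([(epoch, batch, tot_loss)], [(epoch, valid_loss)])."""
    train = [(int(e), int(b), float(l)) for e, b, l in TRAIN_RE.findall(text)]
    valid = [(int(e), float(l)) for e, l in VALID_RE.findall(text)]
    return train, valid

if __name__ == "__main__":
    sample = ("Epoch 25, batch 3000, loss[loss=0.01227, ...], "
              "tot_loss[loss=0.01296, audio_tagging_loss=0.01296, "
              "over 4951525.61 frames. ], batch size: 100 "
              "Epoch 25, validation: loss=0.0331, audio_tagging_loss=0.0331")
    print(parse_log(sample))  # -> ([(25, 3000, 0.01296)], [(25, 0.0331)])

Fed the whole file, parse_log recovers the smoothed training-loss series and the per-validation losses from these records.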
2023-12-22 21:06:31,862 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 21:06:32,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=782560.0, ans=0.0 2023-12-22 21:06:38,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=782560.0, ans=0.0 2023-12-22 21:07:15,437 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.053e-03 2023-12-22 21:07:16,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=782826.6666666666, ans=0.0 2023-12-22 21:07:23,391 INFO [train.py:886] (3/4) Epoch 25, batch 3050, loss[loss=0.01277, audio_tagging_loss=0.01277, over 25000.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4955793.95 frames. ], batch size: 100, lr: 4.32e-03, grad_scale: 64.0 2023-12-22 21:07:27,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=782893.3333333334, ans=0.0 2023-12-22 21:07:28,878 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2023-12-22 21:07:32,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=782893.3333333334, ans=0.125 2023-12-22 21:07:41,967 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.821e+01 3.063e+01 3.171e+01 3.346e+01 3.819e+01, threshold=6.341e+01, percent-clipped=0.0 2023-12-22 21:07:43,640 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.76 vs. limit=15.0 2023-12-22 21:07:48,183 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.12 vs. limit=22.5 2023-12-22 21:07:55,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=783093.3333333334, ans=0.125 2023-12-22 21:07:57,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=783093.3333333334, ans=0.2 2023-12-22 21:07:58,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.82 vs. limit=15.0 2023-12-22 21:08:10,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=783160.0, ans=0.125 2023-12-22 21:08:12,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=783160.0, ans=0.125 2023-12-22 21:08:15,765 INFO [train.py:886] (3/4) Epoch 25, batch 3100, loss[loss=0.01197, audio_tagging_loss=0.01197, over 24750.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4958116.14 frames. ], batch size: 99, lr: 4.31e-03, grad_scale: 64.0 2023-12-22 21:08:23,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=783226.6666666666, ans=0.025 2023-12-22 21:09:08,007 INFO [train.py:886] (3/4) Epoch 25, batch 3150, loss[loss=0.01435, audio_tagging_loss=0.01435, over 24750.00 frames. 
], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4945225.60 frames. ], batch size: 99, lr: 4.31e-03, grad_scale: 64.0 2023-12-22 21:09:13,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=783560.0, ans=0.025 2023-12-22 21:09:13,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=783560.0, ans=0.2 2023-12-22 21:09:14,223 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.50 vs. limit=5.0 2023-12-22 21:09:26,512 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.818e+01 3.103e+01 3.282e+01 3.438e+01 3.839e+01, threshold=6.565e+01, percent-clipped=0.0 2023-12-22 21:09:26,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=783626.6666666666, ans=0.0 2023-12-22 21:09:43,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=783760.0, ans=0.125 2023-12-22 21:09:54,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.35 vs. limit=22.5 2023-12-22 21:09:57,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=783826.6666666666, ans=0.2 2023-12-22 21:09:59,521 INFO [train.py:886] (3/4) Epoch 25, batch 3200, loss[loss=0.01024, audio_tagging_loss=0.01024, over 25000.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4939023.72 frames. ], batch size: 100, lr: 4.31e-03, grad_scale: 64.0 2023-12-22 21:10:00,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=783893.3333333334, ans=0.125 2023-12-22 21:10:10,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=783960.0, ans=0.1 2023-12-22 21:10:12,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=783960.0, ans=0.125 2023-12-22 21:10:17,420 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.28 vs. limit=10.0 2023-12-22 21:10:40,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=784160.0, ans=0.125 2023-12-22 21:10:48,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.26 vs. limit=15.0 2023-12-22 21:10:51,689 INFO [train.py:886] (3/4) Epoch 25, batch 3250, loss[loss=0.01335, audio_tagging_loss=0.01335, over 25000.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4942468.62 frames. ], batch size: 100, lr: 4.31e-03, grad_scale: 64.0 2023-12-22 21:10:52,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.63 vs. limit=15.0 2023-12-22 21:10:57,816 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.44 vs. 
limit=22.5 2023-12-22 21:11:00,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0 2023-12-22 21:11:10,154 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.690e+01 3.040e+01 3.210e+01 3.374e+01 3.789e+01, threshold=6.419e+01, percent-clipped=0.0 2023-12-22 21:11:19,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=784360.0, ans=0.1 2023-12-22 21:11:44,420 INFO [train.py:886] (3/4) Epoch 25, batch 3300, loss[loss=0.0128, audio_tagging_loss=0.0128, over 25000.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4944176.71 frames. ], batch size: 100, lr: 4.31e-03, grad_scale: 64.0 2023-12-22 21:11:48,395 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 21:12:22,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=784760.0, ans=0.125 2023-12-22 21:12:26,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=784826.6666666666, ans=0.125 2023-12-22 21:12:27,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=784826.6666666666, ans=0.1 2023-12-22 21:12:36,217 INFO [train.py:886] (3/4) Epoch 25, batch 3350, loss[loss=0.01218, audio_tagging_loss=0.01218, over 25000.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4950907.15 frames. ], batch size: 100, lr: 4.31e-03, grad_scale: 64.0 2023-12-22 21:12:36,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=784893.3333333334, ans=0.0 2023-12-22 21:12:51,606 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.68 vs. limit=6.0 2023-12-22 21:12:54,047 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.707e+01 3.036e+01 3.188e+01 3.323e+01 3.889e+01, threshold=6.376e+01, percent-clipped=0.0 2023-12-22 21:12:56,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=785026.6666666666, ans=0.2 2023-12-22 21:13:04,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.24 vs. limit=15.0 2023-12-22 21:13:05,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=785093.3333333334, ans=0.2 2023-12-22 21:13:06,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=785093.3333333334, ans=0.0 2023-12-22 21:13:27,263 INFO [train.py:886] (3/4) Epoch 25, batch 3400, loss[loss=0.01127, audio_tagging_loss=0.01127, over 24750.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4950179.77 frames. ], batch size: 99, lr: 4.31e-03, grad_scale: 64.0 2023-12-22 21:14:19,127 INFO [train.py:886] (3/4) Epoch 25, batch 3450, loss[loss=0.01282, audio_tagging_loss=0.01282, over 24750.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4941544.87 frames. 
], batch size: 99, lr: 4.31e-03, grad_scale: 64.0 2023-12-22 21:14:38,181 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.865e+01 3.134e+01 3.253e+01 3.389e+01 3.963e+01, threshold=6.506e+01, percent-clipped=0.0 2023-12-22 21:14:38,812 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0 2023-12-22 21:14:40,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=785693.3333333334, ans=0.0 2023-12-22 21:14:46,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=785693.3333333334, ans=0.04949747468305833 2023-12-22 21:15:09,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=785826.6666666666, ans=0.2 2023-12-22 21:15:11,231 INFO [train.py:886] (3/4) Epoch 25, batch 3500, loss[loss=0.01181, audio_tagging_loss=0.01181, over 24750.00 frames. ], tot_loss[loss=0.01314, audio_tagging_loss=0.01314, over 4941083.56 frames. ], batch size: 99, lr: 4.31e-03, grad_scale: 64.0 2023-12-22 21:15:17,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=785893.3333333334, ans=0.2 2023-12-22 21:15:25,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=785960.0, ans=0.2 2023-12-22 21:15:34,059 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.20 vs. limit=10.0 2023-12-22 21:15:39,513 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.07 vs. limit=22.5 2023-12-22 21:15:43,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=12.0 2023-12-22 21:15:51,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=786093.3333333334, ans=0.1 2023-12-22 21:16:02,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=786226.6666666666, ans=0.0 2023-12-22 21:16:02,901 INFO [train.py:886] (3/4) Epoch 25, batch 3550, loss[loss=0.01314, audio_tagging_loss=0.01314, over 25000.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4945977.45 frames. ], batch size: 100, lr: 4.31e-03, grad_scale: 64.0 2023-12-22 21:16:17,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=786293.3333333334, ans=0.125 2023-12-22 21:16:19,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=786293.3333333334, ans=0.125 2023-12-22 21:16:22,261 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.633e+01 3.021e+01 3.171e+01 3.367e+01 3.812e+01, threshold=6.343e+01, percent-clipped=0.0 2023-12-22 21:16:47,318 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.62 vs. 
limit=15.0 2023-12-22 21:16:54,956 INFO [train.py:886] (3/4) Epoch 25, batch 3600, loss[loss=0.0141, audio_tagging_loss=0.0141, over 25000.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4947078.16 frames. ], batch size: 100, lr: 4.31e-03, grad_scale: 64.0 2023-12-22 21:17:15,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0 2023-12-22 21:17:24,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=786693.3333333334, ans=0.05 2023-12-22 21:17:46,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=786893.3333333334, ans=0.125 2023-12-22 21:17:47,385 INFO [train.py:886] (3/4) Epoch 25, batch 3650, loss[loss=0.01345, audio_tagging_loss=0.01345, over 24750.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4954192.12 frames. ], batch size: 99, lr: 4.30e-03, grad_scale: 64.0 2023-12-22 21:17:51,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=786893.3333333334, ans=0.2 2023-12-22 21:18:05,036 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.785e+01 2.980e+01 3.189e+01 3.343e+01 3.889e+01, threshold=6.377e+01, percent-clipped=0.0 2023-12-22 21:18:12,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=787026.6666666666, ans=0.125 2023-12-22 21:18:23,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=787093.3333333334, ans=0.0 2023-12-22 21:18:33,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.22 vs. limit=10.0 2023-12-22 21:18:34,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=787160.0, ans=0.0 2023-12-22 21:18:35,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.83 vs. limit=15.0 2023-12-22 21:18:35,366 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.14 vs. limit=15.0 2023-12-22 21:18:38,509 INFO [train.py:886] (3/4) Epoch 25, batch 3700, loss[loss=0.01613, audio_tagging_loss=0.01613, over 25000.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4958611.41 frames. ], batch size: 100, lr: 4.30e-03, grad_scale: 64.0 2023-12-22 21:18:47,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=787226.6666666666, ans=0.0 2023-12-22 21:18:50,857 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.11 vs. limit=22.5 2023-12-22 21:18:54,310 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.99 vs. 
limit=15.0 2023-12-22 21:18:57,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=787293.3333333334, ans=0.0 2023-12-22 21:19:02,600 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.96 vs. limit=15.0 2023-12-22 21:19:10,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.14 vs. limit=10.0 2023-12-22 21:19:12,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=787426.6666666666, ans=0.02 2023-12-22 21:19:12,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=787426.6666666666, ans=0.125 2023-12-22 21:19:30,715 INFO [train.py:886] (3/4) Epoch 25, batch 3750, loss[loss=0.01135, audio_tagging_loss=0.01135, over 24750.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4955654.13 frames. ], batch size: 99, lr: 4.30e-03, grad_scale: 64.0 2023-12-22 21:19:47,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=787626.6666666666, ans=0.0 2023-12-22 21:19:49,014 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.767e+01 3.099e+01 3.227e+01 3.364e+01 3.864e+01, threshold=6.453e+01, percent-clipped=0.0 2023-12-22 21:20:22,306 INFO [train.py:886] (3/4) Epoch 25, batch 3800, loss[loss=0.01297, audio_tagging_loss=0.01297, over 24750.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4951640.57 frames. ], batch size: 99, lr: 4.30e-03, grad_scale: 64.0 2023-12-22 21:20:34,528 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.52 vs. limit=15.0 2023-12-22 21:20:51,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.63 vs. limit=22.5 2023-12-22 21:21:07,524 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0 2023-12-22 21:21:14,609 INFO [train.py:886] (3/4) Epoch 25, batch 3850, loss[loss=0.01223, audio_tagging_loss=0.01223, over 24750.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4946439.08 frames. ], batch size: 99, lr: 4.30e-03, grad_scale: 64.0 2023-12-22 21:21:28,921 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.23 vs. limit=15.0 2023-12-22 21:21:30,653 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=12.0 2023-12-22 21:21:33,014 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.703e+01 3.115e+01 3.236e+01 3.422e+01 4.867e+01, threshold=6.472e+01, percent-clipped=0.0 2023-12-22 21:21:45,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=788426.6666666666, ans=0.125 2023-12-22 21:22:02,887 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.61 vs. 
limit=15.0 2023-12-22 21:22:06,302 INFO [train.py:886] (3/4) Epoch 25, batch 3900, loss[loss=0.01104, audio_tagging_loss=0.01104, over 25000.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4947875.01 frames. ], batch size: 100, lr: 4.30e-03, grad_scale: 64.0 2023-12-22 21:22:10,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=788560.0, ans=0.125 2023-12-22 21:22:10,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=788560.0, ans=0.05 2023-12-22 21:22:22,715 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.76 vs. limit=8.0 2023-12-22 21:22:26,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=788693.3333333334, ans=0.125 2023-12-22 21:22:31,585 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.03 vs. limit=22.5 2023-12-22 21:22:42,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=788760.0, ans=0.1 2023-12-22 21:22:43,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=788760.0, ans=0.125 2023-12-22 21:22:47,760 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 21:22:56,927 INFO [train.py:886] (3/4) Epoch 25, batch 3950, loss[loss=0.01325, audio_tagging_loss=0.01325, over 25000.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4951048.96 frames. ], batch size: 100, lr: 4.30e-03, grad_scale: 64.0 2023-12-22 21:23:11,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=788960.0, ans=0.2 2023-12-22 21:23:16,156 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.672e+01 3.014e+01 3.191e+01 3.353e+01 3.763e+01, threshold=6.383e+01, percent-clipped=0.0 2023-12-22 21:23:42,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=789160.0, ans=0.0 2023-12-22 21:23:48,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=789226.6666666666, ans=0.0 2023-12-22 21:23:49,434 INFO [train.py:886] (3/4) Epoch 25, batch 4000, loss[loss=0.0123, audio_tagging_loss=0.0123, over 25000.00 frames. ], tot_loss[loss=0.01301, audio_tagging_loss=0.01301, over 4955805.64 frames. ], batch size: 100, lr: 4.30e-03, grad_scale: 128.0 2023-12-22 21:24:39,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=789493.3333333334, ans=0.125 2023-12-22 21:24:41,226 INFO [train.py:886] (3/4) Epoch 25, batch 4050, loss[loss=0.01464, audio_tagging_loss=0.01464, over 24750.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4955360.41 frames. 
], batch size: 99, lr: 4.30e-03, grad_scale: 64.0 2023-12-22 21:25:01,300 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.755e+01 3.123e+01 3.229e+01 3.371e+01 4.451e+01, threshold=6.458e+01, percent-clipped=0.0 2023-12-22 21:25:14,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=789760.0, ans=0.5 2023-12-22 21:25:21,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=789760.0, ans=0.2 2023-12-22 21:25:33,448 INFO [train.py:886] (3/4) Epoch 25, batch 4100, loss[loss=0.0165, audio_tagging_loss=0.0165, over 21561.00 frames. ], tot_loss[loss=0.01314, audio_tagging_loss=0.01314, over 4949306.82 frames. ], batch size: 107, lr: 4.30e-03, grad_scale: 64.0 2023-12-22 21:25:34,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=789893.3333333334, ans=0.0 2023-12-22 21:25:41,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.00 vs. limit=15.0 2023-12-22 21:25:48,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=789960.0, ans=0.125 2023-12-22 21:26:24,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=790226.6666666666, ans=0.1 2023-12-22 21:26:24,973 INFO [train.py:886] (3/4) Epoch 25, batch 4150, loss[loss=0.01327, audio_tagging_loss=0.01327, over 24925.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4950415.78 frames. ], batch size: 100, lr: 4.30e-03, grad_scale: 64.0 2023-12-22 21:26:30,941 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.70 vs. limit=15.0 2023-12-22 21:26:44,955 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.625e+01 3.070e+01 3.176e+01 3.307e+01 3.814e+01, threshold=6.351e+01, percent-clipped=0.0 2023-12-22 21:26:47,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=790360.0, ans=0.125 2023-12-22 21:26:53,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=790360.0, ans=0.125 2023-12-22 21:27:15,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=790493.3333333334, ans=0.125 2023-12-22 21:27:17,227 INFO [train.py:886] (3/4) Epoch 25, batch 4200, loss[loss=0.01368, audio_tagging_loss=0.01368, over 24750.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4955007.19 frames. 
], batch size: 99, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:27:22,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=790560.0, ans=0.0 2023-12-22 21:27:33,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=790626.6666666666, ans=0.0 2023-12-22 21:27:42,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=790693.3333333334, ans=0.2 2023-12-22 21:27:49,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=790760.0, ans=0.0 2023-12-22 21:27:50,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=790760.0, ans=0.0 2023-12-22 21:27:56,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=790760.0, ans=0.125 2023-12-22 21:28:03,409 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.52 vs. limit=22.5 2023-12-22 21:28:04,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=790826.6666666666, ans=0.05 2023-12-22 21:28:09,328 INFO [train.py:886] (3/4) Epoch 25, batch 4250, loss[loss=0.01015, audio_tagging_loss=0.01015, over 25000.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4951374.12 frames. ], batch size: 100, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:28:13,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.47 vs. limit=15.0 2023-12-22 21:28:29,339 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.694e+01 3.059e+01 3.182e+01 3.332e+01 3.915e+01, threshold=6.364e+01, percent-clipped=0.0 2023-12-22 21:28:33,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=791026.6666666666, ans=0.125 2023-12-22 21:28:36,287 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.095e-03 2023-12-22 21:28:47,317 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.51 vs. limit=12.0 2023-12-22 21:28:48,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=791093.3333333334, ans=0.125 2023-12-22 21:28:57,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=791160.0, ans=0.125 2023-12-22 21:29:01,505 INFO [train.py:886] (3/4) Epoch 25, batch 4300, loss[loss=0.01385, audio_tagging_loss=0.01385, over 25000.00 frames. ], tot_loss[loss=0.01301, audio_tagging_loss=0.01301, over 4952417.28 frames. 
], batch size: 100, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:29:04,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=791226.6666666666, ans=0.125 2023-12-22 21:29:13,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=791293.3333333334, ans=0.035 2023-12-22 21:29:17,984 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=15.0 2023-12-22 21:29:35,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=791426.6666666666, ans=0.125 2023-12-22 21:29:37,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.03 vs. limit=6.0 2023-12-22 21:29:52,949 INFO [train.py:886] (3/4) Epoch 25, batch 4350, loss[loss=0.01325, audio_tagging_loss=0.01325, over 24750.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4960283.63 frames. ], batch size: 99, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:29:57,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=791560.0, ans=0.2 2023-12-22 21:30:08,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=791626.6666666666, ans=0.0 2023-12-22 21:30:09,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=791626.6666666666, ans=0.04949747468305833 2023-12-22 21:30:12,256 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.736e+01 3.142e+01 3.246e+01 3.434e+01 4.775e+01, threshold=6.491e+01, percent-clipped=0.0 2023-12-22 21:30:35,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=791826.6666666666, ans=0.07 2023-12-22 21:30:39,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=791826.6666666666, ans=0.5 2023-12-22 21:30:44,428 INFO [train.py:886] (3/4) Epoch 25, batch 4400, loss[loss=0.01328, audio_tagging_loss=0.01328, over 24750.00 frames. ], tot_loss[loss=0.01314, audio_tagging_loss=0.01314, over 4959200.01 frames. ], batch size: 99, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:30:54,347 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.82 vs. limit=10.0 2023-12-22 21:30:57,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=791960.0, ans=0.1 2023-12-22 21:31:04,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=792026.6666666666, ans=0.0 2023-12-22 21:31:06,076 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.16 vs. 
limit=22.5 2023-12-22 21:31:07,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=792026.6666666666, ans=0.0 2023-12-22 21:31:16,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=792093.3333333334, ans=0.0 2023-12-22 21:31:17,644 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 21:31:24,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=792093.3333333334, ans=0.0 2023-12-22 21:31:36,746 INFO [train.py:886] (3/4) Epoch 25, batch 4450, loss[loss=0.01341, audio_tagging_loss=0.01341, over 24750.00 frames. ], tot_loss[loss=0.01324, audio_tagging_loss=0.01324, over 4953698.54 frames. ], batch size: 99, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:31:38,728 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=15.0 2023-12-22 21:31:40,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=792226.6666666666, ans=0.0 2023-12-22 21:31:43,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=792226.6666666666, ans=0.125 2023-12-22 21:31:47,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=792293.3333333334, ans=0.1 2023-12-22 21:31:55,988 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.817e+01 3.096e+01 3.289e+01 3.454e+01 4.109e+01, threshold=6.578e+01, percent-clipped=0.0 2023-12-22 21:32:05,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=792360.0, ans=0.125 2023-12-22 21:32:17,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=792493.3333333334, ans=0.125 2023-12-22 21:32:28,142 INFO [train.py:886] (3/4) Epoch 25, batch 4500, loss[loss=0.01434, audio_tagging_loss=0.01434, over 22657.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4950208.91 frames. 
], batch size: 107, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:32:28,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=792560.0, ans=0.125 2023-12-22 21:32:33,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=792560.0, ans=0.0 2023-12-22 21:32:38,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=792626.6666666666, ans=0.04949747468305833 2023-12-22 21:32:44,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=792626.6666666666, ans=0.125 2023-12-22 21:32:47,758 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 21:32:54,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=792693.3333333334, ans=0.2 2023-12-22 21:32:58,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=792760.0, ans=0.125 2023-12-22 21:33:14,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=792826.6666666666, ans=0.0 2023-12-22 21:33:14,280 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.01 vs. limit=22.5 2023-12-22 21:33:15,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=792826.6666666666, ans=0.125 2023-12-22 21:33:19,562 INFO [train.py:886] (3/4) Epoch 25, batch 4550, loss[loss=0.009449, audio_tagging_loss=0.009449, over 25000.00 frames. ], tot_loss[loss=0.01305, audio_tagging_loss=0.01305, over 4951944.38 frames. ], batch size: 100, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:33:32,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=792960.0, ans=0.0 2023-12-22 21:33:32,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=792960.0, ans=0.2 2023-12-22 21:33:35,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=792960.0, ans=0.0 2023-12-22 21:33:39,622 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.637e+01 3.079e+01 3.195e+01 3.326e+01 3.977e+01, threshold=6.390e+01, percent-clipped=0.0 2023-12-22 21:33:54,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=793093.3333333334, ans=0.0 2023-12-22 21:33:56,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=793093.3333333334, ans=0.0 2023-12-22 21:34:10,992 INFO [train.py:886] (3/4) Epoch 25, batch 4600, loss[loss=0.01494, audio_tagging_loss=0.01494, over 25000.00 frames. ], tot_loss[loss=0.01301, audio_tagging_loss=0.01301, over 4955173.52 frames. 
], batch size: 100, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:34:13,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=793226.6666666666, ans=0.2 2023-12-22 21:34:17,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=793226.6666666666, ans=0.125 2023-12-22 21:34:39,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=793360.0, ans=0.07 2023-12-22 21:34:41,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=793426.6666666666, ans=0.125 2023-12-22 21:34:57,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=793493.3333333334, ans=0.0 2023-12-22 21:35:01,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=793493.3333333334, ans=0.2 2023-12-22 21:35:02,782 INFO [train.py:886] (3/4) Epoch 25, batch 4650, loss[loss=0.01185, audio_tagging_loss=0.01185, over 25000.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4955572.18 frames. ], batch size: 100, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:35:22,747 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.714e+01 3.053e+01 3.192e+01 3.319e+01 4.042e+01, threshold=6.384e+01, percent-clipped=0.0 2023-12-22 21:35:24,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=793693.3333333334, ans=0.125 2023-12-22 21:35:39,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.52 vs. limit=15.0 2023-12-22 21:35:41,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=793760.0, ans=0.1 2023-12-22 21:35:41,387 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 21:35:53,205 INFO [train.py:886] (3/4) Epoch 25, batch 4700, loss[loss=0.01468, audio_tagging_loss=0.01468, over 24750.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4954971.13 frames. ], batch size: 99, lr: 4.29e-03, grad_scale: 64.0 2023-12-22 21:35:55,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0 2023-12-22 21:35:55,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=793893.3333333334, ans=6.0 2023-12-22 21:36:12,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=794026.6666666666, ans=0.0 2023-12-22 21:36:19,786 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=15.0 2023-12-22 21:36:33,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=794160.0, ans=0.125 2023-12-22 21:36:40,975 INFO [train.py:886] (3/4) Epoch 25, batch 4750, loss[loss=0.01181, audio_tagging_loss=0.01181, over 24750.00 frames. 
], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4953664.85 frames. ], batch size: 99, lr: 4.28e-03, grad_scale: 64.0 2023-12-22 21:36:41,169 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 21:36:43,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=794226.6666666666, ans=0.125 2023-12-22 21:37:15,784 INFO [train.py:886] (3/4) Epoch 26, batch 0, loss[loss=0.02367, audio_tagging_loss=0.02367, over 24118.00 frames. ], tot_loss[loss=0.02367, audio_tagging_loss=0.02367, over 24118.00 frames. ], batch size: 100, lr: 4.20e-03, grad_scale: 32.0 2023-12-22 21:37:15,784 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 21:37:37,118 INFO [train.py:917] (3/4) Epoch 26, validation: loss=0.03272, audio_tagging_loss=0.03272, over 3737520.00 frames. 2023-12-22 21:37:37,118 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 21:37:38,398 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.43 vs. limit=15.0 2023-12-22 21:37:39,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.05 vs. limit=15.0 2023-12-22 21:37:41,534 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.775e+01 3.157e+01 3.286e+01 3.436e+01 9.011e+01, threshold=6.571e+01, percent-clipped=3.0 2023-12-22 21:37:44,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=794333.3333333334, ans=0.125 2023-12-22 21:37:47,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=794400.0, ans=0.2 2023-12-22 21:37:57,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=794466.6666666666, ans=0.125 2023-12-22 21:38:14,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.15 vs. limit=6.0 2023-12-22 21:38:28,757 INFO [train.py:886] (3/4) Epoch 26, batch 50, loss[loss=0.01532, audio_tagging_loss=0.01532, over 25000.00 frames. ], tot_loss[loss=0.02005, audio_tagging_loss=0.02005, over 1114165.96 frames. ], batch size: 100, lr: 4.20e-03, grad_scale: 32.0 2023-12-22 21:38:35,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=794666.6666666666, ans=0.1 2023-12-22 21:38:35,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=794666.6666666666, ans=0.04949747468305833 2023-12-22 21:38:40,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=794733.3333333334, ans=0.0 2023-12-22 21:38:43,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.24 vs. 
limit=22.5 2023-12-22 21:39:01,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=794866.6666666666, ans=0.2 2023-12-22 21:39:06,303 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2023-12-22 21:39:12,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=794933.3333333334, ans=0.04949747468305833 2023-12-22 21:39:20,286 INFO [train.py:886] (3/4) Epoch 26, batch 100, loss[loss=0.01702, audio_tagging_loss=0.01702, over 25000.00 frames. ], tot_loss[loss=0.01771, audio_tagging_loss=0.01771, over 1971981.54 frames. ], batch size: 100, lr: 4.20e-03, grad_scale: 32.0 2023-12-22 21:39:20,463 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 21:39:24,052 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.050e+01 3.579e+01 3.859e+01 4.416e+01 7.347e+01, threshold=7.717e+01, percent-clipped=4.0 2023-12-22 21:39:40,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=795133.3333333334, ans=0.2 2023-12-22 21:39:43,914 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.63 vs. limit=15.0 2023-12-22 21:39:45,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=795133.3333333334, ans=0.1 2023-12-22 21:39:59,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=795200.0, ans=0.1 2023-12-22 21:40:09,675 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.28 vs. limit=15.0 2023-12-22 21:40:11,820 INFO [train.py:886] (3/4) Epoch 26, batch 150, loss[loss=0.01373, audio_tagging_loss=0.01373, over 25000.00 frames. ], tot_loss[loss=0.01621, audio_tagging_loss=0.01621, over 2632548.60 frames. ], batch size: 100, lr: 4.20e-03, grad_scale: 32.0 2023-12-22 21:40:22,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=795400.0, ans=0.0 2023-12-22 21:40:41,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=795466.6666666666, ans=0.2 2023-12-22 21:40:44,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=795533.3333333334, ans=0.1 2023-12-22 21:40:46,935 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.94 vs. 
limit=6.0 2023-12-22 21:40:53,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=795600.0, ans=0.125 2023-12-22 21:40:55,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=795600.0, ans=0.125 2023-12-22 21:40:57,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=795600.0, ans=15.0 2023-12-22 21:40:58,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=795600.0, ans=0.0 2023-12-22 21:41:02,958 INFO [train.py:886] (3/4) Epoch 26, batch 200, loss[loss=0.01336, audio_tagging_loss=0.01336, over 24750.00 frames. ], tot_loss[loss=0.01535, audio_tagging_loss=0.01535, over 3152518.23 frames. ], batch size: 99, lr: 4.20e-03, grad_scale: 32.0 2023-12-22 21:41:07,378 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.808e+01 3.182e+01 3.315e+01 3.522e+01 3.900e+01, threshold=6.631e+01, percent-clipped=0.0 2023-12-22 21:41:33,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=795866.6666666666, ans=0.0 2023-12-22 21:41:34,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=795866.6666666666, ans=0.0 2023-12-22 21:41:36,584 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0 2023-12-22 21:41:50,408 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.55 vs. limit=6.0 2023-12-22 21:41:54,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=796000.0, ans=0.2 2023-12-22 21:41:55,399 INFO [train.py:886] (3/4) Epoch 26, batch 250, loss[loss=0.01505, audio_tagging_loss=0.01505, over 25000.00 frames. ], tot_loss[loss=0.01476, audio_tagging_loss=0.01476, over 3556451.69 frames. ], batch size: 100, lr: 4.20e-03, grad_scale: 32.0 2023-12-22 21:41:57,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=796000.0, ans=0.09899494936611666 2023-12-22 21:42:26,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=796200.0, ans=0.1 2023-12-22 21:42:27,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=796200.0, ans=0.1 2023-12-22 21:42:33,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=796200.0, ans=0.125 2023-12-22 21:42:45,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=796266.6666666666, ans=0.125 2023-12-22 21:42:47,245 INFO [train.py:886] (3/4) Epoch 26, batch 300, loss[loss=0.01527, audio_tagging_loss=0.01527, over 25000.00 frames. ], tot_loss[loss=0.0144, audio_tagging_loss=0.0144, over 3865306.05 frames. 
], batch size: 100, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:42:50,984 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.813e+01 3.083e+01 3.251e+01 3.397e+01 4.143e+01, threshold=6.503e+01, percent-clipped=0.0 2023-12-22 21:43:19,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.29 vs. limit=22.5 2023-12-22 21:43:24,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=796533.3333333334, ans=0.125 2023-12-22 21:43:35,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=796600.0, ans=0.125 2023-12-22 21:43:36,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=796600.0, ans=0.0 2023-12-22 21:43:39,415 INFO [train.py:886] (3/4) Epoch 26, batch 350, loss[loss=0.01208, audio_tagging_loss=0.01208, over 24750.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 4101650.39 frames. ], batch size: 99, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:44:00,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=796800.0, ans=0.2 2023-12-22 21:44:28,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=796933.3333333334, ans=0.125 2023-12-22 21:44:31,537 INFO [train.py:886] (3/4) Epoch 26, batch 400, loss[loss=0.01076, audio_tagging_loss=0.01076, over 24750.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4288061.50 frames. ], batch size: 99, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:44:32,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=797000.0, ans=0.0 2023-12-22 21:44:36,030 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.737e+01 3.039e+01 3.196e+01 3.377e+01 3.798e+01, threshold=6.392e+01, percent-clipped=0.0 2023-12-22 21:44:41,119 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 21:44:48,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=797066.6666666666, ans=0.125 2023-12-22 21:44:50,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=797066.6666666666, ans=0.0 2023-12-22 21:45:00,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=797133.3333333334, ans=0.125 2023-12-22 21:45:02,534 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.98 vs. 
limit=15.0 2023-12-22 21:45:12,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=797266.6666666666, ans=0.2 2023-12-22 21:45:19,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=797266.6666666666, ans=0.025 2023-12-22 21:45:20,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=797266.6666666666, ans=0.125 2023-12-22 21:45:23,755 INFO [train.py:886] (3/4) Epoch 26, batch 450, loss[loss=0.008819, audio_tagging_loss=0.008819, over 24027.00 frames. ], tot_loss[loss=0.01348, audio_tagging_loss=0.01348, over 4436034.97 frames. ], batch size: 100, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:45:33,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=797400.0, ans=0.125 2023-12-22 21:46:03,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=797600.0, ans=0.125 2023-12-22 21:46:14,588 INFO [train.py:886] (3/4) Epoch 26, batch 500, loss[loss=0.009753, audio_tagging_loss=0.009753, over 21517.00 frames. ], tot_loss[loss=0.01327, audio_tagging_loss=0.01327, over 4547880.51 frames. ], batch size: 107, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:46:19,022 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.814e+01 3.074e+01 3.193e+01 3.348e+01 4.490e+01, threshold=6.386e+01, percent-clipped=0.0 2023-12-22 21:46:39,390 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. limit=6.0 2023-12-22 21:46:40,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=797800.0, ans=0.0 2023-12-22 21:47:05,453 INFO [train.py:886] (3/4) Epoch 26, batch 550, loss[loss=0.01452, audio_tagging_loss=0.01452, over 25000.00 frames. ], tot_loss[loss=0.01319, audio_tagging_loss=0.01319, over 4641180.17 frames. ], batch size: 100, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:47:24,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=798066.6666666666, ans=0.2 2023-12-22 21:47:28,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=798133.3333333334, ans=0.125 2023-12-22 21:47:32,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=798133.3333333334, ans=0.0 2023-12-22 21:47:55,041 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=12.0 2023-12-22 21:47:57,531 INFO [train.py:886] (3/4) Epoch 26, batch 600, loss[loss=0.01139, audio_tagging_loss=0.01139, over 24750.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4712975.79 frames. 
], batch size: 99, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:48:01,263 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.753e+01 3.107e+01 3.234e+01 3.355e+01 4.218e+01, threshold=6.468e+01, percent-clipped=0.0 2023-12-22 21:48:20,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=798466.6666666666, ans=22.5 2023-12-22 21:48:46,850 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.87 vs. limit=15.0 2023-12-22 21:48:48,533 INFO [train.py:886] (3/4) Epoch 26, batch 650, loss[loss=0.01365, audio_tagging_loss=0.01365, over 24750.00 frames. ], tot_loss[loss=0.01321, audio_tagging_loss=0.01321, over 4759086.69 frames. ], batch size: 99, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:48:50,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=798666.6666666666, ans=0.125 2023-12-22 21:48:53,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=798666.6666666666, ans=0.09899494936611666 2023-12-22 21:48:54,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=798666.6666666666, ans=0.0 2023-12-22 21:49:02,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=798733.3333333334, ans=0.1 2023-12-22 21:49:04,649 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.07 vs. limit=6.0 2023-12-22 21:49:19,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.28 vs. limit=15.0 2023-12-22 21:49:33,161 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.27 vs. limit=10.0 2023-12-22 21:49:34,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=798933.3333333334, ans=0.0 2023-12-22 21:49:38,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=798933.3333333334, ans=0.125 2023-12-22 21:49:39,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=799000.0, ans=0.09899494936611666 2023-12-22 21:49:40,193 INFO [train.py:886] (3/4) Epoch 26, batch 700, loss[loss=0.01567, audio_tagging_loss=0.01567, over 25000.00 frames. ], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 4795060.31 frames. ], batch size: 100, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:49:43,943 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.720e+01 3.098e+01 3.257e+01 3.432e+01 3.751e+01, threshold=6.513e+01, percent-clipped=0.0 2023-12-22 21:50:28,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=799266.6666666666, ans=0.0 2023-12-22 21:50:30,733 INFO [train.py:886] (3/4) Epoch 26, batch 750, loss[loss=0.01263, audio_tagging_loss=0.01263, over 24750.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4831230.83 frames. 
], batch size: 99, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:50:31,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=799333.3333333334, ans=0.125 2023-12-22 21:50:38,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=799333.3333333334, ans=0.125 2023-12-22 21:50:38,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=799333.3333333334, ans=0.1 2023-12-22 21:50:38,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=799333.3333333334, ans=0.125 2023-12-22 21:50:57,021 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.62 vs. limit=10.0 2023-12-22 21:50:59,972 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=15.0 2023-12-22 21:51:01,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=799533.3333333334, ans=0.0 2023-12-22 21:51:05,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=799533.3333333334, ans=0.0 2023-12-22 21:51:18,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=799600.0, ans=0.1 2023-12-22 21:51:22,653 INFO [train.py:886] (3/4) Epoch 26, batch 800, loss[loss=0.01456, audio_tagging_loss=0.01456, over 25000.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4857237.78 frames. ], batch size: 100, lr: 4.19e-03, grad_scale: 32.0 2023-12-22 21:51:26,415 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.722e+01 3.081e+01 3.211e+01 3.380e+01 3.889e+01, threshold=6.421e+01, percent-clipped=0.0 2023-12-22 21:51:38,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=799733.3333333334, ans=0.125 2023-12-22 21:51:53,825 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.69 vs. limit=22.5 2023-12-22 21:51:54,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=799866.6666666666, ans=0.125 2023-12-22 21:51:58,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=799866.6666666666, ans=0.0 2023-12-22 21:52:11,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=799933.3333333334, ans=0.125 2023-12-22 21:52:17,738 INFO [train.py:886] (3/4) Epoch 26, batch 850, loss[loss=0.01441, audio_tagging_loss=0.01441, over 25000.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4878221.47 frames. ], batch size: 100, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:52:21,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.68 vs. 
limit=22.5 2023-12-22 21:52:26,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=800000.0, ans=0.1 2023-12-22 21:52:40,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=800133.3333333334, ans=0.125 2023-12-22 21:52:57,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=800200.0, ans=0.125 2023-12-22 21:53:08,179 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.67 vs. limit=22.5 2023-12-22 21:53:08,747 INFO [train.py:886] (3/4) Epoch 26, batch 900, loss[loss=0.01223, audio_tagging_loss=0.01223, over 25000.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4899655.79 frames. ], batch size: 100, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:53:13,186 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.742e+01 3.047e+01 3.202e+01 3.314e+01 4.091e+01, threshold=6.405e+01, percent-clipped=0.0 2023-12-22 21:53:33,149 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.55 vs. limit=22.5 2023-12-22 21:53:37,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=800466.6666666666, ans=0.0 2023-12-22 21:53:49,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=800533.3333333334, ans=0.125 2023-12-22 21:53:56,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=800600.0, ans=0.0 2023-12-22 21:53:58,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=800600.0, ans=0.0 2023-12-22 21:54:01,654 INFO [train.py:886] (3/4) Epoch 26, batch 950, loss[loss=0.01457, audio_tagging_loss=0.01457, over 24750.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4907900.30 frames. ], batch size: 99, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:54:23,007 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 21:54:42,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=800933.3333333334, ans=0.1 2023-12-22 21:54:50,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=800933.3333333334, ans=15.0 2023-12-22 21:54:53,023 INFO [train.py:886] (3/4) Epoch 26, batch 1000, loss[loss=0.01277, audio_tagging_loss=0.01277, over 24750.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4909708.24 frames. 
], batch size: 99, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:54:56,817 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.794e+01 3.055e+01 3.210e+01 3.341e+01 4.117e+01, threshold=6.420e+01, percent-clipped=0.0 2023-12-22 21:55:18,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=801133.3333333334, ans=0.0 2023-12-22 21:55:24,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=801200.0, ans=0.05 2023-12-22 21:55:27,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=801200.0, ans=0.07 2023-12-22 21:55:41,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=801266.6666666666, ans=0.125 2023-12-22 21:55:43,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=801333.3333333334, ans=0.2 2023-12-22 21:55:43,852 INFO [train.py:886] (3/4) Epoch 26, batch 1050, loss[loss=0.01323, audio_tagging_loss=0.01323, over 25000.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4920124.13 frames. ], batch size: 100, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:55:46,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=801333.3333333334, ans=0.95 2023-12-22 21:55:59,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=801400.0, ans=0.125 2023-12-22 21:56:05,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=801466.6666666666, ans=0.125 2023-12-22 21:56:17,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=801533.3333333334, ans=0.125 2023-12-22 21:56:33,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=801600.0, ans=0.0 2023-12-22 21:56:36,432 INFO [train.py:886] (3/4) Epoch 26, batch 1100, loss[loss=0.012, audio_tagging_loss=0.012, over 24750.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4933937.31 frames. ], batch size: 99, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:56:40,899 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.668e+01 3.073e+01 3.229e+01 3.408e+01 3.887e+01, threshold=6.457e+01, percent-clipped=0.0 2023-12-22 21:56:42,465 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.72 vs. limit=22.5 2023-12-22 21:56:52,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=801733.3333333334, ans=0.125 2023-12-22 21:56:54,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=801733.3333333334, ans=0.125 2023-12-22 21:56:59,283 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.56 vs. 
limit=10.0 2023-12-22 21:57:04,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=801800.0, ans=0.125 2023-12-22 21:57:08,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=801866.6666666666, ans=0.05 2023-12-22 21:57:11,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=801866.6666666666, ans=0.1 2023-12-22 21:57:21,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=801933.3333333334, ans=0.2 2023-12-22 21:57:25,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=801933.3333333334, ans=0.2 2023-12-22 21:57:27,728 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.90 vs. limit=15.0 2023-12-22 21:57:28,142 INFO [train.py:886] (3/4) Epoch 26, batch 1150, loss[loss=0.01237, audio_tagging_loss=0.01237, over 25000.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4939545.67 frames. ], batch size: 100, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:58:08,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=802266.6666666666, ans=0.125 2023-12-22 21:58:18,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=802266.6666666666, ans=10.0 2023-12-22 21:58:20,319 INFO [train.py:886] (3/4) Epoch 26, batch 1200, loss[loss=0.01298, audio_tagging_loss=0.01298, over 25000.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4950158.49 frames. ], batch size: 100, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:58:23,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=802333.3333333334, ans=0.125 2023-12-22 21:58:24,028 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.663e+01 3.101e+01 3.239e+01 3.403e+01 4.197e+01, threshold=6.477e+01, percent-clipped=0.0 2023-12-22 21:58:33,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=802400.0, ans=0.125 2023-12-22 21:58:39,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=802466.6666666666, ans=0.1 2023-12-22 21:58:49,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=802533.3333333334, ans=0.125 2023-12-22 21:59:01,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=802600.0, ans=0.09899494936611666 2023-12-22 21:59:11,800 INFO [train.py:886] (3/4) Epoch 26, batch 1250, loss[loss=0.01335, audio_tagging_loss=0.01335, over 24750.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4948702.20 frames. 
], batch size: 99, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 21:59:18,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=802666.6666666666, ans=12.0 2023-12-22 21:59:33,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=802800.0, ans=0.125 2023-12-22 21:59:45,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=802866.6666666666, ans=0.07 2023-12-22 21:59:51,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=802866.6666666666, ans=0.0 2023-12-22 22:00:03,357 INFO [train.py:886] (3/4) Epoch 26, batch 1300, loss[loss=0.01369, audio_tagging_loss=0.01369, over 24750.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4942599.47 frames. ], batch size: 99, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 22:00:07,853 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.752e+01 3.138e+01 3.281e+01 3.471e+01 4.385e+01, threshold=6.561e+01, percent-clipped=0.0 2023-12-22 22:00:26,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=803133.3333333334, ans=0.125 2023-12-22 22:00:38,235 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.45 vs. limit=8.0 2023-12-22 22:00:42,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=803200.0, ans=0.0 2023-12-22 22:00:55,950 INFO [train.py:886] (3/4) Epoch 26, batch 1350, loss[loss=0.01141, audio_tagging_loss=0.01141, over 25000.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4948477.25 frames. ], batch size: 100, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 22:01:05,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=803400.0, ans=0.1 2023-12-22 22:01:20,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=803466.6666666666, ans=0.125 2023-12-22 22:01:46,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=15.0 2023-12-22 22:01:48,097 INFO [train.py:886] (3/4) Epoch 26, batch 1400, loss[loss=0.01179, audio_tagging_loss=0.01179, over 25000.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4952128.22 frames. 
], batch size: 100, lr: 4.18e-03, grad_scale: 32.0 2023-12-22 22:01:51,854 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.688e+01 3.025e+01 3.160e+01 3.304e+01 3.712e+01, threshold=6.320e+01, percent-clipped=0.0 2023-12-22 22:01:53,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=803666.6666666666, ans=0.125 2023-12-22 22:02:11,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=803800.0, ans=0.1 2023-12-22 22:02:17,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=803866.6666666666, ans=0.2 2023-12-22 22:02:28,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=803933.3333333334, ans=0.2 2023-12-22 22:02:36,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.81 vs. limit=15.0 2023-12-22 22:02:39,168 INFO [train.py:886] (3/4) Epoch 26, batch 1450, loss[loss=0.01176, audio_tagging_loss=0.01176, over 24750.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4955426.24 frames. ], batch size: 99, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:03:22,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=804266.6666666666, ans=0.125 2023-12-22 22:03:25,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.07 vs. limit=15.0 2023-12-22 22:03:31,173 INFO [train.py:886] (3/4) Epoch 26, batch 1500, loss[loss=0.01274, audio_tagging_loss=0.01274, over 25000.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4959870.90 frames. ], batch size: 100, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:03:31,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=804333.3333333334, ans=0.125 2023-12-22 22:03:33,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=804333.3333333334, ans=0.2 2023-12-22 22:03:35,697 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.803e+01 3.046e+01 3.201e+01 3.307e+01 3.722e+01, threshold=6.402e+01, percent-clipped=0.0 2023-12-22 22:03:36,265 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.05 vs. limit=15.0 2023-12-22 22:03:47,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=804400.0, ans=0.0 2023-12-22 22:03:50,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.51 vs. limit=22.5 2023-12-22 22:04:11,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=804600.0, ans=0.125 2023-12-22 22:04:17,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=804600.0, ans=0.125 2023-12-22 22:04:23,289 INFO [train.py:886] (3/4) Epoch 26, batch 1550, loss[loss=0.01261, audio_tagging_loss=0.01261, over 24750.00 frames. 
], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4953561.97 frames. ], batch size: 99, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:04:31,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=804666.6666666666, ans=0.125 2023-12-22 22:04:39,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=804733.3333333334, ans=0.2 2023-12-22 22:04:45,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=804800.0, ans=0.1 2023-12-22 22:05:13,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=805000.0, ans=0.2 2023-12-22 22:05:14,680 INFO [train.py:886] (3/4) Epoch 26, batch 1600, loss[loss=0.01292, audio_tagging_loss=0.01292, over 24039.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4943274.02 frames. ], batch size: 100, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:05:18,338 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.820e+01 3.137e+01 3.270e+01 3.392e+01 3.802e+01, threshold=6.540e+01, percent-clipped=0.0 2023-12-22 22:05:25,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=805066.6666666666, ans=0.125 2023-12-22 22:05:27,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=805066.6666666666, ans=0.0 2023-12-22 22:05:30,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=805066.6666666666, ans=0.125 2023-12-22 22:05:31,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=805066.6666666666, ans=0.125 2023-12-22 22:06:04,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=805266.6666666666, ans=0.2 2023-12-22 22:06:07,026 INFO [train.py:886] (3/4) Epoch 26, batch 1650, loss[loss=0.01214, audio_tagging_loss=0.01214, over 25000.00 frames. ], tot_loss[loss=0.01313, audio_tagging_loss=0.01313, over 4937576.96 frames. ], batch size: 100, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:06:20,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=805400.0, ans=0.125 2023-12-22 22:06:41,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=805533.3333333334, ans=0.125 2023-12-22 22:06:42,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=805533.3333333334, ans=0.0 2023-12-22 22:06:47,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=22.5 2023-12-22 22:06:56,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=805600.0, ans=0.0 2023-12-22 22:06:57,931 INFO [train.py:886] (3/4) Epoch 26, batch 1700, loss[loss=0.0135, audio_tagging_loss=0.0135, over 25000.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4946949.57 frames. 
], batch size: 100, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:07:02,417 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.767e+01 3.061e+01 3.187e+01 3.330e+01 3.966e+01, threshold=6.373e+01, percent-clipped=0.0 2023-12-22 22:07:22,291 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.38 vs. limit=6.0 2023-12-22 22:07:30,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=805866.6666666666, ans=0.1 2023-12-22 22:07:41,523 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=15.0 2023-12-22 22:07:51,012 INFO [train.py:886] (3/4) Epoch 26, batch 1750, loss[loss=0.01202, audio_tagging_loss=0.01202, over 25000.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4950898.92 frames. ], batch size: 100, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:07:51,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=806000.0, ans=0.125 2023-12-22 22:07:56,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.98 vs. limit=6.0 2023-12-22 22:08:11,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=806133.3333333334, ans=0.125 2023-12-22 22:08:35,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=806266.6666666666, ans=0.2 2023-12-22 22:08:41,209 INFO [train.py:886] (3/4) Epoch 26, batch 1800, loss[loss=0.01278, audio_tagging_loss=0.01278, over 24750.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4949659.80 frames. ], batch size: 99, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:08:45,752 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.749e+01 3.085e+01 3.206e+01 3.379e+01 3.984e+01, threshold=6.413e+01, percent-clipped=0.0 2023-12-22 22:08:50,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=806400.0, ans=0.0 2023-12-22 22:09:30,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=806600.0, ans=0.1 2023-12-22 22:09:31,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=806600.0, ans=0.0 2023-12-22 22:09:33,099 INFO [train.py:886] (3/4) Epoch 26, batch 1850, loss[loss=0.01247, audio_tagging_loss=0.01247, over 24750.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4951126.94 frames. 
], batch size: 99, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:09:54,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=806800.0, ans=0.125 2023-12-22 22:10:19,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=806933.3333333334, ans=0.09899494936611666 2023-12-22 22:10:20,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=806933.3333333334, ans=0.1 2023-12-22 22:10:23,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=806933.3333333334, ans=0.125 2023-12-22 22:10:25,614 INFO [train.py:886] (3/4) Epoch 26, batch 1900, loss[loss=0.01579, audio_tagging_loss=0.01579, over 24750.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4945001.29 frames. ], batch size: 99, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:10:30,061 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.864e+01 3.174e+01 3.329e+01 3.459e+01 4.807e+01, threshold=6.658e+01, percent-clipped=0.0 2023-12-22 22:10:30,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=807000.0, ans=0.0 2023-12-22 22:10:38,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.42 vs. limit=10.0 2023-12-22 22:10:39,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=807066.6666666666, ans=0.125 2023-12-22 22:10:40,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=807066.6666666666, ans=0.125 2023-12-22 22:11:03,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=807200.0, ans=0.05 2023-12-22 22:11:16,640 INFO [train.py:886] (3/4) Epoch 26, batch 1950, loss[loss=0.01178, audio_tagging_loss=0.01178, over 24750.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4935302.90 frames. 
], batch size: 99, lr: 4.17e-03, grad_scale: 32.0 2023-12-22 22:11:17,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=807333.3333333334, ans=0.125 2023-12-22 22:11:19,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=807333.3333333334, ans=0.1 2023-12-22 22:11:23,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=807333.3333333334, ans=0.125 2023-12-22 22:11:34,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=807400.0, ans=0.125 2023-12-22 22:11:36,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=807400.0, ans=0.0 2023-12-22 22:11:44,391 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:12:04,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=807600.0, ans=0.0 2023-12-22 22:12:09,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=807666.6666666666, ans=6.0 2023-12-22 22:12:10,194 INFO [train.py:886] (3/4) Epoch 26, batch 2000, loss[loss=0.0108, audio_tagging_loss=0.0108, over 25000.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4939700.87 frames. ], batch size: 100, lr: 4.17e-03, grad_scale: 64.0 2023-12-22 22:12:14,053 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.798e+01 3.056e+01 3.196e+01 3.415e+01 4.184e+01, threshold=6.392e+01, percent-clipped=0.0 2023-12-22 22:12:27,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=807733.3333333334, ans=0.1 2023-12-22 22:12:27,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=807733.3333333334, ans=0.125 2023-12-22 22:12:28,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=807733.3333333334, ans=0.125 2023-12-22 22:12:31,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=807800.0, ans=0.125 2023-12-22 22:12:42,871 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0 2023-12-22 22:12:55,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=807933.3333333334, ans=0.1 2023-12-22 22:13:02,227 INFO [train.py:886] (3/4) Epoch 26, batch 2050, loss[loss=0.01032, audio_tagging_loss=0.01032, over 25000.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4945610.91 frames. ], batch size: 100, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:13:26,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=808133.3333333334, ans=0.125 2023-12-22 22:13:28,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.88 vs. 
limit=15.0 2023-12-22 22:13:30,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=808133.3333333334, ans=0.1 2023-12-22 22:13:41,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=808200.0, ans=0.125 2023-12-22 22:13:41,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.62 vs. limit=15.0 2023-12-22 22:13:43,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=808266.6666666666, ans=0.125 2023-12-22 22:13:46,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=808266.6666666666, ans=0.2 2023-12-22 22:13:47,255 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.51 vs. limit=5.0 2023-12-22 22:13:53,230 INFO [train.py:886] (3/4) Epoch 26, batch 2100, loss[loss=0.01237, audio_tagging_loss=0.01237, over 24750.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4953694.08 frames. ], batch size: 99, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:13:56,968 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.782e+01 3.109e+01 3.258e+01 3.404e+01 4.082e+01, threshold=6.517e+01, percent-clipped=0.0 2023-12-22 22:14:26,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=808533.3333333334, ans=0.0 2023-12-22 22:14:33,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=808600.0, ans=0.07 2023-12-22 22:14:42,082 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.453e-02 2023-12-22 22:14:44,686 INFO [train.py:886] (3/4) Epoch 26, batch 2150, loss[loss=0.01247, audio_tagging_loss=0.01247, over 24750.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4951333.08 frames. ], batch size: 99, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:14:46,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=808666.6666666666, ans=0.0 2023-12-22 22:14:56,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=808733.3333333334, ans=0.0 2023-12-22 22:15:15,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=808866.6666666666, ans=0.0 2023-12-22 22:15:17,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=808866.6666666666, ans=0.1 2023-12-22 22:15:23,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=808866.6666666666, ans=0.1 2023-12-22 22:15:36,307 INFO [train.py:886] (3/4) Epoch 26, batch 2200, loss[loss=0.01401, audio_tagging_loss=0.01401, over 25000.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4948241.45 frames. 
], batch size: 100, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:15:40,872 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.712e+01 3.112e+01 3.281e+01 3.454e+01 3.979e+01, threshold=6.563e+01, percent-clipped=0.0 2023-12-22 22:16:06,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=809200.0, ans=0.2 2023-12-22 22:16:08,055 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.47 vs. limit=15.0 2023-12-22 22:16:14,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=809200.0, ans=0.125 2023-12-22 22:16:14,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=809200.0, ans=0.2 2023-12-22 22:16:16,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.73 vs. limit=10.0 2023-12-22 22:16:19,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=809266.6666666666, ans=0.1 2023-12-22 22:16:22,648 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.57 vs. limit=15.0 2023-12-22 22:16:28,681 INFO [train.py:886] (3/4) Epoch 26, batch 2250, loss[loss=0.01172, audio_tagging_loss=0.01172, over 24078.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4939177.46 frames. ], batch size: 100, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:16:40,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=809400.0, ans=0.2 2023-12-22 22:16:45,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=809400.0, ans=0.125 2023-12-22 22:16:46,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=809400.0, ans=0.95 2023-12-22 22:16:54,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=809466.6666666666, ans=0.125 2023-12-22 22:17:20,279 INFO [train.py:886] (3/4) Epoch 26, batch 2300, loss[loss=0.01359, audio_tagging_loss=0.01359, over 25000.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4941904.86 frames. ], batch size: 100, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:17:21,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=809666.6666666666, ans=0.125 2023-12-22 22:17:24,684 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.740e+01 3.089e+01 3.215e+01 3.416e+01 4.133e+01, threshold=6.430e+01, percent-clipped=0.0 2023-12-22 22:17:38,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=809733.3333333334, ans=0.0 2023-12-22 22:17:41,005 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.05 vs. 
limit=15.0 2023-12-22 22:17:44,712 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.32 vs. limit=6.0 2023-12-22 22:17:46,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=809800.0, ans=0.125 2023-12-22 22:17:47,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=809800.0, ans=0.125 2023-12-22 22:18:10,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=809933.3333333334, ans=0.125 2023-12-22 22:18:11,851 INFO [train.py:886] (3/4) Epoch 26, batch 2350, loss[loss=0.01086, audio_tagging_loss=0.01086, over 25000.00 frames. ], tot_loss[loss=0.01301, audio_tagging_loss=0.01301, over 4947694.37 frames. ], batch size: 100, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:19:03,982 INFO [train.py:886] (3/4) Epoch 26, batch 2400, loss[loss=0.01286, audio_tagging_loss=0.01286, over 25000.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4947908.21 frames. ], batch size: 100, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:19:07,784 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.847e+01 3.068e+01 3.224e+01 3.358e+01 4.468e+01, threshold=6.448e+01, percent-clipped=0.0 2023-12-22 22:19:16,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=810400.0, ans=0.2 2023-12-22 22:19:34,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2023-12-22 22:19:46,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=810600.0, ans=0.125 2023-12-22 22:19:56,220 INFO [train.py:886] (3/4) Epoch 26, batch 2450, loss[loss=0.01349, audio_tagging_loss=0.01349, over 25000.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4954329.05 frames. ], batch size: 100, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:20:08,295 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:20:19,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=810800.0, ans=15.0 2023-12-22 22:20:31,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=810866.6666666666, ans=0.0 2023-12-22 22:20:32,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.54 vs. limit=15.0 2023-12-22 22:20:47,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=811000.0, ans=0.125 2023-12-22 22:20:47,798 INFO [train.py:886] (3/4) Epoch 26, batch 2500, loss[loss=0.01462, audio_tagging_loss=0.01462, over 24750.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4947083.23 frames. 
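
The Whitening lines are a self-diagnostic from the whiten modules: each one measures how far the covariance of its activations is from a scaled identity and logs that measurement against a (scheduled) limit, applying a small corrective gradient when the limit is exceeded. A hedged sketch of one way to compute such a metric, which is 1.0 for perfectly "white" features and approaches the channel count when one direction dominates; the grouping and exact normalization in scaling.py may differ:

    import torch

    def whitening_metric_sketch(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels).
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]           # (C, C) covariance
        d = cov.shape[0]
        mean_eig = torch.diagonal(cov).mean()    # trace(cov) / d
        mean_sq_eig = (cov * cov).sum() / d      # trace(cov @ cov.T) / d
        # mean of squared eigenvalues over squared mean eigenvalue;
        # >= 1 by Cauchy-Schwarz, == 1 iff all eigenvalues are equal.
        return mean_sq_eig / (mean_eig ** 2 + 1e-20)
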
], batch size: 99, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:20:51,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=811000.0, ans=0.0 2023-12-22 22:20:52,293 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.780e+01 3.125e+01 3.301e+01 3.409e+01 3.789e+01, threshold=6.601e+01, percent-clipped=0.0 2023-12-22 22:20:57,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=811066.6666666666, ans=0.0 2023-12-22 22:21:02,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=811066.6666666666, ans=0.0 2023-12-22 22:21:14,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=811133.3333333334, ans=0.0 2023-12-22 22:21:38,991 INFO [train.py:886] (3/4) Epoch 26, batch 2550, loss[loss=0.01779, audio_tagging_loss=0.01779, over 24941.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4939400.35 frames. ], batch size: 100, lr: 4.16e-03, grad_scale: 64.0 2023-12-22 22:21:42,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=811333.3333333334, ans=0.5 2023-12-22 22:21:46,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=811333.3333333334, ans=0.2 2023-12-22 22:21:51,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=811400.0, ans=0.125 2023-12-22 22:22:20,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.66 vs. limit=15.0 2023-12-22 22:22:22,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=811600.0, ans=0.125 2023-12-22 22:22:28,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=811600.0, ans=0.0 2023-12-22 22:22:30,087 INFO [train.py:886] (3/4) Epoch 26, batch 2600, loss[loss=0.01324, audio_tagging_loss=0.01324, over 25000.00 frames. ], tot_loss[loss=0.01301, audio_tagging_loss=0.01301, over 4940585.72 frames. 
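
The recurring optim.py warnings are summaries, not failures: the five numbers are quantiles (min, 25%, median, 75%, max) of recent gradient norms, the threshold sits at roughly twice the median, and percent-clipped reports how often clipping actually fired (0.0 throughout this excerpt, i.e. training is stable). A rough sketch of that bookkeeping, assuming a plain global-norm clip; ScaledAdam's real logic is more elaborate:

    import torch

    def clip_by_recent_norms_sketch(params, norm_history):
        # Track the global grad norm, summarize recent norms by quantiles,
        # and clip against ~2x the running median.
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads]))
        norm_history.append(norm.item())
        recent = torch.tensor(norm_history[-1000:])
        quantiles = torch.quantile(
            recent, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = 2.0 * quantiles[2]
        if norm > threshold:
            for g in grads:
                g.mul_(threshold / norm)   # rescale rather than zero
        return quantiles, threshold
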
], batch size: 100, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:22:30,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=811666.6666666666, ans=0.1 2023-12-22 22:22:35,136 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.819e+01 3.115e+01 3.236e+01 3.408e+01 3.889e+01, threshold=6.471e+01, percent-clipped=0.0 2023-12-22 22:22:38,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=811666.6666666666, ans=0.2 2023-12-22 22:22:47,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=811733.3333333334, ans=0.125 2023-12-22 22:23:00,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=811866.6666666666, ans=15.0 2023-12-22 22:23:11,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=811933.3333333334, ans=0.0 2023-12-22 22:23:12,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=811933.3333333334, ans=0.035 2023-12-22 22:23:12,351 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=12.0 2023-12-22 22:23:22,385 INFO [train.py:886] (3/4) Epoch 26, batch 2650, loss[loss=0.01175, audio_tagging_loss=0.01175, over 23974.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4940884.38 frames. ], batch size: 100, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:23:33,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=812066.6666666666, ans=0.1 2023-12-22 22:23:33,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=812066.6666666666, ans=0.1 2023-12-22 22:23:51,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2023-12-22 22:23:52,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=812200.0, ans=0.125 2023-12-22 22:23:56,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=812200.0, ans=0.1 2023-12-22 22:24:06,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=812266.6666666666, ans=0.0 2023-12-22 22:24:14,622 INFO [train.py:886] (3/4) Epoch 26, batch 2700, loss[loss=0.01185, audio_tagging_loss=0.01185, over 25000.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4948693.24 frames. 
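
A note on the loss columns: loss and audio_tagging_loss are always equal here because tagging is the only objective in this run. For AudioSet-style tagging the model emits one logit per event class (527 classes in AudioSet), and since a clip can carry several simultaneous events the loss is a per-class binary cross-entropy rather than a softmax; a sketch of that shape, with the exact reduction and any class weighting in train.py left as assumptions:

    import torch
    import torch.nn.functional as F

    def audio_tagging_loss_sketch(logits: torch.Tensor,
                                  targets: torch.Tensor) -> torch.Tensor:
        # logits, targets: (batch, 527); targets are multi-hot labels, so
        # this is 527 independent binary decisions per clip.
        return F.binary_cross_entropy_with_logits(
            logits, targets, reduction="mean")
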
], batch size: 100, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:24:14,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=812333.3333333334, ans=0.1 2023-12-22 22:24:18,370 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.764e+01 3.107e+01 3.255e+01 3.402e+01 3.998e+01, threshold=6.509e+01, percent-clipped=0.0 2023-12-22 22:24:25,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.11 vs. limit=12.0 2023-12-22 22:25:05,263 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.37 vs. limit=15.0 2023-12-22 22:25:05,571 INFO [train.py:886] (3/4) Epoch 26, batch 2750, loss[loss=0.01456, audio_tagging_loss=0.01456, over 25000.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4953355.79 frames. ], batch size: 100, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:25:06,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=812666.6666666666, ans=0.07 2023-12-22 22:25:08,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=812666.6666666666, ans=0.125 2023-12-22 22:25:31,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=812800.0, ans=0.1 2023-12-22 22:25:39,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=812866.6666666666, ans=0.125 2023-12-22 22:25:41,220 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:25:58,408 INFO [train.py:886] (3/4) Epoch 26, batch 2800, loss[loss=0.01583, audio_tagging_loss=0.01583, over 24949.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4949448.81 frames. ], batch size: 100, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:26:01,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=813000.0, ans=0.1 2023-12-22 22:26:02,083 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.802e+01 3.113e+01 3.268e+01 3.447e+01 3.793e+01, threshold=6.536e+01, percent-clipped=0.0 2023-12-22 22:26:04,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=813000.0, ans=0.2 2023-12-22 22:26:06,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=813000.0, ans=0.125 2023-12-22 22:26:13,814 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.13 vs. limit=10.0 2023-12-22 22:26:16,616 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.25 vs. limit=10.0 2023-12-22 22:26:30,793 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.95 vs. 
limit=15.0 2023-12-22 22:26:38,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=813266.6666666666, ans=0.125 2023-12-22 22:26:42,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=813266.6666666666, ans=0.0 2023-12-22 22:26:43,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=813266.6666666666, ans=0.0 2023-12-22 22:26:48,500 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. limit=6.0 2023-12-22 22:26:48,960 INFO [train.py:886] (3/4) Epoch 26, batch 2850, loss[loss=0.01347, audio_tagging_loss=0.01347, over 24750.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4948538.15 frames. ], batch size: 99, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:26:52,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=813333.3333333334, ans=0.0 2023-12-22 22:26:57,669 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0 2023-12-22 22:27:40,977 INFO [train.py:886] (3/4) Epoch 26, batch 2900, loss[loss=0.01062, audio_tagging_loss=0.01062, over 24117.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4943228.67 frames. ], batch size: 100, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:27:44,731 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.685e+01 3.068e+01 3.241e+01 3.417e+01 3.879e+01, threshold=6.482e+01, percent-clipped=0.0 2023-12-22 22:27:51,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=813733.3333333334, ans=0.0 2023-12-22 22:28:00,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.58 vs. limit=22.5 2023-12-22 22:28:06,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=813800.0, ans=22.5 2023-12-22 22:28:18,449 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.05 vs. limit=15.0 2023-12-22 22:28:29,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=813933.3333333334, ans=0.2 2023-12-22 22:28:33,264 INFO [train.py:886] (3/4) Epoch 26, batch 2950, loss[loss=0.01148, audio_tagging_loss=0.01148, over 25000.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4946044.84 frames. ], batch size: 100, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:28:45,798 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.67 vs. limit=15.0 2023-12-22 22:28:52,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.20 vs. limit=15.0 2023-12-22 22:29:24,367 INFO [train.py:886] (3/4) Epoch 26, batch 3000, loss[loss=0.01459, audio_tagging_loss=0.01459, over 24750.00 frames. ], tot_loss[loss=0.01283, audio_tagging_loss=0.01283, over 4941143.14 frames. 
], batch size: 99, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:29:24,367 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 22:29:38,347 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1600, 1.0372, 4.4791, 4.3803], device='cuda:3') 2023-12-22 22:29:41,152 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.5858, 2.9199, 4.2519, 3.9223], device='cuda:3') 2023-12-22 22:29:45,066 INFO [train.py:917] (3/4) Epoch 26, validation: loss=0.03227, audio_tagging_loss=0.03227, over 3737520.00 frames. 2023-12-22 22:29:45,067 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 22:29:48,817 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.801e+01 3.069e+01 3.220e+01 3.381e+01 3.833e+01, threshold=6.439e+01, percent-clipped=0.0 2023-12-22 22:29:49,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0 2023-12-22 22:30:01,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=814400.0, ans=0.1 2023-12-22 22:30:23,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=814533.3333333334, ans=0.125 2023-12-22 22:30:36,603 INFO [train.py:886] (3/4) Epoch 26, batch 3050, loss[loss=0.01294, audio_tagging_loss=0.01294, over 25000.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4945061.02 frames. ], batch size: 100, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:30:42,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=814666.6666666666, ans=0.125 2023-12-22 22:30:46,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=814733.3333333334, ans=0.0 2023-12-22 22:30:47,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=814733.3333333334, ans=0.0 2023-12-22 22:30:58,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=814800.0, ans=0.125 2023-12-22 22:31:06,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=814800.0, ans=0.125 2023-12-22 22:31:08,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=814866.6666666666, ans=0.0 2023-12-22 22:31:09,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=814866.6666666666, ans=0.09899494936611666 2023-12-22 22:31:28,245 INFO [train.py:886] (3/4) Epoch 26, batch 3100, loss[loss=0.01543, audio_tagging_loss=0.01543, over 24750.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4952845.82 frames. 
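
Each validation pass also dumps attn_weights_entropy tensors from zipformer.py: one entropy value per attention head, a cheap collapse check. A head with markedly lower entropy than its siblings (like the 1.04 value above, next to 4.4-5.2) is attending to very few frames, while values near log(seq_len) mean nearly uniform attention. A sketch of the computation, assuming row-normalized attention weights; the actual tensor layout in zipformer.py may differ:

    import torch

    def attn_entropy_sketch(attn_weights: torch.Tensor) -> torch.Tensor:
        # attn_weights: (num_heads, num_queries, num_keys), rows sum to 1.
        eps = 1e-20
        ent = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
        return ent.mean(dim=-1)  # average over queries -> one value per head
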
], batch size: 99, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:31:32,716 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.774e+01 3.119e+01 3.253e+01 3.438e+01 3.800e+01, threshold=6.506e+01, percent-clipped=0.0 2023-12-22 22:31:39,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=815066.6666666666, ans=0.0 2023-12-22 22:31:42,240 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:31:44,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=815066.6666666666, ans=0.125 2023-12-22 22:32:04,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=815200.0, ans=0.0 2023-12-22 22:32:12,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=815266.6666666666, ans=0.07 2023-12-22 22:32:21,418 INFO [train.py:886] (3/4) Epoch 26, batch 3150, loss[loss=0.01546, audio_tagging_loss=0.01546, over 24750.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4954777.81 frames. ], batch size: 99, lr: 4.15e-03, grad_scale: 64.0 2023-12-22 22:33:01,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=815600.0, ans=0.125 2023-12-22 22:33:03,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=815600.0, ans=0.125 2023-12-22 22:33:06,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=815600.0, ans=0.0 2023-12-22 22:33:13,056 INFO [train.py:886] (3/4) Epoch 26, batch 3200, loss[loss=0.01256, audio_tagging_loss=0.01256, over 25000.00 frames. ], tot_loss[loss=0.01305, audio_tagging_loss=0.01305, over 4946220.59 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0 2023-12-22 22:33:16,942 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.874e+01 3.137e+01 3.242e+01 3.409e+01 4.105e+01, threshold=6.485e+01, percent-clipped=0.0 2023-12-22 22:33:20,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=815666.6666666666, ans=0.1 2023-12-22 22:33:33,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=815800.0, ans=0.1 2023-12-22 22:33:56,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=815933.3333333334, ans=0.0 2023-12-22 22:33:57,213 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0 2023-12-22 22:34:04,553 INFO [train.py:886] (3/4) Epoch 26, batch 3250, loss[loss=0.01274, audio_tagging_loss=0.01274, over 25000.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4944364.71 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0 2023-12-22 22:34:14,364 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. 
limit=15.0 2023-12-22 22:34:17,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=816066.6666666666, ans=0.125 2023-12-22 22:34:19,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=816066.6666666666, ans=0.1 2023-12-22 22:34:21,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=816066.6666666666, ans=0.2 2023-12-22 22:34:29,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.59 vs. limit=6.0 2023-12-22 22:34:55,454 INFO [train.py:886] (3/4) Epoch 26, batch 3300, loss[loss=0.01355, audio_tagging_loss=0.01355, over 25000.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4948035.80 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0 2023-12-22 22:35:00,011 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.721e+01 3.062e+01 3.233e+01 3.383e+01 3.983e+01, threshold=6.466e+01, percent-clipped=0.0 2023-12-22 22:35:25,306 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.97 vs. limit=6.0 2023-12-22 22:35:26,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=816533.3333333334, ans=0.125 2023-12-22 22:35:31,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=816533.3333333334, ans=0.2 2023-12-22 22:35:37,202 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=12.0 2023-12-22 22:35:40,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.22 vs. limit=15.0 2023-12-22 22:35:47,756 INFO [train.py:886] (3/4) Epoch 26, batch 3350, loss[loss=0.01317, audio_tagging_loss=0.01317, over 25000.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4952107.33 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0 2023-12-22 22:35:49,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=816666.6666666666, ans=0.1 2023-12-22 22:35:55,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=816666.6666666666, ans=0.0 2023-12-22 22:36:21,500 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.35 vs. limit=6.0 2023-12-22 22:36:21,797 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.93 vs. limit=22.5 2023-12-22 22:36:28,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=816933.3333333334, ans=0.0 2023-12-22 22:36:39,803 INFO [train.py:886] (3/4) Epoch 26, batch 3400, loss[loss=0.01364, audio_tagging_loss=0.01364, over 25000.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4953021.29 frames. 
], batch size: 100, lr: 4.14e-03, grad_scale: 64.0 2023-12-22 22:36:44,326 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.878e+01 3.135e+01 3.242e+01 3.478e+01 3.841e+01, threshold=6.483e+01, percent-clipped=0.0 2023-12-22 22:36:47,911 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.66 vs. limit=12.0 2023-12-22 22:36:50,696 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.97 vs. limit=15.0 2023-12-22 22:37:05,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=817133.3333333334, ans=0.0 2023-12-22 22:37:12,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=817200.0, ans=0.04949747468305833 2023-12-22 22:37:19,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=817200.0, ans=0.125 2023-12-22 22:37:20,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=817266.6666666666, ans=0.125 2023-12-22 22:37:21,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=817266.6666666666, ans=0.035 2023-12-22 22:37:24,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=817266.6666666666, ans=0.125 2023-12-22 22:37:31,871 INFO [train.py:886] (3/4) Epoch 26, batch 3450, loss[loss=0.01101, audio_tagging_loss=0.01101, over 24750.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4956747.15 frames. ], batch size: 99, lr: 4.14e-03, grad_scale: 64.0 2023-12-22 22:37:41,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=817400.0, ans=0.1 2023-12-22 22:37:43,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=817400.0, ans=0.125 2023-12-22 22:37:46,273 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.00 vs. limit=22.5 2023-12-22 22:37:57,096 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.349e-02 2023-12-22 22:38:23,642 INFO [train.py:886] (3/4) Epoch 26, batch 3500, loss[loss=0.01422, audio_tagging_loss=0.01422, over 24750.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4948867.70 frames. 
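
The paired loss[...] / tot_loss[...] figures are easy to misread: loss is the current batch, while tot_loss behaves like an exponentially decayed aggregate. Dividing the steady ~4.95e6 "over ... frames" totals by the ~25000 frames per batch gives a window of roughly 200 batches, so the tracker appears to decay by about 1/200 per step; a sketch of that update rule, inferred from the printed numbers rather than quoted from train.py:

    def update_tot_loss_sketch(tot_frames: float, tot_loss_sum: float,
                               batch_frames: float, batch_loss_sum: float,
                               window: int = 200):
        # Decayed sums settle near window * batch_frames, matching the
        # ~5e6-frame totals in the log (200 * 25000 frames).
        decay = 1.0 - 1.0 / window
        tot_frames = tot_frames * decay + batch_frames
        tot_loss_sum = tot_loss_sum * decay + batch_loss_sum
        return tot_frames, tot_loss_sum, tot_loss_sum / tot_frames
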
], batch size: 99, lr: 4.14e-03, grad_scale: 64.0 2023-12-22 22:38:28,156 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.949e+01 3.137e+01 3.263e+01 3.427e+01 4.088e+01, threshold=6.525e+01, percent-clipped=0.0 2023-12-22 22:38:36,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=817733.3333333334, ans=0.125 2023-12-22 22:38:39,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=817733.3333333334, ans=0.0 2023-12-22 22:38:42,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=817733.3333333334, ans=0.0 2023-12-22 22:38:48,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=817800.0, ans=0.2 2023-12-22 22:38:53,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=817800.0, ans=0.0 2023-12-22 22:38:58,823 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:39:03,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=817866.6666666666, ans=0.125 2023-12-22 22:39:06,211 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.47 vs. limit=10.0 2023-12-22 22:39:09,104 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.57 vs. limit=15.0 2023-12-22 22:39:15,353 INFO [train.py:886] (3/4) Epoch 26, batch 3550, loss[loss=0.01144, audio_tagging_loss=0.01144, over 25000.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4947791.48 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0 2023-12-22 22:39:34,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=818066.6666666666, ans=0.1 2023-12-22 22:39:37,755 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.30 vs. limit=6.0 2023-12-22 22:40:04,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=818266.6666666666, ans=0.125 2023-12-22 22:40:08,439 INFO [train.py:886] (3/4) Epoch 26, batch 3600, loss[loss=0.01364, audio_tagging_loss=0.01364, over 24750.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4952648.29 frames. 
], batch size: 99, lr: 4.14e-03, grad_scale: 64.0 2023-12-22 22:40:08,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=818333.3333333334, ans=0.025 2023-12-22 22:40:10,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=818333.3333333334, ans=0.125 2023-12-22 22:40:12,302 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.685e+01 3.075e+01 3.238e+01 3.372e+01 3.764e+01, threshold=6.477e+01, percent-clipped=0.0 2023-12-22 22:40:13,485 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:40:17,678 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0 2023-12-22 22:40:46,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=818533.3333333334, ans=0.2 2023-12-22 22:41:00,179 INFO [train.py:886] (3/4) Epoch 26, batch 3650, loss[loss=0.01447, audio_tagging_loss=0.01447, over 25000.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4951601.07 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0 2023-12-22 22:41:00,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=818666.6666666666, ans=0.125 2023-12-22 22:41:05,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=818666.6666666666, ans=0.0 2023-12-22 22:41:06,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=818666.6666666666, ans=0.07 2023-12-22 22:41:18,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=818733.3333333334, ans=0.1 2023-12-22 22:41:48,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=818933.3333333334, ans=0.125 2023-12-22 22:41:52,788 INFO [train.py:886] (3/4) Epoch 26, batch 3700, loss[loss=0.01187, audio_tagging_loss=0.01187, over 25000.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4956385.22 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 64.0 2023-12-22 22:41:56,505 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.693e+01 3.107e+01 3.222e+01 3.395e+01 4.051e+01, threshold=6.444e+01, percent-clipped=0.0 2023-12-22 22:41:57,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.75 vs. limit=10.0 2023-12-22 22:42:04,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=819066.6666666666, ans=0.0 2023-12-22 22:42:08,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=819066.6666666666, ans=0.125 2023-12-22 22:42:08,880 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.77 vs. 
limit=22.5 2023-12-22 22:42:16,976 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0 2023-12-22 22:42:19,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=819133.3333333334, ans=0.125 2023-12-22 22:42:22,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.48 vs. limit=15.0 2023-12-22 22:42:23,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=819200.0, ans=0.0 2023-12-22 22:42:43,623 INFO [train.py:886] (3/4) Epoch 26, batch 3750, loss[loss=0.01394, audio_tagging_loss=0.01394, over 24750.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4951069.26 frames. ], batch size: 99, lr: 4.14e-03, grad_scale: 64.0 2023-12-22 22:42:50,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=819333.3333333334, ans=0.2 2023-12-22 22:42:51,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=819333.3333333334, ans=0.1 2023-12-22 22:42:53,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=819333.3333333334, ans=0.125 2023-12-22 22:42:59,436 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0 2023-12-22 22:43:18,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.39 vs. limit=22.5 2023-12-22 22:43:28,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=819600.0, ans=0.125 2023-12-22 22:43:35,751 INFO [train.py:886] (3/4) Epoch 26, batch 3800, loss[loss=0.01332, audio_tagging_loss=0.01332, over 24750.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4947550.13 frames. ], batch size: 99, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:43:39,545 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.766e+01 3.124e+01 3.288e+01 3.411e+01 4.142e+01, threshold=6.577e+01, percent-clipped=0.0 2023-12-22 22:43:55,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=819733.3333333334, ans=0.2 2023-12-22 22:44:02,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=819800.0, ans=0.125 2023-12-22 22:44:11,295 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.30 vs. limit=22.5 2023-12-22 22:44:13,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=819866.6666666666, ans=0.0 2023-12-22 22:44:25,982 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.67 vs. limit=12.0 2023-12-22 22:44:28,477 INFO [train.py:886] (3/4) Epoch 26, batch 3850, loss[loss=0.01033, audio_tagging_loss=0.01033, over 24074.00 frames. 
], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4946392.05 frames. ], batch size: 100, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:44:46,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=820066.6666666666, ans=0.125 2023-12-22 22:45:13,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=820266.6666666666, ans=0.5 2023-12-22 22:45:19,631 INFO [train.py:886] (3/4) Epoch 26, batch 3900, loss[loss=0.0142, audio_tagging_loss=0.0142, over 25000.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4952307.06 frames. ], batch size: 100, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:45:23,391 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.803e+01 3.084e+01 3.268e+01 3.397e+01 4.255e+01, threshold=6.537e+01, percent-clipped=0.0 2023-12-22 22:45:23,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=820333.3333333334, ans=0.0 2023-12-22 22:45:24,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=820333.3333333334, ans=0.2 2023-12-22 22:45:44,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=820466.6666666666, ans=0.125 2023-12-22 22:45:54,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=820533.3333333334, ans=0.125 2023-12-22 22:46:11,691 INFO [train.py:886] (3/4) Epoch 26, batch 3950, loss[loss=0.01162, audio_tagging_loss=0.01162, over 24750.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4952410.31 frames. ], batch size: 99, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:46:14,030 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.17 vs. limit=15.0 2023-12-22 22:46:23,221 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.75 vs. limit=15.0 2023-12-22 22:46:41,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=820800.0, ans=0.125 2023-12-22 22:46:49,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=820866.6666666666, ans=0.0 2023-12-22 22:46:58,477 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.48 vs. limit=12.0 2023-12-22 22:47:02,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=821000.0, ans=0.125 2023-12-22 22:47:03,346 INFO [train.py:886] (3/4) Epoch 26, batch 4000, loss[loss=0.01445, audio_tagging_loss=0.01445, over 24750.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4953427.30 frames. 
], batch size: 99, lr: 4.13e-03, grad_scale: 128.0 2023-12-22 22:47:07,817 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.820e+01 3.131e+01 3.262e+01 3.397e+01 4.571e+01, threshold=6.525e+01, percent-clipped=0.0 2023-12-22 22:47:09,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=821000.0, ans=0.125 2023-12-22 22:47:13,004 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.66 vs. limit=10.0 2023-12-22 22:47:14,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=821066.6666666666, ans=0.125 2023-12-22 22:47:16,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=821066.6666666666, ans=0.2 2023-12-22 22:47:42,782 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:47:55,603 INFO [train.py:886] (3/4) Epoch 26, batch 4050, loss[loss=0.01118, audio_tagging_loss=0.01118, over 25000.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4958146.29 frames. ], batch size: 100, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:47:58,100 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=15.0 2023-12-22 22:48:24,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=821466.6666666666, ans=0.0 2023-12-22 22:48:33,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=821533.3333333334, ans=0.125 2023-12-22 22:48:47,401 INFO [train.py:886] (3/4) Epoch 26, batch 4100, loss[loss=0.01714, audio_tagging_loss=0.01714, over 24944.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4948645.85 frames. ], batch size: 100, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:48:52,856 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.695e+01 3.137e+01 3.284e+01 3.434e+01 4.127e+01, threshold=6.569e+01, percent-clipped=0.0 2023-12-22 22:48:58,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=821733.3333333334, ans=0.125 2023-12-22 22:49:03,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=821733.3333333334, ans=0.125 2023-12-22 22:49:38,959 INFO [train.py:886] (3/4) Epoch 26, batch 4150, loss[loss=0.01052, audio_tagging_loss=0.01052, over 25000.00 frames. ], tot_loss[loss=0.01301, audio_tagging_loss=0.01301, over 4946296.24 frames. 
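
grad_scale is the dynamic loss scale of mixed-precision (fp16) training: the scaler doubles it after a long overflow-free stretch, which is why 64.0 becomes 128.0 at batch 4000 above, and halves it when inf/nan gradients show up, presumably why it is back to 64.0 by batch 4050. A sketch using PyTorch's stock GradScaler; the growth/backoff hyper-parameters below are illustrative, not necessarily this recipe's:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_factor=2.0,
                                       backoff_factor=0.5,
                                       growth_interval=2000)

    def fp16_step(model, optimizer, feats, targets, loss_fn):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model(feats), targets)
        scaler.scale(loss).backward()  # backward through the scaled loss
        scaler.step(optimizer)         # unscales; skips the step on inf/nan
        scaler.update()                # grows or backs off the scale
        return loss.detach(), scaler.get_scale()
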
], batch size: 100, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:49:39,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=822000.0, ans=0.125 2023-12-22 22:49:55,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=822066.6666666666, ans=0.125 2023-12-22 22:50:00,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=822133.3333333334, ans=0.125 2023-12-22 22:50:01,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=822133.3333333334, ans=0.025 2023-12-22 22:50:03,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=822133.3333333334, ans=0.1 2023-12-22 22:50:16,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=822200.0, ans=0.125 2023-12-22 22:50:17,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=822200.0, ans=0.125 2023-12-22 22:50:23,738 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.25 vs. limit=12.0 2023-12-22 22:50:32,181 INFO [train.py:886] (3/4) Epoch 26, batch 4200, loss[loss=0.01193, audio_tagging_loss=0.01193, over 25000.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4947939.98 frames. ], batch size: 100, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:50:36,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=822333.3333333334, ans=0.0 2023-12-22 22:50:37,014 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.765e+01 3.069e+01 3.218e+01 3.405e+01 4.073e+01, threshold=6.437e+01, percent-clipped=0.0 2023-12-22 22:51:23,443 INFO [train.py:886] (3/4) Epoch 26, batch 4250, loss[loss=0.01295, audio_tagging_loss=0.01295, over 25000.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4948070.83 frames. ], batch size: 100, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:51:31,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=822666.6666666666, ans=0.0 2023-12-22 22:51:36,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=822733.3333333334, ans=0.95 2023-12-22 22:51:37,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=822733.3333333334, ans=0.0 2023-12-22 22:51:40,326 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.69 vs. 
limit=10.0 2023-12-22 22:51:43,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=822800.0, ans=0.125 2023-12-22 22:51:43,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=822800.0, ans=0.125 2023-12-22 22:51:51,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=822800.0, ans=0.125 2023-12-22 22:51:54,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=822866.6666666666, ans=0.125 2023-12-22 22:51:59,523 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.57 vs. limit=22.5 2023-12-22 22:52:11,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=822933.3333333334, ans=0.125 2023-12-22 22:52:14,955 INFO [train.py:886] (3/4) Epoch 26, batch 4300, loss[loss=0.01061, audio_tagging_loss=0.01061, over 25000.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4954827.45 frames. ], batch size: 100, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:52:17,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=823000.0, ans=0.0 2023-12-22 22:52:19,218 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.66 vs. limit=15.0 2023-12-22 22:52:19,681 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.718e+01 3.099e+01 3.212e+01 3.358e+01 3.889e+01, threshold=6.424e+01, percent-clipped=0.0 2023-12-22 22:52:30,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=823066.6666666666, ans=0.125 2023-12-22 22:52:30,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=823066.6666666666, ans=0.125 2023-12-22 22:52:50,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=823200.0, ans=0.125 2023-12-22 22:53:02,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=823266.6666666666, ans=0.2 2023-12-22 22:53:03,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=823266.6666666666, ans=0.125 2023-12-22 22:53:06,682 INFO [train.py:886] (3/4) Epoch 26, batch 4350, loss[loss=0.01391, audio_tagging_loss=0.01391, over 24750.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4955582.64 frames. 
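
The lr column barely moves across this excerpt (4.17e-03 at the top, 4.12e-03 here) because by epoch 26 both factors of zipformer's Eden-style schedule are deep in their power-law tails. The canonical Eden rule looks like the sketch below, with lr_batches=7500 and lr_epochs=3.5 as the usual recipe defaults; this run's exact variant may rescale its inputs before applying the rule, so treat this as the family of schedules, not a reproduction of the printed values:

    def eden_lr_sketch(base_lr: float, batch: int, epoch: float,
                       lr_batches: float = 7500.0,
                       lr_epochs: float = 3.5) -> float:
        # Each factor starts near 1 and decays like x ** -0.5 once
        # batch >> lr_batches and epoch >> lr_epochs.
        batch_factor = ((batch ** 2 + lr_batches ** 2)
                        / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2)
                        / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor
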
], batch size: 99, lr: 4.13e-03, grad_scale: 64.0 2023-12-22 22:53:09,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=823333.3333333334, ans=0.125 2023-12-22 22:53:11,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=823333.3333333334, ans=0.035 2023-12-22 22:53:11,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=823333.3333333334, ans=0.0 2023-12-22 22:53:15,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=823333.3333333334, ans=0.125 2023-12-22 22:53:22,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=823400.0, ans=0.125 2023-12-22 22:53:24,860 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.85 vs. limit=15.0 2023-12-22 22:53:30,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=823466.6666666666, ans=0.125 2023-12-22 22:53:45,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=823533.3333333334, ans=0.125 2023-12-22 22:53:59,342 INFO [train.py:886] (3/4) Epoch 26, batch 4400, loss[loss=0.01513, audio_tagging_loss=0.01513, over 24750.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4952631.22 frames. ], batch size: 99, lr: 4.12e-03, grad_scale: 64.0 2023-12-22 22:54:02,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=823666.6666666666, ans=0.125 2023-12-22 22:54:04,077 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.798e+01 3.224e+01 3.341e+01 3.511e+01 3.923e+01, threshold=6.682e+01, percent-clipped=0.0 2023-12-22 22:54:04,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=823666.6666666666, ans=0.125 2023-12-22 22:54:05,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=823666.6666666666, ans=0.2 2023-12-22 22:54:10,596 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-22 22:54:34,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=823866.6666666666, ans=0.07 2023-12-22 22:54:37,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=823866.6666666666, ans=0.0 2023-12-22 22:54:42,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=823933.3333333334, ans=0.2 2023-12-22 22:54:43,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.56 vs. limit=22.5 2023-12-22 22:54:49,255 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.53 vs. 
limit=12.0 2023-12-22 22:54:50,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=824000.0, ans=0.1 2023-12-22 22:54:51,658 INFO [train.py:886] (3/4) Epoch 26, batch 4450, loss[loss=0.01419, audio_tagging_loss=0.01419, over 25000.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4951933.82 frames. ], batch size: 100, lr: 4.12e-03, grad_scale: 64.0 2023-12-22 22:55:36,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824266.6666666666, ans=0.1 2023-12-22 22:55:43,484 INFO [train.py:886] (3/4) Epoch 26, batch 4500, loss[loss=0.011, audio_tagging_loss=0.011, over 25000.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4951594.58 frames. ], batch size: 100, lr: 4.12e-03, grad_scale: 64.0 2023-12-22 22:55:48,318 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.836e+01 3.106e+01 3.253e+01 3.392e+01 3.736e+01, threshold=6.505e+01, percent-clipped=0.0 2023-12-22 22:55:59,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=824400.0, ans=0.2 2023-12-22 22:56:06,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=824466.6666666666, ans=0.125 2023-12-22 22:56:20,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=824533.3333333334, ans=0.0 2023-12-22 22:56:35,056 INFO [train.py:886] (3/4) Epoch 26, batch 4550, loss[loss=0.01167, audio_tagging_loss=0.01167, over 25000.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4952835.21 frames. ], batch size: 100, lr: 4.12e-03, grad_scale: 64.0 2023-12-22 22:56:41,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=824666.6666666666, ans=0.125 2023-12-22 22:56:49,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=824733.3333333334, ans=0.02 2023-12-22 22:57:12,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=824866.6666666666, ans=0.2 2023-12-22 22:57:18,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=824933.3333333334, ans=0.0 2023-12-22 22:57:21,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=824933.3333333334, ans=0.0 2023-12-22 22:57:26,762 INFO [train.py:886] (3/4) Epoch 26, batch 4600, loss[loss=0.01445, audio_tagging_loss=0.01445, over 25000.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4956117.34 frames. ], batch size: 100, lr: 4.12e-03, grad_scale: 64.0 2023-12-22 22:57:32,095 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.753e+01 3.119e+01 3.242e+01 3.395e+01 3.973e+01, threshold=6.484e+01, percent-clipped=0.0 2023-12-22 22:57:45,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=825066.6666666666, ans=15.0 2023-12-22 22:57:46,372 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. 
limit=15.0 2023-12-22 22:58:02,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=825200.0, ans=0.04949747468305833 2023-12-22 22:58:02,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.54 vs. limit=10.0 2023-12-22 22:58:04,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=825200.0, ans=0.0 2023-12-22 22:58:14,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=825266.6666666666, ans=0.125 2023-12-22 22:58:19,439 INFO [train.py:886] (3/4) Epoch 26, batch 4650, loss[loss=0.0122, audio_tagging_loss=0.0122, over 22428.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4954724.26 frames. ], batch size: 107, lr: 4.12e-03, grad_scale: 64.0 2023-12-22 22:58:26,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=825333.3333333334, ans=22.5 2023-12-22 22:59:05,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=825600.0, ans=0.1 2023-12-22 22:59:08,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=825600.0, ans=0.0 2023-12-22 22:59:09,843 INFO [train.py:886] (3/4) Epoch 26, batch 4700, loss[loss=0.01346, audio_tagging_loss=0.01346, over 25000.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4956261.50 frames. ], batch size: 100, lr: 4.12e-03, grad_scale: 64.0 2023-12-22 22:59:10,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=825666.6666666666, ans=0.1 2023-12-22 22:59:12,925 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.92 vs. limit=15.0 2023-12-22 22:59:14,957 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.846e+01 3.179e+01 3.317e+01 3.439e+01 3.833e+01, threshold=6.634e+01, percent-clipped=0.0 2023-12-22 22:59:18,112 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.83 vs. limit=22.5 2023-12-22 22:59:21,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=825733.3333333334, ans=0.0 2023-12-22 22:59:40,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=825866.6666666666, ans=0.0 2023-12-22 22:59:57,201 INFO [train.py:886] (3/4) Epoch 26, batch 4750, loss[loss=0.01724, audio_tagging_loss=0.01724, over 24750.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4955240.30 frames. ], batch size: 99, lr: 4.12e-03, grad_scale: 64.0 2023-12-22 23:00:32,997 INFO [train.py:886] (3/4) Epoch 27, batch 0, loss[loss=0.03314, audio_tagging_loss=0.03314, over 21203.00 frames. ], tot_loss[loss=0.03314, audio_tagging_loss=0.03314, over 21203.00 frames. 
2023-12-22 23:00:32,997 INFO [train.py:909] (3/4) Computing validation loss
2023-12-22 23:00:46,081 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3344, 4.5692, 5.2128, 4.7387], device='cuda:3')
2023-12-22 23:00:51,939 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5667, 4.0546, 4.0895, 3.5230], device='cuda:3')
2023-12-22 23:00:53,966 INFO [train.py:917] (3/4) Epoch 27, validation: loss=0.03314, audio_tagging_loss=0.03314, over 3737520.00 frames.
2023-12-22 23:00:53,967 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-22 23:00:55,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=826106.6666666666, ans=0.125
2023-12-22 23:00:57,361 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.10 vs. limit=15.0
2023-12-22 23:01:04,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=12.0
2023-12-22 23:01:11,403 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0
2023-12-22 23:01:21,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=826240.0, ans=0.04949747468305833
2023-12-22 23:01:21,659 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.38 vs. limit=22.5
2023-12-22 23:01:22,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=826240.0, ans=0.125
2023-12-22 23:01:25,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=826306.6666666666, ans=0.0
2023-12-22 23:01:25,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=826306.6666666666, ans=0.2
2023-12-22 23:01:31,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=826306.6666666666, ans=0.0
2023-12-22 23:01:35,333 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.888e+01 3.296e+01 3.604e+01 4.648e+01 9.057e+01, threshold=7.208e+01, percent-clipped=9.0
2023-12-22 23:01:44,718 INFO [train.py:886] (3/4) Epoch 27, batch 50, loss[loss=0.01603, audio_tagging_loss=0.01603, over 25000.00 frames. ], tot_loss[loss=0.02044, audio_tagging_loss=0.02044, over 1111642.78 frames. ], batch size: 100, lr: 4.04e-03, grad_scale: 32.0
2023-12-22 23:01:45,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=826440.0, ans=0.125
2023-12-22 23:01:51,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=826440.0, ans=0.07
2023-12-22 23:02:01,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=826506.6666666666, ans=0.2
2023-12-22 23:02:01,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=826506.6666666666, ans=0.1
2023-12-22 23:02:13,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.41 vs. limit=22.5
2023-12-22 23:02:24,697 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 23:02:26,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0
2023-12-22 23:02:39,487 INFO [train.py:886] (3/4) Epoch 27, batch 100, loss[loss=0.01213, audio_tagging_loss=0.01213, over 25000.00 frames. ], tot_loss[loss=0.01753, audio_tagging_loss=0.01753, over 1967778.91 frames. ], batch size: 100, lr: 4.04e-03, grad_scale: 32.0
2023-12-22 23:02:42,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=826773.3333333334, ans=0.0
2023-12-22 23:02:53,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=826840.0, ans=0.0
2023-12-22 23:03:05,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=826906.6666666666, ans=0.0
2023-12-22 23:03:16,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=826973.3333333334, ans=0.125
2023-12-22 23:03:19,783 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.044e+01 3.375e+01 3.585e+01 3.841e+01 4.539e+01, threshold=7.169e+01, percent-clipped=0.0
2023-12-22 23:03:27,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=827040.0, ans=0.04949747468305833
2023-12-22 23:03:29,227 INFO [train.py:886] (3/4) Epoch 27, batch 150, loss[loss=0.0156, audio_tagging_loss=0.0156, over 25000.00 frames. ], tot_loss[loss=0.01612, audio_tagging_loss=0.01612, over 2635107.18 frames. ], batch size: 100, lr: 4.04e-03, grad_scale: 32.0
2023-12-22 23:03:32,426 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.19 vs. limit=10.0
2023-12-22 23:03:35,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=827106.6666666666, ans=0.125
2023-12-22 23:03:54,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=827240.0, ans=0.07
2023-12-22 23:03:59,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=827306.6666666666, ans=0.125
2023-12-22 23:04:00,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.08 vs. limit=15.0
2023-12-22 23:04:05,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=827306.6666666666, ans=0.2
2023-12-22 23:04:09,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=827373.3333333334, ans=0.125
2023-12-22 23:04:20,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=827440.0, ans=0.2
2023-12-22 23:04:21,166 INFO [train.py:886] (3/4) Epoch 27, batch 200, loss[loss=0.01349, audio_tagging_loss=0.01349, over 25000.00 frames. ], tot_loss[loss=0.01522, audio_tagging_loss=0.01522, over 3146773.15 frames. ], batch size: 100, lr: 4.04e-03, grad_scale: 32.0
2023-12-22 23:04:29,506 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.59 vs. limit=15.0
2023-12-22 23:05:02,028 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.845e+01 3.150e+01 3.268e+01 3.453e+01 3.797e+01, threshold=6.536e+01, percent-clipped=0.0
2023-12-22 23:05:12,174 INFO [train.py:886] (3/4) Epoch 27, batch 250, loss[loss=0.01318, audio_tagging_loss=0.01318, over 25000.00 frames. ], tot_loss[loss=0.01465, audio_tagging_loss=0.01465, over 3548639.58 frames. ], batch size: 100, lr: 4.04e-03, grad_scale: 32.0
2023-12-22 23:05:22,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=827773.3333333334, ans=0.0
2023-12-22 23:05:23,298 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.12 vs. limit=22.5
2023-12-22 23:05:26,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=827840.0, ans=0.1
2023-12-22 23:05:38,534 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=15.0
2023-12-22 23:05:40,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=827906.6666666666, ans=0.0
2023-12-22 23:05:43,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=827973.3333333334, ans=0.04949747468305833
2023-12-22 23:05:51,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=827973.3333333334, ans=0.0
2023-12-22 23:05:55,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=828040.0, ans=0.125
2023-12-22 23:05:57,331 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 23:06:04,752 INFO [train.py:886] (3/4) Epoch 27, batch 300, loss[loss=0.01388, audio_tagging_loss=0.01388, over 24750.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 3856836.85 frames. ], batch size: 99, lr: 4.04e-03, grad_scale: 32.0
2023-12-22 23:06:11,019 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.39 vs. limit=6.0
2023-12-22 23:06:19,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.36 vs. limit=15.0
2023-12-22 23:06:32,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=828240.0, ans=0.0
2023-12-22 23:06:43,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=828306.6666666666, ans=0.125
2023-12-22 23:06:45,460 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.889e+01 3.171e+01 3.297e+01 3.491e+01 3.918e+01, threshold=6.593e+01, percent-clipped=0.0
2023-12-22 23:06:57,083 INFO [train.py:886] (3/4) Epoch 27, batch 350, loss[loss=0.01672, audio_tagging_loss=0.01672, over 24750.00 frames. ], tot_loss[loss=0.01407, audio_tagging_loss=0.01407, over 4091783.86 frames. ], batch size: 99, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:07:03,222 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.22 vs. limit=6.0
2023-12-22 23:07:06,766 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.01 vs. limit=22.5
2023-12-22 23:07:10,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=828506.6666666666, ans=0.125
2023-12-22 23:07:15,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=828573.3333333334, ans=0.0
2023-12-22 23:07:15,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=828573.3333333334, ans=0.0
2023-12-22 23:07:16,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=828573.3333333334, ans=0.125
2023-12-22 23:07:23,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=828573.3333333334, ans=0.05
2023-12-22 23:07:23,528 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.34 vs. limit=6.0
2023-12-22 23:07:27,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=828640.0, ans=0.125
2023-12-22 23:07:29,890 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.85 vs. limit=15.0
2023-12-22 23:07:40,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=828706.6666666666, ans=10.0
2023-12-22 23:07:43,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=828706.6666666666, ans=0.1
2023-12-22 23:07:47,930 INFO [train.py:886] (3/4) Epoch 27, batch 400, loss[loss=0.01251, audio_tagging_loss=0.01251, over 25000.00 frames. ], tot_loss[loss=0.01375, audio_tagging_loss=0.01375, over 4280433.96 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:07:48,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=828773.3333333334, ans=0.5
2023-12-22 23:07:56,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=828773.3333333334, ans=0.2
2023-12-22 23:08:06,616 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.09 vs. limit=15.0
2023-12-22 23:08:30,058 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.745e+01 3.115e+01 3.244e+01 3.373e+01 3.819e+01, threshold=6.489e+01, percent-clipped=0.0
2023-12-22 23:08:40,200 INFO [train.py:886] (3/4) Epoch 27, batch 450, loss[loss=0.01033, audio_tagging_loss=0.01033, over 25000.00 frames. ], tot_loss[loss=0.01338, audio_tagging_loss=0.01338, over 4430955.08 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:08:42,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=829106.6666666666, ans=0.1
2023-12-22 23:08:43,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=829106.6666666666, ans=0.1
2023-12-22 23:08:47,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=829106.6666666666, ans=0.0
2023-12-22 23:08:59,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=829240.0, ans=0.125
2023-12-22 23:09:28,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=829373.3333333334, ans=0.2
2023-12-22 23:09:31,743 INFO [train.py:886] (3/4) Epoch 27, batch 500, loss[loss=0.01049, audio_tagging_loss=0.01049, over 25000.00 frames. ], tot_loss[loss=0.01325, audio_tagging_loss=0.01325, over 4548058.57 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:09:35,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=829440.0, ans=0.0
2023-12-22 23:09:53,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=829573.3333333334, ans=0.125
2023-12-22 23:09:58,231 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0
2023-12-22 23:09:59,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=829573.3333333334, ans=0.05
2023-12-22 23:10:01,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=829640.0, ans=0.125
2023-12-22 23:10:06,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=829640.0, ans=0.125
2023-12-22 23:10:13,348 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.842e+01 3.068e+01 3.193e+01 3.340e+01 3.913e+01, threshold=6.386e+01, percent-clipped=0.0
2023-12-22 23:10:23,518 INFO [train.py:886] (3/4) Epoch 27, batch 550, loss[loss=0.01117, audio_tagging_loss=0.01117, over 24750.00 frames. ], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 4637612.03 frames. ], batch size: 99, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:10:36,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=829840.0, ans=0.125
2023-12-22 23:10:59,367 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.55 vs. limit=15.0
2023-12-22 23:11:09,382 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.16 vs. limit=15.0
2023-12-22 23:11:10,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=830040.0, ans=0.09899494936611666
2023-12-22 23:11:12,111 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0
2023-12-22 23:11:15,952 INFO [train.py:886] (3/4) Epoch 27, batch 600, loss[loss=0.01526, audio_tagging_loss=0.01526, over 24949.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 4706914.86 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:11:19,530 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.53 vs. limit=5.0
2023-12-22 23:11:37,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=830240.0, ans=0.1
2023-12-22 23:11:57,271 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.845e+01 3.188e+01 3.320e+01 3.491e+01 4.138e+01, threshold=6.639e+01, percent-clipped=0.0
2023-12-22 23:12:07,457 INFO [train.py:886] (3/4) Epoch 27, batch 650, loss[loss=0.01609, audio_tagging_loss=0.01609, over 24750.00 frames. ], tot_loss[loss=0.01326, audio_tagging_loss=0.01326, over 4750358.47 frames. ], batch size: 99, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:12:12,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.40 vs. limit=15.0
2023-12-22 23:13:00,468 INFO [train.py:886] (3/4) Epoch 27, batch 700, loss[loss=0.01074, audio_tagging_loss=0.01074, over 24750.00 frames. ], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 4786884.83 frames. ], batch size: 99, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:13:16,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=830840.0, ans=10.0
2023-12-22 23:13:25,060 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=12.0
2023-12-22 23:13:38,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=830973.3333333334, ans=0.2
2023-12-22 23:13:41,260 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.788e+01 3.146e+01 3.254e+01 3.417e+01 3.974e+01, threshold=6.508e+01, percent-clipped=0.0
2023-12-22 23:13:44,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=831040.0, ans=0.125
2023-12-22 23:13:52,987 INFO [train.py:886] (3/4) Epoch 27, batch 750, loss[loss=0.01116, audio_tagging_loss=0.01116, over 25000.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4823363.96 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:14:10,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.24 vs. limit=15.0
2023-12-22 23:14:20,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=831240.0, ans=0.0
2023-12-22 23:14:22,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=831240.0, ans=0.0
2023-12-22 23:14:44,381 INFO [train.py:886] (3/4) Epoch 27, batch 800, loss[loss=0.01236, audio_tagging_loss=0.01236, over 25000.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4856008.16 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:14:51,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=831440.0, ans=10.0
2023-12-22 23:15:06,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=831573.3333333334, ans=0.125
2023-12-22 23:15:09,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=831573.3333333334, ans=0.125
2023-12-22 23:15:15,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=831640.0, ans=0.125
2023-12-22 23:15:26,458 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.771e+01 3.100e+01 3.260e+01 3.401e+01 4.213e+01, threshold=6.521e+01, percent-clipped=0.0
2023-12-22 23:15:35,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=831773.3333333334, ans=0.2
2023-12-22 23:15:35,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=831773.3333333334, ans=0.05
2023-12-22 23:15:36,713 INFO [train.py:886] (3/4) Epoch 27, batch 850, loss[loss=0.01138, audio_tagging_loss=0.01138, over 25000.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4875965.79 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:15:45,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=831840.0, ans=0.125
2023-12-22 23:15:56,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=831906.6666666666, ans=0.0
2023-12-22 23:16:02,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=831906.6666666666, ans=0.025
2023-12-22 23:16:28,940 INFO [train.py:886] (3/4) Epoch 27, batch 900, loss[loss=0.0112, audio_tagging_loss=0.0112, over 25000.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4890997.66 frames. ], batch size: 100, lr: 4.03e-03, grad_scale: 32.0
2023-12-22 23:16:32,077 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.04 vs. limit=15.0
2023-12-22 23:16:34,750 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.66 vs. limit=22.5
2023-12-22 23:16:35,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=832106.6666666666, ans=0.125
2023-12-22 23:16:46,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=832173.3333333334, ans=0.2
2023-12-22 23:16:54,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=832240.0, ans=0.1
2023-12-22 23:16:54,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=832240.0, ans=0.0
2023-12-22 23:16:57,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=832240.0, ans=0.125
2023-12-22 23:17:11,216 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.850e+01 3.162e+01 3.282e+01 3.411e+01 3.879e+01, threshold=6.564e+01, percent-clipped=0.0
2023-12-22 23:17:19,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.77 vs. limit=6.0
2023-12-22 23:17:20,754 INFO [train.py:886] (3/4) Epoch 27, batch 950, loss[loss=0.01357, audio_tagging_loss=0.01357, over 24750.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4895721.60 frames. ], batch size: 99, lr: 4.02e-03, grad_scale: 32.0
2023-12-22 23:17:35,793 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.25 vs. limit=15.0
2023-12-22 23:17:43,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=832573.3333333334, ans=0.1
2023-12-22 23:18:03,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=832706.6666666666, ans=0.1
2023-12-22 23:18:13,213 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0
2023-12-22 23:18:13,709 INFO [train.py:886] (3/4) Epoch 27, batch 1000, loss[loss=0.01205, audio_tagging_loss=0.01205, over 25000.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4904136.27 frames. ], batch size: 100, lr: 4.02e-03, grad_scale: 32.0
2023-12-22 23:18:14,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=832773.3333333334, ans=0.0
2023-12-22 23:18:16,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=832773.3333333334, ans=0.05
2023-12-22 23:18:19,765 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.25 vs. limit=22.5
2023-12-22 23:18:39,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=832906.6666666666, ans=0.04949747468305833
2023-12-22 23:18:45,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=832973.3333333334, ans=0.0
2023-12-22 23:18:46,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=832973.3333333334, ans=0.0
2023-12-22 23:18:52,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=832973.3333333334, ans=0.125
2023-12-22 23:18:53,982 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.85 vs. limit=22.5
2023-12-22 23:18:54,481 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.825e+01 3.109e+01 3.263e+01 3.461e+01 3.831e+01, threshold=6.526e+01, percent-clipped=0.0
2023-12-22 23:18:58,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=833040.0, ans=0.2
2023-12-22 23:18:59,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.80 vs. limit=22.5
2023-12-22 23:19:04,677 INFO [train.py:886] (3/4) Epoch 27, batch 1050, loss[loss=0.0142, audio_tagging_loss=0.0142, over 25000.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4914009.65 frames. ], batch size: 100, lr: 4.02e-03, grad_scale: 32.0
2023-12-22 23:19:18,452 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 23:19:31,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=833240.0, ans=0.125
2023-12-22 23:19:43,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=833306.6666666666, ans=0.0
2023-12-22 23:19:57,231 INFO [train.py:886] (3/4) Epoch 27, batch 1100, loss[loss=0.01433, audio_tagging_loss=0.01433, over 24750.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4922701.24 frames. ], batch size: 99, lr: 4.02e-03, grad_scale: 32.0
2023-12-22 23:19:58,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=833440.0, ans=0.125
2023-12-22 23:19:59,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=833440.0, ans=0.5
2023-12-22 23:20:05,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=833440.0, ans=0.125
2023-12-22 23:20:19,960 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.40 vs. limit=15.0
2023-12-22 23:20:36,818 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.79 vs. limit=15.0
2023-12-22 23:20:37,343 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.804e+01 3.138e+01 3.223e+01 3.425e+01 4.180e+01, threshold=6.446e+01, percent-clipped=0.0
2023-12-22 23:20:49,034 INFO [train.py:886] (3/4) Epoch 27, batch 1150, loss[loss=0.01059, audio_tagging_loss=0.01059, over 25000.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4924831.39 frames. ], batch size: 100, lr: 4.02e-03, grad_scale: 32.0
2023-12-22 23:21:01,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=833840.0, ans=0.125
2023-12-22 23:21:09,243 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=15.0
2023-12-22 23:21:29,913 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=12.0
2023-12-22 23:21:30,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=834040.0, ans=0.125
2023-12-22 23:21:40,007 INFO [train.py:886] (3/4) Epoch 27, batch 1200, loss[loss=0.01445, audio_tagging_loss=0.01445, over 25000.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4935743.03 frames. ], batch size: 100, lr: 4.02e-03, grad_scale: 32.0
2023-12-22 23:21:45,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=834106.6666666666, ans=0.125
2023-12-22 23:21:46,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.85 vs. limit=15.0
2023-12-22 23:22:00,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.56 vs. limit=22.5
2023-12-22 23:22:07,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=834240.0, ans=0.125
2023-12-22 23:22:12,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=834306.6666666666, ans=0.0
2023-12-22 23:22:20,813 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.906e+01 3.144e+01 3.270e+01 3.430e+01 4.377e+01, threshold=6.540e+01, percent-clipped=0.0
2023-12-22 23:22:26,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=834373.3333333334, ans=0.125
2023-12-22 23:22:29,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=834373.3333333334, ans=0.125
2023-12-22 23:22:32,518 INFO [train.py:886] (3/4) Epoch 27, batch 1250, loss[loss=0.0122, audio_tagging_loss=0.0122, over 24750.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4936636.06 frames. ], batch size: 99, lr: 4.02e-03, grad_scale: 32.0
2023-12-22 23:22:46,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=834506.6666666666, ans=0.0
2023-12-22 23:22:51,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=834573.3333333334, ans=0.125
2023-12-22 23:22:58,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=834573.3333333334, ans=0.0
2023-12-22 23:23:03,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=834640.0, ans=0.0
2023-12-22 23:23:03,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=834640.0, ans=0.1
2023-12-22 23:23:23,904 INFO [train.py:886] (3/4) Epoch 27, batch 1300, loss[loss=0.01521, audio_tagging_loss=0.01521, over 24750.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4936047.67 frames. ], batch size: 99, lr: 4.02e-03, grad_scale: 32.0
2023-12-22 23:23:27,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=834773.3333333334, ans=0.0
2023-12-22 23:23:39,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=834840.0, ans=0.125
2023-12-22 23:24:05,920 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.886e+01 3.125e+01 3.290e+01 3.447e+01 4.331e+01, threshold=6.579e+01, percent-clipped=0.0
2023-12-22 23:24:07,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=835040.0, ans=0.0
2023-12-22 23:24:08,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=835040.0, ans=0.125
2023-12-22 23:24:09,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=835040.0, ans=0.2
2023-12-22 23:24:11,280 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.45 vs. limit=15.0
2023-12-22 23:24:12,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=835040.0, ans=0.125
2023-12-22 23:24:15,428 INFO [train.py:886] (3/4) Epoch 27, batch 1350, loss[loss=0.009291, audio_tagging_loss=0.009291, over 25000.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4937223.75 frames. ], batch size: 100, lr: 4.02e-03, grad_scale: 32.0
2023-12-22 23:24:32,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=835173.3333333334, ans=0.0
2023-12-22 23:24:40,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=835240.0, ans=0.1
2023-12-22 23:25:03,378 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-22 23:25:04,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=835373.3333333334, ans=0.0
2023-12-22 23:25:07,517 INFO [train.py:886] (3/4) Epoch 27, batch 1400, loss[loss=0.01078, audio_tagging_loss=0.01078, over 21453.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4938955.53 frames. ], batch size: 107, lr: 4.02e-03, grad_scale: 32.0
2023-12-22 23:25:21,818 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.84 vs. limit=15.0
2023-12-22 23:25:34,144 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.12 vs. limit=22.5
2023-12-22 23:25:43,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=835640.0, ans=0.0
2023-12-22 23:25:46,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=835640.0, ans=0.0
2023-12-22 23:25:48,891 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.757e+01 3.050e+01 3.190e+01 3.364e+01 4.111e+01, threshold=6.380e+01, percent-clipped=0.0
2023-12-22 23:25:55,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=835706.6666666666, ans=0.2
2023-12-22 23:25:58,445 INFO [train.py:886] (3/4) Epoch 27, batch 1450, loss[loss=0.0133, audio_tagging_loss=0.0133, over 25000.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4950533.72 frames. ], batch size: 100, lr: 4.02e-03, grad_scale: 32.0
2023-12-22 23:26:18,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=835840.0, ans=0.2
2023-12-22 23:26:38,761 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.19 vs. limit=22.5
2023-12-22 23:26:40,075 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.98 vs. limit=6.0
2023-12-22 23:26:51,924 INFO [train.py:886] (3/4) Epoch 27, batch 1500, loss[loss=0.01347, audio_tagging_loss=0.01347, over 25000.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4947956.99 frames. ], batch size: 100, lr: 4.02e-03, grad_scale: 32.0
2023-12-22 23:26:55,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=836106.6666666666, ans=0.125
2023-12-22 23:27:22,192 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.81 vs. limit=15.0
2023-12-22 23:27:25,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=836306.6666666666, ans=0.0
2023-12-22 23:27:32,532 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.805e+01 3.110e+01 3.282e+01 3.449e+01 4.134e+01, threshold=6.564e+01, percent-clipped=0.0
2023-12-22 23:27:42,848 INFO [train.py:886] (3/4) Epoch 27, batch 1550, loss[loss=0.01677, audio_tagging_loss=0.01677, over 24937.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4949217.52 frames. ], batch size: 100, lr: 4.02e-03, grad_scale: 32.0
2023-12-22 23:28:01,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=836506.6666666666, ans=0.2
2023-12-22 23:28:10,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=836573.3333333334, ans=0.0
2023-12-22 23:28:26,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=836706.6666666666, ans=0.125
2023-12-22 23:28:31,114 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0
2023-12-22 23:28:31,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=836706.6666666666, ans=0.0
2023-12-22 23:28:32,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=836706.6666666666, ans=0.0
2023-12-22 23:28:32,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=836706.6666666666, ans=0.07
2023-12-22 23:28:35,347 INFO [train.py:886] (3/4) Epoch 27, batch 1600, loss[loss=0.01346, audio_tagging_loss=0.01346, over 24750.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4936610.50 frames. ], batch size: 99, lr: 4.01e-03, grad_scale: 32.0
2023-12-22 23:28:42,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=836773.3333333334, ans=0.2
2023-12-22 23:28:50,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=836840.0, ans=0.0
2023-12-22 23:29:16,726 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.807e+01 3.145e+01 3.260e+01 3.442e+01 4.134e+01, threshold=6.520e+01, percent-clipped=0.0
2023-12-22 23:29:26,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=837106.6666666666, ans=0.0
2023-12-22 23:29:27,605 INFO [train.py:886] (3/4) Epoch 27, batch 1650, loss[loss=0.0136, audio_tagging_loss=0.0136, over 25000.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4933360.86 frames. ], batch size: 100, lr: 4.01e-03, grad_scale: 32.0
2023-12-22 23:29:36,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=837173.3333333334, ans=0.125
2023-12-22 23:29:36,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=837173.3333333334, ans=0.0
2023-12-22 23:29:41,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=837173.3333333334, ans=0.125
2023-12-22 23:30:06,930 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.56 vs. limit=12.0
2023-12-22 23:30:19,111 INFO [train.py:886] (3/4) Epoch 27, batch 1700, loss[loss=0.01392, audio_tagging_loss=0.01392, over 25000.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4942264.25 frames. ], batch size: 100, lr: 4.01e-03, grad_scale: 32.0
2023-12-22 23:30:44,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=837573.3333333334, ans=0.125
2023-12-22 23:30:48,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=837573.3333333334, ans=0.2
2023-12-22 23:30:50,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=837640.0, ans=0.2
2023-12-22 23:31:01,544 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.884e+01 3.145e+01 3.289e+01 3.408e+01 4.323e+01, threshold=6.579e+01, percent-clipped=0.0
2023-12-22 23:31:04,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=837706.6666666666, ans=0.125
2023-12-22 23:31:11,067 INFO [train.py:886] (3/4) Epoch 27, batch 1750, loss[loss=0.01191, audio_tagging_loss=0.01191, over 25000.00 frames. ], tot_loss[loss=0.01283, audio_tagging_loss=0.01283, over 4949035.40 frames. ], batch size: 100, lr: 4.01e-03, grad_scale: 32.0
2023-12-22 23:31:22,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=837840.0, ans=0.125
2023-12-22 23:31:41,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=15.0
2023-12-22 23:31:45,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=837973.3333333334, ans=0.04949747468305833
2023-12-22 23:31:49,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=837973.3333333334, ans=0.125
2023-12-22 23:31:54,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=838040.0, ans=0.2
2023-12-22 23:31:54,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=838040.0, ans=0.125
2023-12-22 23:32:03,064 INFO [train.py:886] (3/4) Epoch 27, batch 1800, loss[loss=0.01352, audio_tagging_loss=0.01352, over 20930.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4943968.60 frames. ], batch size: 107, lr: 4.01e-03, grad_scale: 32.0
2023-12-22 23:32:29,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=838240.0, ans=0.0
2023-12-22 23:32:35,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=838306.6666666666, ans=0.1
2023-12-22 23:32:43,911 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.896e+01 3.173e+01 3.284e+01 3.409e+01 4.176e+01, threshold=6.568e+01, percent-clipped=0.0
2023-12-22 23:32:49,157 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=15.0
2023-12-22 23:32:53,643 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0
2023-12-22 23:32:54,021 INFO [train.py:886] (3/4) Epoch 27, batch 1850, loss[loss=0.01544, audio_tagging_loss=0.01544, over 25000.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4943448.48 frames. ], batch size: 100, lr: 4.01e-03, grad_scale: 32.0
2023-12-22 23:32:54,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=838440.0, ans=0.0
2023-12-22 23:32:57,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=838440.0, ans=0.0
2023-12-22 23:33:01,422 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.82 vs. limit=10.0
2023-12-22 23:33:20,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=838573.3333333334, ans=0.0
2023-12-22 23:33:26,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=838640.0, ans=0.125
2023-12-22 23:33:27,427 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-22 23:33:32,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.78 vs. limit=6.0
2023-12-22 23:33:37,915 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.24 vs. limit=22.5
2023-12-22 23:33:41,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=838706.6666666666, ans=10.0
2023-12-22 23:33:45,749 INFO [train.py:886] (3/4) Epoch 27, batch 1900, loss[loss=0.01291, audio_tagging_loss=0.01291, over 24750.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4946702.78 frames. ], batch size: 99, lr: 4.01e-03, grad_scale: 32.0
2023-12-22 23:33:45,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=838773.3333333334, ans=0.125
2023-12-22 23:33:59,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=838840.0, ans=0.125
2023-12-22 23:34:14,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=838906.6666666666, ans=0.2
2023-12-22 23:34:25,790 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.935e+01 3.098e+01 3.249e+01 3.476e+01 4.111e+01, threshold=6.498e+01, percent-clipped=0.0
2023-12-22 23:34:27,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=839040.0, ans=0.125
2023-12-22 23:34:36,056 INFO [train.py:886] (3/4) Epoch 27, batch 1950, loss[loss=0.01317, audio_tagging_loss=0.01317, over 25000.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4945019.40 frames. ], batch size: 100, lr: 4.01e-03, grad_scale: 32.0
2023-12-22 23:34:38,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=839106.6666666666, ans=0.125
2023-12-22 23:34:50,023 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.61 vs. limit=22.5
2023-12-22 23:34:53,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=839173.3333333334, ans=0.125
2023-12-22 23:34:58,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=839240.0, ans=0.0
2023-12-22 23:35:00,601 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.98 vs. limit=6.0
2023-12-22 23:35:08,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=839306.6666666666, ans=0.04949747468305833
2023-12-22 23:35:11,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.35 vs. limit=22.5
2023-12-22 23:35:20,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0
2023-12-22 23:35:25,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.49 vs. limit=15.0
2023-12-22 23:35:28,642 INFO [train.py:886] (3/4) Epoch 27, batch 2000, loss[loss=0.0127, audio_tagging_loss=0.0127, over 25000.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4941293.39 frames. ], batch size: 100, lr: 4.01e-03, grad_scale: 64.0
2023-12-22 23:35:28,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=839440.0, ans=0.125
2023-12-22 23:35:32,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=839440.0, ans=0.1
2023-12-22 23:35:35,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=839440.0, ans=0.0
2023-12-22 23:35:50,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0
2023-12-22 23:35:57,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=839573.3333333334, ans=0.0
2023-12-22 23:35:57,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=839573.3333333334, ans=0.125
2023-12-22 23:36:03,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=12.0
2023-12-22 23:36:04,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=839640.0, ans=0.2
2023-12-22 23:36:06,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=839640.0, ans=0.0
2023-12-22 23:36:09,460 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.08 vs. limit=22.5
2023-12-22 23:36:09,863 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.727e+01 3.085e+01 3.253e+01 3.416e+01 3.912e+01, threshold=6.506e+01, percent-clipped=0.0
2023-12-22 23:36:21,423 INFO [train.py:886] (3/4) Epoch 27, batch 2050, loss[loss=0.01489, audio_tagging_loss=0.01489, over 25000.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4945292.05 frames. ], batch size: 100, lr: 4.01e-03, grad_scale: 64.0
2023-12-22 23:36:39,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=839906.6666666666, ans=0.125
2023-12-22 23:36:54,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0
2023-12-22 23:36:57,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=839973.3333333334, ans=0.025
2023-12-22 23:37:06,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.73 vs. limit=12.0
2023-12-22 23:37:08,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=840040.0, ans=0.125
2023-12-22 23:37:11,549 INFO [train.py:886] (3/4) Epoch 27, batch 2100, loss[loss=0.01432, audio_tagging_loss=0.01432, over 25000.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4949593.58 frames. ], batch size: 100, lr: 4.01e-03, grad_scale: 64.0
2023-12-22 23:37:15,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=840106.6666666666, ans=0.125
2023-12-22 23:37:18,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=840106.6666666666, ans=0.1
2023-12-22 23:37:53,521 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.868e+01 3.144e+01 3.259e+01 3.394e+01 3.827e+01, threshold=6.517e+01, percent-clipped=0.0
2023-12-22 23:38:03,697 INFO [train.py:886] (3/4) Epoch 27, batch 2150, loss[loss=0.01107, audio_tagging_loss=0.01107, over 24014.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4953629.62 frames. ], batch size: 100, lr: 4.01e-03, grad_scale: 64.0
2023-12-22 23:38:07,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=840440.0, ans=0.125
2023-12-22 23:38:10,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=840440.0, ans=0.125
2023-12-22 23:38:14,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=840506.6666666666, ans=0.0
2023-12-22 23:38:20,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=840506.6666666666, ans=0.0
2023-12-22 23:38:22,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=840573.3333333334, ans=0.0
2023-12-22 23:38:53,877 INFO [train.py:886] (3/4) Epoch 27, batch 2200, loss[loss=0.01354, audio_tagging_loss=0.01354, over 24750.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 4949911.71 frames. ], batch size: 99, lr: 4.00e-03, grad_scale: 64.0
2023-12-22 23:39:02,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=840773.3333333334, ans=0.1
2023-12-22 23:39:07,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.75 vs. limit=22.5
2023-12-22 23:39:21,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=840906.6666666666, ans=0.125
2023-12-22 23:39:35,835 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.899e+01 3.169e+01 3.290e+01 3.439e+01 3.968e+01, threshold=6.580e+01, percent-clipped=0.0
2023-12-22 23:39:45,340 INFO [train.py:886] (3/4) Epoch 27, batch 2250, loss[loss=0.009971, audio_tagging_loss=0.009971, over 25000.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4951150.27 frames. ], batch size: 100, lr: 4.00e-03, grad_scale: 64.0
2023-12-22 23:39:56,992 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.22 vs.
limit=15.0 2023-12-22 23:40:04,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=841173.3333333334, ans=0.05 2023-12-22 23:40:16,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=841306.6666666666, ans=0.125 2023-12-22 23:40:37,973 INFO [train.py:886] (3/4) Epoch 27, batch 2300, loss[loss=0.01318, audio_tagging_loss=0.01318, over 25000.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4952277.50 frames. ], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:40:49,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.32 vs. limit=12.0 2023-12-22 23:40:51,678 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. limit=6.0 2023-12-22 23:40:51,797 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0 2023-12-22 23:40:57,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=841573.3333333334, ans=0.125 2023-12-22 23:41:03,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=841573.3333333334, ans=0.07 2023-12-22 23:41:04,704 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.25 vs. limit=22.5 2023-12-22 23:41:10,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=841640.0, ans=0.125 2023-12-22 23:41:18,131 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.820e+01 3.111e+01 3.239e+01 3.412e+01 3.909e+01, threshold=6.478e+01, percent-clipped=0.0 2023-12-22 23:41:22,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=841706.6666666666, ans=0.1 2023-12-22 23:41:26,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=841706.6666666666, ans=0.0 2023-12-22 23:41:27,654 INFO [train.py:886] (3/4) Epoch 27, batch 2350, loss[loss=0.009975, audio_tagging_loss=0.009975, over 25000.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4954064.02 frames. 
], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:41:48,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=841906.6666666666, ans=0.125 2023-12-22 23:41:48,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=841906.6666666666, ans=0.125 2023-12-22 23:41:57,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=841973.3333333334, ans=0.125 2023-12-22 23:42:01,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=841973.3333333334, ans=0.125 2023-12-22 23:42:07,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=841973.3333333334, ans=0.125 2023-12-22 23:42:11,356 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.64 vs. limit=12.0 2023-12-22 23:42:20,079 INFO [train.py:886] (3/4) Epoch 27, batch 2400, loss[loss=0.01334, audio_tagging_loss=0.01334, over 25000.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4957222.99 frames. ], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:42:28,687 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 23:42:29,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=842173.3333333334, ans=0.125 2023-12-22 23:42:32,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=842173.3333333334, ans=0.1 2023-12-22 23:42:46,113 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0 2023-12-22 23:43:00,774 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.830e+01 3.113e+01 3.225e+01 3.408e+01 4.094e+01, threshold=6.450e+01, percent-clipped=0.0 2023-12-22 23:43:02,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=842373.3333333334, ans=0.1 2023-12-22 23:43:10,990 INFO [train.py:886] (3/4) Epoch 27, batch 2450, loss[loss=0.01454, audio_tagging_loss=0.01454, over 25000.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4961992.68 frames. 
], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:43:14,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=842440.0, ans=0.0 2023-12-22 23:43:17,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=842440.0, ans=0.2 2023-12-22 23:43:40,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=842573.3333333334, ans=0.0 2023-12-22 23:43:50,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=842640.0, ans=0.125 2023-12-22 23:44:01,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=842706.6666666666, ans=0.2 2023-12-22 23:44:03,521 INFO [train.py:886] (3/4) Epoch 27, batch 2500, loss[loss=0.01236, audio_tagging_loss=0.01236, over 21706.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4952549.30 frames. ], batch size: 107, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:44:12,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=842773.3333333334, ans=0.1 2023-12-22 23:44:12,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=842840.0, ans=0.125 2023-12-22 23:44:18,455 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0 2023-12-22 23:44:44,160 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.936e+01 3.187e+01 3.317e+01 3.432e+01 3.909e+01, threshold=6.634e+01, percent-clipped=0.0 2023-12-22 23:44:49,053 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.82 vs. limit=15.0 2023-12-22 23:44:55,752 INFO [train.py:886] (3/4) Epoch 27, batch 2550, loss[loss=0.0121, audio_tagging_loss=0.0121, over 25000.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4950303.04 frames. ], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:44:56,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=843106.6666666666, ans=0.125 2023-12-22 23:45:10,751 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.18 vs. limit=12.0 2023-12-22 23:45:15,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=843240.0, ans=0.125 2023-12-22 23:45:19,981 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.68 vs. limit=10.0 2023-12-22 23:45:34,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=843306.6666666666, ans=0.125 2023-12-22 23:45:42,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=843373.3333333334, ans=0.0 2023-12-22 23:45:46,027 INFO [train.py:886] (3/4) Epoch 27, batch 2600, loss[loss=0.01017, audio_tagging_loss=0.01017, over 24750.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4949907.30 frames. 
], batch size: 99, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:45:50,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=843440.0, ans=0.0 2023-12-22 23:46:01,799 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.92 vs. limit=15.0 2023-12-22 23:46:05,498 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-22 23:46:22,334 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.31 vs. limit=15.0 2023-12-22 23:46:27,694 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.891e+01 3.099e+01 3.252e+01 3.438e+01 3.884e+01, threshold=6.504e+01, percent-clipped=0.0 2023-12-22 23:46:27,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=843706.6666666666, ans=0.125 2023-12-22 23:46:30,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=843706.6666666666, ans=0.1 2023-12-22 23:46:38,018 INFO [train.py:886] (3/4) Epoch 27, batch 2650, loss[loss=0.01204, audio_tagging_loss=0.01204, over 25000.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4952627.34 frames. ], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:46:38,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.17 vs. limit=10.0 2023-12-22 23:46:40,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=843773.3333333334, ans=0.125 2023-12-22 23:46:52,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=843840.0, ans=0.125 2023-12-22 23:47:20,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=844040.0, ans=0.125 2023-12-22 23:47:26,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=844040.0, ans=0.125 2023-12-22 23:47:30,278 INFO [train.py:886] (3/4) Epoch 27, batch 2700, loss[loss=0.0116, audio_tagging_loss=0.0116, over 25000.00 frames. ], tot_loss[loss=0.01283, audio_tagging_loss=0.01283, over 4957987.87 frames. ], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:47:33,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=844106.6666666666, ans=0.035 2023-12-22 23:47:36,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=844106.6666666666, ans=0.0 2023-12-22 23:47:43,736 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.97 vs. 
limit=15.0 2023-12-22 23:47:44,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=844173.3333333334, ans=0.0 2023-12-22 23:47:51,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=844240.0, ans=0.025 2023-12-22 23:47:55,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=844240.0, ans=0.125 2023-12-22 23:48:03,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=844306.6666666666, ans=0.125 2023-12-22 23:48:03,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=844306.6666666666, ans=0.125 2023-12-22 23:48:10,595 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0 2023-12-22 23:48:11,088 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.927e+01 3.112e+01 3.261e+01 3.405e+01 4.046e+01, threshold=6.522e+01, percent-clipped=0.0 2023-12-22 23:48:21,322 INFO [train.py:886] (3/4) Epoch 27, batch 2750, loss[loss=0.01511, audio_tagging_loss=0.01511, over 25000.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4962143.69 frames. ], batch size: 100, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:48:44,847 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.34 vs. limit=10.0 2023-12-22 23:48:51,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=844640.0, ans=0.125 2023-12-22 23:49:13,583 INFO [train.py:886] (3/4) Epoch 27, batch 2800, loss[loss=0.01307, audio_tagging_loss=0.01307, over 24750.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4962012.41 frames. ], batch size: 99, lr: 4.00e-03, grad_scale: 64.0 2023-12-22 23:49:22,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=844773.3333333334, ans=0.125 2023-12-22 23:49:24,269 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0 2023-12-22 23:49:37,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.37 vs. limit=22.5 2023-12-22 23:49:54,270 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.754e+01 3.194e+01 3.305e+01 3.452e+01 3.886e+01, threshold=6.610e+01, percent-clipped=0.0 2023-12-22 23:49:56,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=845040.0, ans=0.125 2023-12-22 23:50:03,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=845106.6666666666, ans=0.125 2023-12-22 23:50:05,155 INFO [train.py:886] (3/4) Epoch 27, batch 2850, loss[loss=0.01258, audio_tagging_loss=0.01258, over 24750.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4952575.92 frames. 
], batch size: 99, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:50:15,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=845173.3333333334, ans=0.1 2023-12-22 23:50:25,964 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.50 vs. limit=22.5 2023-12-22 23:50:29,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=845240.0, ans=0.2 2023-12-22 23:50:45,751 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.54 vs. limit=15.0 2023-12-22 23:50:48,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=845373.3333333334, ans=0.1 2023-12-22 23:50:56,332 INFO [train.py:886] (3/4) Epoch 27, batch 2900, loss[loss=0.01321, audio_tagging_loss=0.01321, over 24750.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4949979.32 frames. ], batch size: 99, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:50:57,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=845440.0, ans=0.0 2023-12-22 23:50:59,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=845440.0, ans=0.0 2023-12-22 23:51:12,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=845506.6666666666, ans=0.125 2023-12-22 23:51:31,887 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.73 vs. limit=15.0 2023-12-22 23:51:37,798 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.846e+01 3.117e+01 3.211e+01 3.387e+01 4.000e+01, threshold=6.423e+01, percent-clipped=0.0 2023-12-22 23:51:47,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=845706.6666666666, ans=0.2 2023-12-22 23:51:48,685 INFO [train.py:886] (3/4) Epoch 27, batch 2950, loss[loss=0.01274, audio_tagging_loss=0.01274, over 25000.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4947790.49 frames. ], batch size: 100, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:52:20,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=845973.3333333334, ans=0.0 2023-12-22 23:52:23,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=845973.3333333334, ans=0.07 2023-12-22 23:52:26,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=845973.3333333334, ans=0.0 2023-12-22 23:52:39,820 INFO [train.py:886] (3/4) Epoch 27, batch 3000, loss[loss=0.01039, audio_tagging_loss=0.01039, over 24750.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4953181.43 frames. ], batch size: 99, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:52:39,821 INFO [train.py:909] (3/4) Computing validation loss 2023-12-22 23:53:00,356 INFO [train.py:917] (3/4) Epoch 27, validation: loss=0.03311, audio_tagging_loss=0.03311, over 3737520.00 frames. 
2023-12-22 23:53:00,357 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-22 23:53:04,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=846106.6666666666, ans=0.1 2023-12-22 23:53:22,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=846240.0, ans=0.0 2023-12-22 23:53:36,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=846306.6666666666, ans=0.0 2023-12-22 23:53:36,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=846306.6666666666, ans=0.125 2023-12-22 23:53:42,510 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.897e+01 3.118e+01 3.260e+01 3.397e+01 4.017e+01, threshold=6.519e+01, percent-clipped=0.0 2023-12-22 23:53:42,937 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.43 vs. limit=15.0 2023-12-22 23:53:48,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2023-12-22 23:53:53,391 INFO [train.py:886] (3/4) Epoch 27, batch 3050, loss[loss=0.01263, audio_tagging_loss=0.01263, over 25000.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4955676.05 frames. ], batch size: 100, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:54:16,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=846573.3333333334, ans=0.0 2023-12-22 23:54:37,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=846706.6666666666, ans=0.125 2023-12-22 23:54:37,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=846706.6666666666, ans=0.0 2023-12-22 23:54:44,003 INFO [train.py:886] (3/4) Epoch 27, batch 3100, loss[loss=0.0137, audio_tagging_loss=0.0137, over 24750.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4955812.32 frames. ], batch size: 99, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:54:48,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=846773.3333333334, ans=0.125 2023-12-22 23:54:57,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=846840.0, ans=0.0 2023-12-22 23:54:57,598 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.70 vs. 
limit=15.0 2023-12-22 23:55:12,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=846906.6666666666, ans=0.125 2023-12-22 23:55:25,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=847040.0, ans=0.0 2023-12-22 23:55:26,096 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.855e+01 3.155e+01 3.323e+01 3.468e+01 4.091e+01, threshold=6.646e+01, percent-clipped=0.0 2023-12-22 23:55:28,461 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0 2023-12-22 23:55:35,696 INFO [train.py:886] (3/4) Epoch 27, batch 3150, loss[loss=0.01479, audio_tagging_loss=0.01479, over 24750.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4949313.79 frames. ], batch size: 99, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:55:36,769 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 23:55:38,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=847106.6666666666, ans=0.125 2023-12-22 23:55:41,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=847106.6666666666, ans=0.5 2023-12-22 23:55:45,280 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.38 vs. limit=22.5 2023-12-22 23:55:47,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=847173.3333333334, ans=0.125 2023-12-22 23:55:49,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=847173.3333333334, ans=0.125 2023-12-22 23:55:58,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=847240.0, ans=0.125 2023-12-22 23:56:20,411 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.69 vs. limit=22.5 2023-12-22 23:56:27,860 INFO [train.py:886] (3/4) Epoch 27, batch 3200, loss[loss=0.01162, audio_tagging_loss=0.01162, over 23954.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4944483.77 frames. ], batch size: 100, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:56:32,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=847440.0, ans=0.125 2023-12-22 23:56:34,582 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-22 23:56:34,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=847440.0, ans=0.0 2023-12-22 23:56:38,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=847506.6666666666, ans=0.125 2023-12-22 23:56:44,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. 
limit=15.0 2023-12-22 23:56:46,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=847573.3333333334, ans=0.2 2023-12-22 23:56:50,403 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.06 vs. limit=22.5 2023-12-22 23:57:02,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=847640.0, ans=10.0 2023-12-22 23:57:08,447 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.746e+01 3.162e+01 3.265e+01 3.446e+01 4.055e+01, threshold=6.529e+01, percent-clipped=0.0 2023-12-22 23:57:11,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=847706.6666666666, ans=0.1 2023-12-22 23:57:17,861 INFO [train.py:886] (3/4) Epoch 27, batch 3250, loss[loss=0.01125, audio_tagging_loss=0.01125, over 23967.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4949420.80 frames. ], batch size: 100, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:57:20,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=847773.3333333334, ans=0.0 2023-12-22 23:57:24,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=847773.3333333334, ans=0.0 2023-12-22 23:57:30,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.49 vs. limit=15.0 2023-12-22 23:57:42,505 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.87 vs. limit=10.0 2023-12-22 23:58:02,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=848040.0, ans=0.0 2023-12-22 23:58:10,820 INFO [train.py:886] (3/4) Epoch 27, batch 3300, loss[loss=0.01253, audio_tagging_loss=0.01253, over 25000.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4954880.87 frames. ], batch size: 100, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:58:26,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=848173.3333333334, ans=0.125 2023-12-22 23:58:33,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=848240.0, ans=0.1 2023-12-22 23:58:38,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=848240.0, ans=0.125 2023-12-22 23:58:40,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=848240.0, ans=0.125 2023-12-22 23:58:51,325 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.800e+01 3.157e+01 3.288e+01 3.417e+01 4.683e+01, threshold=6.576e+01, percent-clipped=0.0 2023-12-22 23:59:02,298 INFO [train.py:886] (3/4) Epoch 27, batch 3350, loss[loss=0.012, audio_tagging_loss=0.012, over 24750.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4958020.07 frames. 
], batch size: 99, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:59:08,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=848440.0, ans=0.125 2023-12-22 23:59:27,924 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.02 vs. limit=6.0 2023-12-22 23:59:32,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=848640.0, ans=0.0 2023-12-22 23:59:34,432 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2023-12-22 23:59:49,879 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.74 vs. limit=12.0 2023-12-22 23:59:52,867 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2023-12-22 23:59:53,236 INFO [train.py:886] (3/4) Epoch 27, batch 3400, loss[loss=0.01398, audio_tagging_loss=0.01398, over 25000.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4959271.82 frames. ], batch size: 100, lr: 3.99e-03, grad_scale: 64.0 2023-12-22 23:59:54,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=848773.3333333334, ans=0.09899494936611666 2023-12-23 00:00:33,028 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.887e+01 3.185e+01 3.323e+01 3.461e+01 4.226e+01, threshold=6.645e+01, percent-clipped=0.0 2023-12-23 00:00:45,433 INFO [train.py:886] (3/4) Epoch 27, batch 3450, loss[loss=0.01256, audio_tagging_loss=0.01256, over 24750.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4952564.52 frames. ], batch size: 99, lr: 3.99e-03, grad_scale: 64.0 2023-12-23 00:00:47,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=849106.6666666666, ans=0.95 2023-12-23 00:00:48,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=849106.6666666666, ans=0.1 2023-12-23 00:01:32,933 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.30 vs. limit=22.5 2023-12-23 00:01:36,105 INFO [train.py:886] (3/4) Epoch 27, batch 3500, loss[loss=0.01276, audio_tagging_loss=0.01276, over 25000.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4949005.33 frames. 
], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:01:42,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=849440.0, ans=0.125 2023-12-23 00:01:55,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=849506.6666666666, ans=0.125 2023-12-23 00:02:20,115 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.696e+01 3.138e+01 3.283e+01 3.482e+01 4.072e+01, threshold=6.566e+01, percent-clipped=0.0 2023-12-23 00:02:25,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=849706.6666666666, ans=0.125 2023-12-23 00:02:29,537 INFO [train.py:886] (3/4) Epoch 27, batch 3550, loss[loss=0.01393, audio_tagging_loss=0.01393, over 25000.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4948254.55 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:02:36,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=849773.3333333334, ans=0.125 2023-12-23 00:02:48,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=849840.0, ans=0.0 2023-12-23 00:02:48,421 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.01 vs. limit=15.0 2023-12-23 00:03:09,193 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0 2023-12-23 00:03:16,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=850040.0, ans=0.125 2023-12-23 00:03:22,007 INFO [train.py:886] (3/4) Epoch 27, batch 3600, loss[loss=0.01328, audio_tagging_loss=0.01328, over 24750.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4948220.16 frames. ], batch size: 99, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:03:27,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=850106.6666666666, ans=0.1 2023-12-23 00:03:57,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=850306.6666666666, ans=0.1 2023-12-23 00:04:03,113 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.828e+01 3.137e+01 3.279e+01 3.466e+01 4.135e+01, threshold=6.558e+01, percent-clipped=0.0 2023-12-23 00:04:06,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=850373.3333333334, ans=0.1 2023-12-23 00:04:12,587 INFO [train.py:886] (3/4) Epoch 27, batch 3650, loss[loss=0.0133, audio_tagging_loss=0.0133, over 25000.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4956363.24 frames. 
], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:04:13,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=850440.0, ans=0.125 2023-12-23 00:04:14,650 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:04:23,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=850506.6666666666, ans=0.2 2023-12-23 00:04:28,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=850506.6666666666, ans=0.125 2023-12-23 00:04:36,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=850573.3333333334, ans=0.1 2023-12-23 00:04:36,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=850573.3333333334, ans=0.1 2023-12-23 00:04:39,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=850573.3333333334, ans=0.0 2023-12-23 00:04:48,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=850640.0, ans=0.125 2023-12-23 00:04:49,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=850640.0, ans=0.0 2023-12-23 00:04:50,448 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.12 vs. limit=15.0 2023-12-23 00:04:51,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=850640.0, ans=0.1 2023-12-23 00:05:05,193 INFO [train.py:886] (3/4) Epoch 27, batch 3700, loss[loss=0.01527, audio_tagging_loss=0.01527, over 25000.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4960252.97 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:05:20,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=850840.0, ans=0.0 2023-12-23 00:05:24,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=850906.6666666666, ans=0.125 2023-12-23 00:05:43,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=850973.3333333334, ans=0.125 2023-12-23 00:05:46,645 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.920e+01 3.190e+01 3.298e+01 3.442e+01 3.856e+01, threshold=6.597e+01, percent-clipped=0.0 2023-12-23 00:05:56,018 INFO [train.py:886] (3/4) Epoch 27, batch 3750, loss[loss=0.01246, audio_tagging_loss=0.01246, over 22371.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4950634.29 frames. ], batch size: 107, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:05:57,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=851106.6666666666, ans=0.025 2023-12-23 00:05:58,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.36 vs. 
limit=22.5 2023-12-23 00:06:04,206 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.52 vs. limit=15.0 2023-12-23 00:06:12,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=851173.3333333334, ans=0.1 2023-12-23 00:06:20,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=851240.0, ans=0.1 2023-12-23 00:06:25,604 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0 2023-12-23 00:06:38,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=851373.3333333334, ans=0.07 2023-12-23 00:06:48,650 INFO [train.py:886] (3/4) Epoch 27, batch 3800, loss[loss=0.01194, audio_tagging_loss=0.01194, over 25000.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 4941224.58 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:06:50,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=851440.0, ans=0.0 2023-12-23 00:06:53,841 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.86 vs. limit=15.0 2023-12-23 00:06:56,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=851440.0, ans=0.0 2023-12-23 00:07:18,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=851573.3333333334, ans=0.125 2023-12-23 00:07:19,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=851640.0, ans=0.125 2023-12-23 00:07:29,305 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.838e+01 3.145e+01 3.329e+01 3.462e+01 3.871e+01, threshold=6.658e+01, percent-clipped=0.0 2023-12-23 00:07:40,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=851773.3333333334, ans=0.1 2023-12-23 00:07:40,921 INFO [train.py:886] (3/4) Epoch 27, batch 3850, loss[loss=0.01333, audio_tagging_loss=0.01333, over 23999.00 frames. ], tot_loss[loss=0.01301, audio_tagging_loss=0.01301, over 4936187.24 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:07:41,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2023-12-23 00:07:50,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=851840.0, ans=0.1 2023-12-23 00:07:55,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=851840.0, ans=0.125 2023-12-23 00:07:58,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=851840.0, ans=0.125 2023-12-23 00:08:07,452 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.06 vs. 
limit=6.0 2023-12-23 00:08:18,005 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.57 vs. limit=6.0 2023-12-23 00:08:31,297 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2023-12-23 00:08:32,571 INFO [train.py:886] (3/4) Epoch 27, batch 3900, loss[loss=0.01196, audio_tagging_loss=0.01196, over 25000.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4935870.04 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:08:35,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=852106.6666666666, ans=0.125 2023-12-23 00:08:42,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=852106.6666666666, ans=0.2 2023-12-23 00:08:43,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=852173.3333333334, ans=0.125 2023-12-23 00:08:52,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=852240.0, ans=0.0 2023-12-23 00:08:55,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=852240.0, ans=0.1 2023-12-23 00:09:02,744 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.14 vs. limit=6.0 2023-12-23 00:09:04,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=852306.6666666666, ans=0.2 2023-12-23 00:09:09,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=852306.6666666666, ans=0.0 2023-12-23 00:09:14,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.97 vs. limit=22.5 2023-12-23 00:09:15,481 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.748e+01 3.126e+01 3.278e+01 3.398e+01 3.984e+01, threshold=6.555e+01, percent-clipped=0.0 2023-12-23 00:09:22,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=852373.3333333334, ans=0.1 2023-12-23 00:09:22,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=852373.3333333334, ans=0.125 2023-12-23 00:09:22,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=852373.3333333334, ans=0.2 2023-12-23 00:09:25,090 INFO [train.py:886] (3/4) Epoch 27, batch 3950, loss[loss=0.01361, audio_tagging_loss=0.01361, over 25000.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4941131.12 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:09:54,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=852573.3333333334, ans=0.125 2023-12-23 00:10:07,244 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. 
limit=15.0 2023-12-23 00:10:08,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=852706.6666666666, ans=0.125 2023-12-23 00:10:09,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=852706.6666666666, ans=10.0 2023-12-23 00:10:10,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=852706.6666666666, ans=0.0 2023-12-23 00:10:16,913 INFO [train.py:886] (3/4) Epoch 27, batch 4000, loss[loss=0.01117, audio_tagging_loss=0.01117, over 25000.00 frames. ], tot_loss[loss=0.01286, audio_tagging_loss=0.01286, over 4949894.18 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 128.0 2023-12-23 00:10:18,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=852773.3333333334, ans=0.125 2023-12-23 00:10:22,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=852773.3333333334, ans=0.125 2023-12-23 00:10:22,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=852773.3333333334, ans=0.0 2023-12-23 00:10:28,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=852840.0, ans=0.125 2023-12-23 00:10:35,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=852840.0, ans=0.0 2023-12-23 00:10:55,993 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.16 vs. limit=6.0 2023-12-23 00:10:59,972 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.833e+01 3.188e+01 3.348e+01 3.452e+01 3.928e+01, threshold=6.695e+01, percent-clipped=0.0 2023-12-23 00:11:07,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=853106.6666666666, ans=0.125 2023-12-23 00:11:08,495 INFO [train.py:886] (3/4) Epoch 27, batch 4050, loss[loss=0.01086, audio_tagging_loss=0.01086, over 25000.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4949944.14 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 64.0 2023-12-23 00:11:14,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=853106.6666666666, ans=0.1 2023-12-23 00:11:15,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=853106.6666666666, ans=0.1 2023-12-23 00:11:20,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=853173.3333333334, ans=0.0 2023-12-23 00:11:25,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=853173.3333333334, ans=0.1 2023-12-23 00:11:26,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.20 vs. 
limit=10.0 2023-12-23 00:11:46,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=853306.6666666666, ans=15.0 2023-12-23 00:12:02,799 INFO [train.py:886] (3/4) Epoch 27, batch 4100, loss[loss=0.01286, audio_tagging_loss=0.01286, over 24750.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4945362.31 frames. ], batch size: 99, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:12:06,254 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.18 vs. limit=15.0 2023-12-23 00:12:08,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=853440.0, ans=0.2 2023-12-23 00:12:08,897 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.19 vs. limit=15.0 2023-12-23 00:12:31,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=853573.3333333334, ans=0.1 2023-12-23 00:12:36,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=853640.0, ans=0.05 2023-12-23 00:12:37,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=853640.0, ans=0.125 2023-12-23 00:12:44,652 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.808e+01 3.219e+01 3.295e+01 3.462e+01 3.940e+01, threshold=6.591e+01, percent-clipped=0.0 2023-12-23 00:12:47,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=853706.6666666666, ans=0.125 2023-12-23 00:12:50,736 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=15.0 2023-12-23 00:12:54,567 INFO [train.py:886] (3/4) Epoch 27, batch 4150, loss[loss=0.0133, audio_tagging_loss=0.0133, over 24750.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4942543.01 frames. ], batch size: 99, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:13:37,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=854040.0, ans=0.1 2023-12-23 00:13:39,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.03 vs. limit=15.0 2023-12-23 00:13:46,152 INFO [train.py:886] (3/4) Epoch 27, batch 4200, loss[loss=0.0122, audio_tagging_loss=0.0122, over 24750.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4948101.82 frames. ], batch size: 99, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:14:03,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. 
limit=6.0 2023-12-23 00:14:12,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=854240.0, ans=0.125 2023-12-23 00:14:13,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=854240.0, ans=0.125 2023-12-23 00:14:13,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=854240.0, ans=0.1 2023-12-23 00:14:26,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=854373.3333333334, ans=0.2 2023-12-23 00:14:28,651 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.773e+01 3.191e+01 3.309e+01 3.476e+01 4.229e+01, threshold=6.619e+01, percent-clipped=0.0 2023-12-23 00:14:38,657 INFO [train.py:886] (3/4) Epoch 27, batch 4250, loss[loss=0.01171, audio_tagging_loss=0.01171, over 25000.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4951744.40 frames. ], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:15:00,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=854573.3333333334, ans=0.125 2023-12-23 00:15:26,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=854706.6666666666, ans=0.1 2023-12-23 00:15:27,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=854706.6666666666, ans=0.125 2023-12-23 00:15:29,741 INFO [train.py:886] (3/4) Epoch 27, batch 4300, loss[loss=0.01294, audio_tagging_loss=0.01294, over 25000.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4950420.73 frames. ], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:15:40,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=854840.0, ans=0.0 2023-12-23 00:16:00,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=854973.3333333334, ans=0.125 2023-12-23 00:16:00,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=854973.3333333334, ans=0.125 2023-12-23 00:16:13,510 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.793e+01 3.183e+01 3.299e+01 3.496e+01 3.905e+01, threshold=6.598e+01, percent-clipped=0.0 2023-12-23 00:16:17,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=855040.0, ans=0.2 2023-12-23 00:16:21,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=855040.0, ans=0.1 2023-12-23 00:16:22,699 INFO [train.py:886] (3/4) Epoch 27, batch 4350, loss[loss=0.01467, audio_tagging_loss=0.01467, over 25000.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4950263.88 frames. 
], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:16:26,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=855106.6666666666, ans=0.0 2023-12-23 00:16:33,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.92 vs. limit=15.0 2023-12-23 00:17:11,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=855373.3333333334, ans=0.125 2023-12-23 00:17:13,774 INFO [train.py:886] (3/4) Epoch 27, batch 4400, loss[loss=0.0126, audio_tagging_loss=0.0126, over 24750.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4944334.39 frames. ], batch size: 99, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:17:13,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=855440.0, ans=0.05 2023-12-23 00:17:14,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=855440.0, ans=0.0 2023-12-23 00:17:18,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=855440.0, ans=0.125 2023-12-23 00:17:20,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=855440.0, ans=0.125 2023-12-23 00:17:31,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=855506.6666666666, ans=0.125 2023-12-23 00:17:36,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=855573.3333333334, ans=0.125 2023-12-23 00:17:42,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=855573.3333333334, ans=0.125 2023-12-23 00:17:55,455 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.876e+01 3.175e+01 3.302e+01 3.505e+01 4.332e+01, threshold=6.604e+01, percent-clipped=0.0 2023-12-23 00:17:55,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=855706.6666666666, ans=0.1 2023-12-23 00:18:03,947 INFO [train.py:886] (3/4) Epoch 27, batch 4450, loss[loss=0.01381, audio_tagging_loss=0.01381, over 24750.00 frames. ], tot_loss[loss=0.01305, audio_tagging_loss=0.01305, over 4943082.46 frames. ], batch size: 99, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:18:04,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.11 vs. 
limit=12.0 2023-12-23 00:18:11,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=855773.3333333334, ans=0.125 2023-12-23 00:18:37,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=855973.3333333334, ans=0.125 2023-12-23 00:18:49,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=856040.0, ans=0.125 2023-12-23 00:18:52,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=856040.0, ans=0.125 2023-12-23 00:18:55,917 INFO [train.py:886] (3/4) Epoch 27, batch 4500, loss[loss=0.0126, audio_tagging_loss=0.0126, over 25000.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4940408.57 frames. ], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:18:56,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=856106.6666666666, ans=0.2 2023-12-23 00:18:56,553 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.87 vs. limit=15.0 2023-12-23 00:19:10,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=856173.3333333334, ans=0.0 2023-12-23 00:19:20,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=856240.0, ans=0.125 2023-12-23 00:19:28,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=856306.6666666666, ans=0.125 2023-12-23 00:19:31,179 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.04 vs. limit=10.0 2023-12-23 00:19:33,550 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:19:37,242 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.864e+01 3.151e+01 3.331e+01 3.446e+01 3.976e+01, threshold=6.663e+01, percent-clipped=0.0 2023-12-23 00:19:38,687 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.12 vs. limit=15.0 2023-12-23 00:19:45,685 INFO [train.py:886] (3/4) Epoch 27, batch 4550, loss[loss=0.0145, audio_tagging_loss=0.0145, over 25000.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4947416.94 frames. ], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:19:49,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=856440.0, ans=0.125 2023-12-23 00:20:38,727 INFO [train.py:886] (3/4) Epoch 27, batch 4600, loss[loss=0.01338, audio_tagging_loss=0.01338, over 25000.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4946963.37 frames. 
], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:21:11,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=856973.3333333334, ans=0.025 2023-12-23 00:21:13,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=856973.3333333334, ans=0.125 2023-12-23 00:21:20,173 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.671e+01 3.147e+01 3.251e+01 3.413e+01 3.679e+01, threshold=6.502e+01, percent-clipped=0.0 2023-12-23 00:21:30,190 INFO [train.py:886] (3/4) Epoch 27, batch 4650, loss[loss=0.0131, audio_tagging_loss=0.0131, over 25000.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4946084.48 frames. ], batch size: 100, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:21:48,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=857173.3333333334, ans=0.1 2023-12-23 00:22:20,062 INFO [train.py:886] (3/4) Epoch 27, batch 4700, loss[loss=0.0128, audio_tagging_loss=0.0128, over 24750.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4945101.18 frames. ], batch size: 99, lr: 3.97e-03, grad_scale: 64.0 2023-12-23 00:22:40,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=857573.3333333334, ans=0.0 2023-12-23 00:22:43,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=857573.3333333334, ans=0.125 2023-12-23 00:22:55,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=857640.0, ans=0.125 2023-12-23 00:22:57,958 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.951e+01 3.169e+01 3.333e+01 3.511e+01 4.698e+01, threshold=6.666e+01, percent-clipped=0.0 2023-12-23 00:22:59,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=857706.6666666666, ans=0.2 2023-12-23 00:23:00,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=857706.6666666666, ans=0.125 2023-12-23 00:23:05,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=857706.6666666666, ans=0.125 2023-12-23 00:23:07,083 INFO [train.py:886] (3/4) Epoch 27, batch 4750, loss[loss=0.01236, audio_tagging_loss=0.01236, over 24750.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4940074.82 frames. ], batch size: 99, lr: 3.96e-03, grad_scale: 64.0 2023-12-23 00:23:10,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=857773.3333333334, ans=0.0 2023-12-23 00:23:42,401 INFO [train.py:886] (3/4) Epoch 28, batch 0, loss[loss=0.02547, audio_tagging_loss=0.02547, over 24107.00 frames. ], tot_loss[loss=0.02547, audio_tagging_loss=0.02547, over 24107.00 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:23:42,401 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 00:24:03,706 INFO [train.py:917] (3/4) Epoch 28, validation: loss=0.03329, audio_tagging_loss=0.03329, over 3737520.00 frames. 
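The WARNING records emitted from optim.py:484 throughout this log report running quartiles (min, 25%, median, 75%, max) of recently observed gradient norms, and in every such record the printed threshold is consistent with Clipping_scale (2.0) times the median quartile: for example 2.0 * 3.348e+01 = 6.696e+01 against the printed threshold=6.695e+01, within rounding. The sketch below is a minimal, self-contained illustration of that kind of quartile-based gradient clipping. It is not the actual icefall optim.py implementation; the class QuartileGradClipper, its constructor arguments, and the method clip_ are hypothetical names introduced only for this example.

import collections

import torch


class QuartileGradClipper:
    # Hypothetical helper, for illustration only (not icefall's ScaledAdam code).
    def __init__(self, clipping_scale=2.0, history=128):
        self.clipping_scale = clipping_scale
        self.norms = collections.deque(maxlen=history)  # recent total grad norms

    def clip_(self, parameters):
        params = [p for p in parameters if p.grad is not None]
        # Overall L2 norm across all parameter gradients for this step.
        total_norm = torch.norm(
            torch.stack([p.grad.detach().norm() for p in params])
        ).item()
        self.norms.append(total_norm)
        t = sorted(self.norms)
        # min / 25% / median / 75% / max, matching the quartiles printed above.
        q = [t[int(f * (len(t) - 1))] for f in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = self.clipping_scale * q[2]  # clipping_scale times the median
        if total_norm > threshold:
            # Rescale all gradients in place so their overall norm equals threshold.
            for p in params:
                p.grad.mul_(threshold / total_norm)
        return ("grad-norm quartiles "
                + " ".join(f"{v:.3e}" for v in q)
                + f", threshold={threshold:.3e}")


# Hypothetical usage: after loss.backward() and before optimizer.step():
#   msg = clipper.clip_(model.parameters())

Deriving the threshold from a running median makes the clip adaptive rather than fixed: by epoch 27 the quartiles in this log sit near 3e+01 and percent-clipped stays at 0.0, while immediately after the epoch 28 restart the max quartile spikes to 1.109e+02 and the corresponding record reports percent-clipped=9.0.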
2023-12-23 00:24:03,706 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 00:24:19,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=857946.6666666666, ans=0.125 2023-12-23 00:24:26,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=858013.3333333334, ans=0.125 2023-12-23 00:24:34,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=858080.0, ans=0.125 2023-12-23 00:24:51,053 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.49 vs. limit=22.5 2023-12-23 00:24:53,481 INFO [train.py:886] (3/4) Epoch 28, batch 50, loss[loss=0.01573, audio_tagging_loss=0.01573, over 24012.00 frames. ], tot_loss[loss=0.02004, audio_tagging_loss=0.02004, over 1111252.79 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:24:55,005 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.03 vs. limit=12.0 2023-12-23 00:24:58,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=858213.3333333334, ans=0.1 2023-12-23 00:25:04,803 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.66 vs. limit=10.0 2023-12-23 00:25:15,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=858346.6666666666, ans=0.125 2023-12-23 00:25:20,166 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.879e+01 3.617e+01 4.003e+01 4.694e+01 1.109e+02, threshold=8.005e+01, percent-clipped=9.0 2023-12-23 00:25:26,119 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.028e-02 2023-12-23 00:25:29,094 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2023-12-23 00:25:32,062 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2023-12-23 00:25:42,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=858480.0, ans=0.125 2023-12-23 00:25:42,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=858480.0, ans=0.0 2023-12-23 00:25:44,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=858546.6666666666, ans=0.125 2023-12-23 00:25:44,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=858546.6666666666, ans=0.125 2023-12-23 00:25:45,094 INFO [train.py:886] (3/4) Epoch 28, batch 100, loss[loss=0.01777, audio_tagging_loss=0.01777, over 25000.00 frames. ], tot_loss[loss=0.0174, audio_tagging_loss=0.0174, over 1968617.78 frames. 
], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:26:09,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=858680.0, ans=0.1 2023-12-23 00:26:19,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=858746.6666666666, ans=0.125 2023-12-23 00:26:24,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=858746.6666666666, ans=0.125 2023-12-23 00:26:25,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=858813.3333333334, ans=0.125 2023-12-23 00:26:26,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=858813.3333333334, ans=0.2 2023-12-23 00:26:28,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2023-12-23 00:26:35,728 INFO [train.py:886] (3/4) Epoch 28, batch 150, loss[loss=0.01329, audio_tagging_loss=0.01329, over 25000.00 frames. ], tot_loss[loss=0.01597, audio_tagging_loss=0.01597, over 2633464.17 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:26:42,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=858880.0, ans=0.0 2023-12-23 00:26:55,570 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.65 vs. limit=12.0 2023-12-23 00:26:58,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=859013.3333333334, ans=0.125 2023-12-23 00:27:02,516 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.973e+01 3.280e+01 3.440e+01 3.594e+01 4.041e+01, threshold=6.880e+01, percent-clipped=0.0 2023-12-23 00:27:07,779 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.45 vs. limit=6.0 2023-12-23 00:27:27,346 INFO [train.py:886] (3/4) Epoch 28, batch 200, loss[loss=0.01597, audio_tagging_loss=0.01597, over 25000.00 frames. ], tot_loss[loss=0.01502, audio_tagging_loss=0.01502, over 3154365.18 frames. ], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:27:27,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=859213.3333333334, ans=0.125 2023-12-23 00:27:31,640 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2023-12-23 00:27:36,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=859280.0, ans=0.125 2023-12-23 00:28:04,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.47 vs. limit=22.5 2023-12-23 00:28:10,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=859480.0, ans=0.0 2023-12-23 00:28:17,872 INFO [train.py:886] (3/4) Epoch 28, batch 250, loss[loss=0.01171, audio_tagging_loss=0.01171, over 24750.00 frames. 
], tot_loss[loss=0.01447, audio_tagging_loss=0.01447, over 3560624.23 frames. ], batch size: 99, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:28:30,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=859613.3333333334, ans=0.035 2023-12-23 00:28:31,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=859613.3333333334, ans=0.125 2023-12-23 00:28:33,930 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.95 vs. limit=15.0 2023-12-23 00:28:43,837 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.923e+01 3.198e+01 3.303e+01 3.432e+01 3.896e+01, threshold=6.605e+01, percent-clipped=0.0 2023-12-23 00:28:55,786 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:28:59,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=859813.3333333334, ans=0.07 2023-12-23 00:29:05,421 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=15.0 2023-12-23 00:29:08,726 INFO [train.py:886] (3/4) Epoch 28, batch 300, loss[loss=0.01145, audio_tagging_loss=0.01145, over 24750.00 frames. ], tot_loss[loss=0.01398, audio_tagging_loss=0.01398, over 3866486.31 frames. ], batch size: 99, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:29:09,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=859880.0, ans=0.125 2023-12-23 00:29:19,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=859946.6666666666, ans=0.1 2023-12-23 00:29:25,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=859946.6666666666, ans=0.02 2023-12-23 00:29:35,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=860013.3333333334, ans=0.04949747468305833 2023-12-23 00:29:57,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=860146.6666666666, ans=0.2 2023-12-23 00:29:59,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=860146.6666666666, ans=0.0 2023-12-23 00:30:01,095 INFO [train.py:886] (3/4) Epoch 28, batch 350, loss[loss=0.01249, audio_tagging_loss=0.01249, over 24750.00 frames. ], tot_loss[loss=0.01388, audio_tagging_loss=0.01388, over 4107059.79 frames. 
], batch size: 99, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:30:09,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=860213.3333333334, ans=0.035 2023-12-23 00:30:09,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=860213.3333333334, ans=0.125 2023-12-23 00:30:17,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=860280.0, ans=0.1 2023-12-23 00:30:24,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=860346.6666666666, ans=0.0 2023-12-23 00:30:27,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=860346.6666666666, ans=15.0 2023-12-23 00:30:28,424 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.837e+01 3.152e+01 3.318e+01 3.475e+01 4.174e+01, threshold=6.637e+01, percent-clipped=0.0 2023-12-23 00:30:40,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=860413.3333333334, ans=0.1 2023-12-23 00:30:44,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=860480.0, ans=0.125 2023-12-23 00:30:52,540 INFO [train.py:886] (3/4) Epoch 28, batch 400, loss[loss=0.01351, audio_tagging_loss=0.01351, over 24750.00 frames. ], tot_loss[loss=0.01363, audio_tagging_loss=0.01363, over 4296536.12 frames. ], batch size: 99, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:31:04,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=860613.3333333334, ans=0.125 2023-12-23 00:31:15,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=860680.0, ans=0.0 2023-12-23 00:31:18,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=860680.0, ans=0.1 2023-12-23 00:31:27,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=860746.6666666666, ans=0.2 2023-12-23 00:31:28,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=860746.6666666666, ans=15.0 2023-12-23 00:31:44,315 INFO [train.py:886] (3/4) Epoch 28, batch 450, loss[loss=0.01272, audio_tagging_loss=0.01272, over 25000.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4435971.32 frames. 
], batch size: 100, lr: 3.89e-03, grad_scale: 32.0 2023-12-23 00:31:49,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=860880.0, ans=6.0 2023-12-23 00:32:11,563 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.702e+01 3.063e+01 3.224e+01 3.410e+01 3.950e+01, threshold=6.447e+01, percent-clipped=0.0 2023-12-23 00:32:16,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=861080.0, ans=0.125 2023-12-23 00:32:20,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=861080.0, ans=0.1 2023-12-23 00:32:25,208 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=15.0 2023-12-23 00:32:29,697 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.52 vs. limit=15.0 2023-12-23 00:32:33,127 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0 2023-12-23 00:32:37,186 INFO [train.py:886] (3/4) Epoch 28, batch 500, loss[loss=0.01459, audio_tagging_loss=0.01459, over 24750.00 frames. ], tot_loss[loss=0.01314, audio_tagging_loss=0.01314, over 4550496.44 frames. ], batch size: 99, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:32:37,790 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=22.5 2023-12-23 00:32:39,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=861213.3333333334, ans=0.0 2023-12-23 00:32:42,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=861213.3333333334, ans=0.0 2023-12-23 00:33:01,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=861346.6666666666, ans=0.0 2023-12-23 00:33:09,076 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.88 vs. limit=22.5 2023-12-23 00:33:28,207 INFO [train.py:886] (3/4) Epoch 28, batch 550, loss[loss=0.01214, audio_tagging_loss=0.01214, over 21715.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4642374.26 frames. 
], batch size: 107, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:33:31,354 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:33:54,955 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.816e+01 3.118e+01 3.304e+01 3.442e+01 3.885e+01, threshold=6.607e+01, percent-clipped=0.0 2023-12-23 00:34:07,974 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:34:14,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=861813.3333333334, ans=0.5 2023-12-23 00:34:20,599 INFO [train.py:886] (3/4) Epoch 28, batch 600, loss[loss=0.01499, audio_tagging_loss=0.01499, over 24750.00 frames. ], tot_loss[loss=0.01313, audio_tagging_loss=0.01313, over 4710588.19 frames. ], batch size: 99, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:34:20,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=861880.0, ans=0.125 2023-12-23 00:34:27,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=861880.0, ans=0.2 2023-12-23 00:34:27,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=861880.0, ans=0.125 2023-12-23 00:34:34,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=861946.6666666666, ans=0.125 2023-12-23 00:34:35,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=861946.6666666666, ans=0.125 2023-12-23 00:34:40,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=862013.3333333334, ans=0.0 2023-12-23 00:34:44,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=862013.3333333334, ans=0.0 2023-12-23 00:34:54,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=862080.0, ans=0.1 2023-12-23 00:35:01,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=862146.6666666666, ans=0.1 2023-12-23 00:35:07,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=862146.6666666666, ans=0.125 2023-12-23 00:35:07,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=862146.6666666666, ans=0.125 2023-12-23 00:35:12,857 INFO [train.py:886] (3/4) Epoch 28, batch 650, loss[loss=0.01394, audio_tagging_loss=0.01394, over 25000.00 frames. ], tot_loss[loss=0.01309, audio_tagging_loss=0.01309, over 4759470.34 frames. 
], batch size: 100, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:35:20,486 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:35:38,965 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.866e+01 3.196e+01 3.336e+01 3.480e+01 3.806e+01, threshold=6.671e+01, percent-clipped=0.0 2023-12-23 00:35:46,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=862413.3333333334, ans=0.125 2023-12-23 00:35:55,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=862480.0, ans=0.125 2023-12-23 00:35:57,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=862480.0, ans=0.0 2023-12-23 00:36:03,740 INFO [train.py:886] (3/4) Epoch 28, batch 700, loss[loss=0.01091, audio_tagging_loss=0.01091, over 22270.00 frames. ], tot_loss[loss=0.01306, audio_tagging_loss=0.01306, over 4793736.95 frames. ], batch size: 107, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:36:18,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=862613.3333333334, ans=0.125 2023-12-23 00:36:32,198 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.96 vs. limit=12.0 2023-12-23 00:36:55,332 INFO [train.py:886] (3/4) Epoch 28, batch 750, loss[loss=0.01606, audio_tagging_loss=0.01606, over 24750.00 frames. ], tot_loss[loss=0.01302, audio_tagging_loss=0.01302, over 4825211.27 frames. ], batch size: 99, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:36:56,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=862880.0, ans=0.125 2023-12-23 00:36:58,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=862880.0, ans=0.125 2023-12-23 00:37:12,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=862946.6666666666, ans=0.0 2023-12-23 00:37:14,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=863013.3333333334, ans=10.0 2023-12-23 00:37:23,194 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.734e+01 3.104e+01 3.286e+01 3.429e+01 3.776e+01, threshold=6.573e+01, percent-clipped=0.0 2023-12-23 00:37:25,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=863013.3333333334, ans=0.0 2023-12-23 00:37:31,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=863080.0, ans=0.1 2023-12-23 00:37:45,967 INFO [train.py:886] (3/4) Epoch 28, batch 800, loss[loss=0.01271, audio_tagging_loss=0.01271, over 25000.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4853045.99 frames. 
], batch size: 100, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:37:46,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=863213.3333333334, ans=0.125 2023-12-23 00:37:56,895 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.16 vs. limit=22.5 2023-12-23 00:38:15,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=863346.6666666666, ans=0.5 2023-12-23 00:38:21,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=863413.3333333334, ans=0.125 2023-12-23 00:38:39,115 INFO [train.py:886] (3/4) Epoch 28, batch 850, loss[loss=0.01374, audio_tagging_loss=0.01374, over 25000.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4878460.94 frames. ], batch size: 100, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:38:49,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=863613.3333333334, ans=0.125 2023-12-23 00:39:06,381 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.957e+01 3.161e+01 3.315e+01 3.496e+01 3.961e+01, threshold=6.630e+01, percent-clipped=0.0 2023-12-23 00:39:10,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=863746.6666666666, ans=0.125 2023-12-23 00:39:20,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.28 vs. limit=15.0 2023-12-23 00:39:31,535 INFO [train.py:886] (3/4) Epoch 28, batch 900, loss[loss=0.0153, audio_tagging_loss=0.0153, over 25000.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4900352.45 frames. ], batch size: 100, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:39:33,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=863880.0, ans=0.0 2023-12-23 00:39:37,557 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.77 vs. limit=10.0 2023-12-23 00:39:40,813 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:39:54,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=864013.3333333334, ans=0.0 2023-12-23 00:40:07,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=864080.0, ans=0.0 2023-12-23 00:40:21,514 INFO [train.py:886] (3/4) Epoch 28, batch 950, loss[loss=0.01177, audio_tagging_loss=0.01177, over 24750.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4910396.91 frames. 
], batch size: 99, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:40:26,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=864213.3333333334, ans=0.2 2023-12-23 00:40:32,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=864280.0, ans=0.125 2023-12-23 00:40:35,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=864280.0, ans=0.0 2023-12-23 00:40:40,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=864280.0, ans=0.125 2023-12-23 00:40:41,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=864280.0, ans=0.0 2023-12-23 00:40:42,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=864346.6666666666, ans=0.1 2023-12-23 00:40:44,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=864346.6666666666, ans=0.07 2023-12-23 00:40:48,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.98 vs. limit=15.0 2023-12-23 00:40:48,394 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.699e+01 3.226e+01 3.347e+01 3.519e+01 4.208e+01, threshold=6.694e+01, percent-clipped=0.0 2023-12-23 00:41:02,431 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:41:04,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=864480.0, ans=0.125 2023-12-23 00:41:09,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=864480.0, ans=0.125 2023-12-23 00:41:14,084 INFO [train.py:886] (3/4) Epoch 28, batch 1000, loss[loss=0.01373, audio_tagging_loss=0.01373, over 24750.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4916821.15 frames. ], batch size: 99, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:41:15,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=864546.6666666666, ans=0.125 2023-12-23 00:41:39,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=864680.0, ans=0.1 2023-12-23 00:41:40,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=864680.0, ans=0.0 2023-12-23 00:41:47,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=864746.6666666666, ans=0.5 2023-12-23 00:41:51,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=864746.6666666666, ans=0.125 2023-12-23 00:41:55,366 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.98 vs. limit=22.5 2023-12-23 00:42:05,069 INFO [train.py:886] (3/4) Epoch 28, batch 1050, loss[loss=0.01494, audio_tagging_loss=0.01494, over 24750.00 frames. 
], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4921183.73 frames. ], batch size: 99, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:42:07,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=864880.0, ans=0.125 2023-12-23 00:42:14,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=864880.0, ans=10.0 2023-12-23 00:42:20,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=864946.6666666666, ans=0.125 2023-12-23 00:42:22,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=864946.6666666666, ans=0.125 2023-12-23 00:42:31,250 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.791e+01 3.118e+01 3.211e+01 3.407e+01 4.036e+01, threshold=6.422e+01, percent-clipped=0.0 2023-12-23 00:42:50,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=865146.6666666666, ans=0.125 2023-12-23 00:42:56,961 INFO [train.py:886] (3/4) Epoch 28, batch 1100, loss[loss=0.0128, audio_tagging_loss=0.0128, over 24750.00 frames. ], tot_loss[loss=0.01286, audio_tagging_loss=0.01286, over 4926835.93 frames. ], batch size: 99, lr: 3.88e-03, grad_scale: 32.0 2023-12-23 00:43:11,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=865280.0, ans=0.1 2023-12-23 00:43:12,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=865280.0, ans=0.0 2023-12-23 00:43:29,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=865413.3333333334, ans=0.0 2023-12-23 00:43:39,272 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=22.5 2023-12-23 00:43:48,804 INFO [train.py:886] (3/4) Epoch 28, batch 1150, loss[loss=0.01471, audio_tagging_loss=0.01471, over 25000.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4936056.43 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:43:48,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=865546.6666666666, ans=0.0 2023-12-23 00:43:53,761 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.76 vs. limit=15.0 2023-12-23 00:43:59,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=865613.3333333334, ans=0.125 2023-12-23 00:44:14,775 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.695e+01 3.161e+01 3.263e+01 3.387e+01 3.814e+01, threshold=6.527e+01, percent-clipped=0.0 2023-12-23 00:44:19,479 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:44:37,502 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.10 vs. 
limit=15.0 2023-12-23 00:44:38,944 INFO [train.py:886] (3/4) Epoch 28, batch 1200, loss[loss=0.01269, audio_tagging_loss=0.01269, over 25000.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4946693.79 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:44:51,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=865946.6666666666, ans=0.0 2023-12-23 00:44:53,063 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.92 vs. limit=6.0 2023-12-23 00:45:24,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=866146.6666666666, ans=10.0 2023-12-23 00:45:30,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=866213.3333333334, ans=0.125 2023-12-23 00:45:30,752 INFO [train.py:886] (3/4) Epoch 28, batch 1250, loss[loss=0.0165, audio_tagging_loss=0.0165, over 24750.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4946769.95 frames. ], batch size: 99, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:45:41,346 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:45:45,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.58 vs. limit=15.0 2023-12-23 00:45:56,740 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.732e+01 3.240e+01 3.398e+01 3.546e+01 3.975e+01, threshold=6.796e+01, percent-clipped=0.0 2023-12-23 00:46:08,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=866413.3333333334, ans=0.09899494936611666 2023-12-23 00:46:21,040 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.91 vs. limit=15.0 2023-12-23 00:46:21,500 INFO [train.py:886] (3/4) Epoch 28, batch 1300, loss[loss=0.0106, audio_tagging_loss=0.0106, over 24750.00 frames. ], tot_loss[loss=0.01311, audio_tagging_loss=0.01311, over 4947623.79 frames. ], batch size: 99, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:46:42,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.80 vs. limit=15.0 2023-12-23 00:46:52,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=866746.6666666666, ans=0.2 2023-12-23 00:47:06,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=866813.3333333334, ans=0.2 2023-12-23 00:47:08,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=866813.3333333334, ans=0.025 2023-12-23 00:47:11,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=866880.0, ans=0.0 2023-12-23 00:47:12,302 INFO [train.py:886] (3/4) Epoch 28, batch 1350, loss[loss=0.01195, audio_tagging_loss=0.01195, over 25000.00 frames. ], tot_loss[loss=0.01304, audio_tagging_loss=0.01304, over 4948439.46 frames. 
], batch size: 100, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:47:21,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=866946.6666666666, ans=0.1 2023-12-23 00:47:26,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=866946.6666666666, ans=0.1 2023-12-23 00:47:29,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=866946.6666666666, ans=0.125 2023-12-23 00:47:33,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=867013.3333333334, ans=0.2 2023-12-23 00:47:39,133 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.836e+01 3.136e+01 3.279e+01 3.497e+01 4.059e+01, threshold=6.558e+01, percent-clipped=0.0 2023-12-23 00:47:51,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=867146.6666666666, ans=0.04949747468305833 2023-12-23 00:48:02,802 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.58 vs. limit=15.0 2023-12-23 00:48:03,371 INFO [train.py:886] (3/4) Epoch 28, batch 1400, loss[loss=0.0134, audio_tagging_loss=0.0134, over 25000.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4941852.55 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:48:13,342 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0 2023-12-23 00:48:17,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=867280.0, ans=0.125 2023-12-23 00:48:21,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=867280.0, ans=0.0 2023-12-23 00:48:23,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=867346.6666666666, ans=0.2 2023-12-23 00:48:34,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=867413.3333333334, ans=0.1 2023-12-23 00:48:47,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=867480.0, ans=0.125 2023-12-23 00:48:54,803 INFO [train.py:886] (3/4) Epoch 28, batch 1450, loss[loss=0.01162, audio_tagging_loss=0.01162, over 25000.00 frames. ], tot_loss[loss=0.01283, audio_tagging_loss=0.01283, over 4949704.11 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:48:55,556 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. limit=6.0 2023-12-23 00:48:56,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=867546.6666666666, ans=0.0 2023-12-23 00:49:13,407 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.36 vs. 
limit=10.0 2023-12-23 00:49:18,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=867680.0, ans=0.2 2023-12-23 00:49:21,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=867680.0, ans=0.0 2023-12-23 00:49:22,062 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.828e+01 3.154e+01 3.288e+01 3.472e+01 3.862e+01, threshold=6.577e+01, percent-clipped=0.0 2023-12-23 00:49:31,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.25 vs. limit=15.0 2023-12-23 00:49:33,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=867746.6666666666, ans=0.09899494936611666 2023-12-23 00:49:46,854 INFO [train.py:886] (3/4) Epoch 28, batch 1500, loss[loss=0.0147, audio_tagging_loss=0.0147, over 25000.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4948296.90 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:49:47,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=867880.0, ans=0.1 2023-12-23 00:49:48,401 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=5.02 vs. limit=15.0 2023-12-23 00:49:51,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=867880.0, ans=0.0 2023-12-23 00:49:53,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=867880.0, ans=0.125 2023-12-23 00:50:31,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=868146.6666666666, ans=0.125 2023-12-23 00:50:38,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.08 vs. limit=12.0 2023-12-23 00:50:38,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.68 vs. limit=12.0 2023-12-23 00:50:39,575 INFO [train.py:886] (3/4) Epoch 28, batch 1550, loss[loss=0.01706, audio_tagging_loss=0.01706, over 24750.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4948097.66 frames. ], batch size: 99, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:51:04,678 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.893e+01 3.216e+01 3.352e+01 3.508e+01 3.923e+01, threshold=6.705e+01, percent-clipped=0.0 2023-12-23 00:51:07,849 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.07 vs. limit=10.0 2023-12-23 00:51:14,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=868413.3333333334, ans=0.125 2023-12-23 00:51:18,711 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:51:29,615 INFO [train.py:886] (3/4) Epoch 28, batch 1600, loss[loss=0.01403, audio_tagging_loss=0.01403, over 24750.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4941441.45 frames. 
], batch size: 99, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:51:47,325 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.24 vs. limit=6.0 2023-12-23 00:51:58,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=868680.0, ans=0.125 2023-12-23 00:52:09,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=868746.6666666666, ans=0.125 2023-12-23 00:52:18,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=868813.3333333334, ans=0.1 2023-12-23 00:52:20,932 INFO [train.py:886] (3/4) Epoch 28, batch 1650, loss[loss=0.01123, audio_tagging_loss=0.01123, over 24019.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4939822.98 frames. ], batch size: 100, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:52:32,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=868946.6666666666, ans=0.0 2023-12-23 00:52:47,481 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.836e+01 3.157e+01 3.317e+01 3.477e+01 3.959e+01, threshold=6.634e+01, percent-clipped=0.0 2023-12-23 00:52:52,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=869080.0, ans=0.0 2023-12-23 00:53:00,501 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=15.0 2023-12-23 00:53:10,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=869213.3333333334, ans=0.0 2023-12-23 00:53:11,351 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0 2023-12-23 00:53:11,747 INFO [train.py:886] (3/4) Epoch 28, batch 1700, loss[loss=0.01232, audio_tagging_loss=0.01232, over 24922.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4943535.61 frames. 
], batch size: 100, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:53:22,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=869280.0, ans=0.0 2023-12-23 00:53:24,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=869280.0, ans=0.0 2023-12-23 00:53:32,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=869346.6666666666, ans=0.125 2023-12-23 00:53:40,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=869346.6666666666, ans=0.2 2023-12-23 00:53:42,001 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:53:47,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=869413.3333333334, ans=0.125 2023-12-23 00:53:55,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=869480.0, ans=0.0 2023-12-23 00:54:02,282 INFO [train.py:886] (3/4) Epoch 28, batch 1750, loss[loss=0.01377, audio_tagging_loss=0.01377, over 21933.00 frames. ], tot_loss[loss=0.01271, audio_tagging_loss=0.01271, over 4940523.09 frames. ], batch size: 107, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:54:14,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=869613.3333333334, ans=0.0 2023-12-23 00:54:19,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=869613.3333333334, ans=0.5 2023-12-23 00:54:20,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=869613.3333333334, ans=0.0 2023-12-23 00:54:25,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=869680.0, ans=0.125 2023-12-23 00:54:28,439 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.13 vs. limit=15.0 2023-12-23 00:54:29,459 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.36 vs. limit=15.0 2023-12-23 00:54:29,677 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.893e+01 3.178e+01 3.276e+01 3.403e+01 3.949e+01, threshold=6.552e+01, percent-clipped=0.0 2023-12-23 00:54:44,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=869813.3333333334, ans=0.2 2023-12-23 00:54:54,480 INFO [train.py:886] (3/4) Epoch 28, batch 1800, loss[loss=0.0105, audio_tagging_loss=0.0105, over 25000.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4948367.70 frames. 
], batch size: 100, lr: 3.87e-03, grad_scale: 32.0 2023-12-23 00:55:22,566 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 00:55:27,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=870080.0, ans=0.0 2023-12-23 00:55:30,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=870080.0, ans=0.125 2023-12-23 00:55:43,834 INFO [train.py:886] (3/4) Epoch 28, batch 1850, loss[loss=0.01243, audio_tagging_loss=0.01243, over 24750.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4944799.92 frames. ], batch size: 99, lr: 3.86e-03, grad_scale: 32.0 2023-12-23 00:55:49,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=870213.3333333334, ans=0.0 2023-12-23 00:55:58,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=870280.0, ans=0.125 2023-12-23 00:56:04,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=870346.6666666666, ans=0.125 2023-12-23 00:56:06,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=870346.6666666666, ans=0.09899494936611666 2023-12-23 00:56:11,269 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.825e+01 3.215e+01 3.384e+01 3.493e+01 4.271e+01, threshold=6.769e+01, percent-clipped=0.0 2023-12-23 00:56:24,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.24 vs. limit=22.5 2023-12-23 00:56:27,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=870480.0, ans=0.0 2023-12-23 00:56:36,138 INFO [train.py:886] (3/4) Epoch 28, batch 1900, loss[loss=0.01278, audio_tagging_loss=0.01278, over 24750.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4943310.86 frames. ], batch size: 99, lr: 3.86e-03, grad_scale: 32.0 2023-12-23 00:56:41,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=870546.6666666666, ans=0.125 2023-12-23 00:56:56,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=870680.0, ans=0.125 2023-12-23 00:56:57,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=870680.0, ans=0.0 2023-12-23 00:57:02,335 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0 2023-12-23 00:57:13,005 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.79 vs. limit=22.5 2023-12-23 00:57:13,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=870746.6666666666, ans=10.0 2023-12-23 00:57:28,663 INFO [train.py:886] (3/4) Epoch 28, batch 1950, loss[loss=0.01219, audio_tagging_loss=0.01219, over 24750.00 frames. 
2023-12-23 00:57:33,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=870880.0, ans=0.0 2023-12-23 00:57:36,609 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0 2023-12-23 00:57:44,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=870946.6666666666, ans=0.0 2023-12-23 00:57:48,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=871013.3333333334, ans=0.125 2023-12-23 00:57:53,806 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.813e+01 3.143e+01 3.252e+01 3.432e+01 3.799e+01, threshold=6.504e+01, percent-clipped=0.0 2023-12-23 00:58:00,268 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.52 vs. limit=15.0 2023-12-23 00:58:11,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=871146.6666666666, ans=0.125 2023-12-23 00:58:19,550 INFO [train.py:886] (3/4) Epoch 28, batch 2000, loss[loss=0.01285, audio_tagging_loss=0.01285, over 25000.00 frames. ], tot_loss[loss=0.01266, audio_tagging_loss=0.01266, over 4945652.03 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0 2023-12-23 00:58:21,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=871213.3333333334, ans=0.125 2023-12-23 00:58:24,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=871213.3333333334, ans=0.04949747468305833 2023-12-23 00:58:35,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=871280.0, ans=0.1 2023-12-23 00:58:46,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=871346.6666666666, ans=0.125 2023-12-23 00:58:47,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=871346.6666666666, ans=0.125 2023-12-23 00:59:10,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=871480.0, ans=0.1 2023-12-23 00:59:11,804 INFO [train.py:886] (3/4) Epoch 28, batch 2050, loss[loss=0.01057, audio_tagging_loss=0.01057, over 25000.00 frames. ], tot_loss[loss=0.01259, audio_tagging_loss=0.01259, over 4944499.20 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0 2023-12-23 00:59:17,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.17 vs. limit=22.5 2023-12-23 00:59:22,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=871613.3333333334, ans=0.1 2023-12-23 00:59:27,705 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.15 vs.
limit=12.0 2023-12-23 00:59:29,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=871613.3333333334, ans=0.125 2023-12-23 00:59:30,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=871680.0, ans=0.125 2023-12-23 00:59:38,686 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.794e+01 3.157e+01 3.303e+01 3.497e+01 3.860e+01, threshold=6.607e+01, percent-clipped=0.0 2023-12-23 00:59:42,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=871746.6666666666, ans=0.125 2023-12-23 00:59:51,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=871813.3333333334, ans=0.125 2023-12-23 00:59:53,234 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.82 vs. limit=15.0 2023-12-23 00:59:56,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=871813.3333333334, ans=0.2 2023-12-23 00:59:58,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=871813.3333333334, ans=0.125 2023-12-23 01:00:01,572 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.91 vs. limit=22.5 2023-12-23 01:00:02,002 INFO [train.py:886] (3/4) Epoch 28, batch 2100, loss[loss=0.01475, audio_tagging_loss=0.01475, over 25000.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4944495.88 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0 2023-12-23 01:00:09,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=871880.0, ans=0.125 2023-12-23 01:00:11,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=871880.0, ans=0.125 2023-12-23 01:00:15,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=871946.6666666666, ans=0.025 2023-12-23 01:00:27,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=872013.3333333334, ans=0.125 2023-12-23 01:00:48,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=872146.6666666666, ans=0.07 2023-12-23 01:00:54,486 INFO [train.py:886] (3/4) Epoch 28, batch 2150, loss[loss=0.01259, audio_tagging_loss=0.01259, over 25000.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4944241.98 frames. 
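], batch size: 100, lr: 3.86e-03, grad_scale: 64.0

Each optim.py:484 warning reports the distribution of recent gradient norms (min, 25%, median, 75%, max) together with a clipping threshold. In every record in this section the threshold is almost exactly Clipping_scale=2.0 times the middle quantile (e.g. 6.708e+01 vs. 2 x 3.354e+01 in the warning immediately below), so a plausible reading is norm clipping at twice the running median. A hedged reconstruction under that assumption; the class and method names are illustrative, not icefall's actual optimizer API:

```python
# Hedged reconstruction of the optim.py:484 warnings: keep a window of recent
# global gradient norms, clip at clipping_scale * median, report the quantiles.
from collections import deque
import torch

class QuartileGradClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)   # recent global grad norms
        self.n_clipped = 0
        self.n_total = 0

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()   # 2 x running median
        self.n_total += 1
        if norm > threshold:
            self.n_clipped += 1
            for p in params:                # rescale so the global norm
                p.grad.mul_(threshold / norm)   # equals the threshold
        print(f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
              + " ".join(f"{v:.3e}" for v in q.tolist())
              + f", threshold={threshold:.3e}"
              + f", percent-clipped={100.0 * self.n_clipped / self.n_total:.1f}")
        return norm
```
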
2023-12-23 01:01:21,865 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.753e+01 3.234e+01 3.354e+01 3.492e+01 4.264e+01, threshold=6.708e+01, percent-clipped=0.0 2023-12-23 01:01:22,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=872346.6666666666, ans=0.1 2023-12-23 01:01:26,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=872413.3333333334, ans=0.0 2023-12-23 01:01:26,915 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.50 vs. limit=10.0 2023-12-23 01:01:31,030 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.34 vs. limit=22.5 2023-12-23 01:01:34,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=872480.0, ans=0.125 2023-12-23 01:01:46,228 INFO [train.py:886] (3/4) Epoch 28, batch 2200, loss[loss=0.01224, audio_tagging_loss=0.01224, over 24750.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4947340.13 frames. ], batch size: 99, lr: 3.86e-03, grad_scale: 64.0 2023-12-23 01:01:49,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=872546.6666666666, ans=0.0 2023-12-23 01:01:53,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=872546.6666666666, ans=0.125 2023-12-23 01:02:11,131 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2023-12-23 01:02:21,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=872746.6666666666, ans=0.2 2023-12-23 01:02:25,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=872746.6666666666, ans=0.2 2023-12-23 01:02:26,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=872746.6666666666, ans=0.0 2023-12-23 01:02:30,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=872813.3333333334, ans=0.125 2023-12-23 01:02:35,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=872813.3333333334, ans=0.125 2023-12-23 01:02:38,063 INFO [train.py:886] (3/4) Epoch 28, batch 2250, loss[loss=0.01854, audio_tagging_loss=0.01854, over 24933.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4945262.31 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0 2023-12-23 01:02:49,668 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.05 vs.
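limit=22.5

The scaling.py:1022 Whitening records, like the self_attn2.whiten entry this completes (metric=20.05 vs. limit=22.5), compare a per-module statistic against a limit and only intervene when the metric exceeds it. One plausible form of that statistic, sketched below as an assumption rather than a transcription of scaling.py, is mean(l^2)/mean(l)^2 over the eigenvalues l of the activation's channel covariance: it equals 1.0 for perfectly "white" (isotropic) features and grows as a few directions dominate:

```python
# Hedged sketch of a whitening diagnostic: ratio of the second moment of the
# covariance eigenvalues to the squared first moment, computed per group.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels split into groups as logged
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    covar = torch.matmul(x.transpose(1, 2), x) / n   # per-group covariance
    eigs = torch.linalg.eigvalsh(covar)              # symmetric, real eigs
    metric = (eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)
    return metric.item()

white = torch.randn(1000, 256)
spiky = white * torch.linspace(0.1, 3.0, 256)   # anisotropic channel scales
print(whitening_metric(white))   # near 1 (about 1.3 at this sample size)
print(whitening_metric(spiky))   # clearly larger: dominated directions push it up
```
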
2023-12-23 01:03:04,779 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.837e+01 3.155e+01 3.335e+01 3.468e+01 4.219e+01, threshold=6.670e+01, percent-clipped=0.0 2023-12-23 01:03:06,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=873013.3333333334, ans=0.125 2023-12-23 01:03:13,260 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.74 vs. limit=10.0 2023-12-23 01:03:30,846 INFO [train.py:886] (3/4) Epoch 28, batch 2300, loss[loss=0.01143, audio_tagging_loss=0.01143, over 25000.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4947176.54 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0 2023-12-23 01:03:30,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=873213.3333333334, ans=0.0 2023-12-23 01:03:39,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=873280.0, ans=0.125 2023-12-23 01:03:40,183 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.27 vs. limit=15.0 2023-12-23 01:03:40,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=873280.0, ans=0.125 2023-12-23 01:03:49,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0 2023-12-23 01:04:08,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=873413.3333333334, ans=0.5 2023-12-23 01:04:22,665 INFO [train.py:886] (3/4) Epoch 28, batch 2350, loss[loss=0.01201, audio_tagging_loss=0.01201, over 25000.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4946009.48 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0 2023-12-23 01:04:22,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=873546.6666666666, ans=0.125 2023-12-23 01:04:24,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=873546.6666666666, ans=0.125 2023-12-23 01:04:42,757 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.85 vs.
limit=15.0 2023-12-23 01:04:48,803 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.776e+01 3.130e+01 3.254e+01 3.391e+01 3.968e+01, threshold=6.508e+01, percent-clipped=0.0 2023-12-23 01:04:51,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=873746.6666666666, ans=0.1 2023-12-23 01:04:59,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=873746.6666666666, ans=0.07 2023-12-23 01:05:00,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=873746.6666666666, ans=0.2 2023-12-23 01:05:02,973 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.56 vs. limit=15.0 2023-12-23 01:05:13,635 INFO [train.py:886] (3/4) Epoch 28, batch 2400, loss[loss=0.01653, audio_tagging_loss=0.01653, over 25000.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4946970.62 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0 2023-12-23 01:05:35,602 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.45 vs. limit=6.0 2023-12-23 01:05:36,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=874013.3333333334, ans=0.125 2023-12-23 01:05:44,015 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=12.0 2023-12-23 01:05:54,325 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.49 vs. limit=22.5 2023-12-23 01:06:00,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0 2023-12-23 01:06:03,956 INFO [train.py:886] (3/4) Epoch 28, batch 2450, loss[loss=0.01425, audio_tagging_loss=0.01425, over 25000.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4953759.10 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 64.0 2023-12-23 01:06:05,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.91 vs. limit=15.0 2023-12-23 01:06:14,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=874280.0, ans=0.125 2023-12-23 01:06:31,240 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.962e+01 3.185e+01 3.322e+01 3.528e+01 4.944e+01, threshold=6.645e+01, percent-clipped=0.0 2023-12-23 01:06:47,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=874480.0, ans=0.1 2023-12-23 01:06:49,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=874480.0, ans=0.2 2023-12-23 01:06:55,585 INFO [train.py:886] (3/4) Epoch 28, batch 2500, loss[loss=0.0132, audio_tagging_loss=0.0132, over 24750.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4950417.47 frames. 
], batch size: 99, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:07:09,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=874613.3333333334, ans=0.0 2023-12-23 01:07:15,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=874680.0, ans=0.125 2023-12-23 01:07:19,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=874680.0, ans=0.05 2023-12-23 01:07:20,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=874680.0, ans=0.1 2023-12-23 01:07:26,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=874746.6666666666, ans=0.125 2023-12-23 01:07:31,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=874746.6666666666, ans=0.125 2023-12-23 01:07:33,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=874746.6666666666, ans=0.125 2023-12-23 01:07:33,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=874746.6666666666, ans=0.1 2023-12-23 01:07:46,628 INFO [train.py:886] (3/4) Epoch 28, batch 2550, loss[loss=0.01142, audio_tagging_loss=0.01142, over 25000.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 4944063.49 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:07:51,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=874880.0, ans=0.035 2023-12-23 01:08:14,113 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.835e+01 3.213e+01 3.387e+01 3.518e+01 3.975e+01, threshold=6.773e+01, percent-clipped=0.0 2023-12-23 01:08:15,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=875013.3333333334, ans=0.2 2023-12-23 01:08:25,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=875080.0, ans=0.2 2023-12-23 01:08:38,600 INFO [train.py:886] (3/4) Epoch 28, batch 2600, loss[loss=0.0129, audio_tagging_loss=0.0129, over 25000.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4947463.73 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:08:40,190 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.89 vs. limit=10.0 2023-12-23 01:09:02,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=875346.6666666666, ans=0.0 2023-12-23 01:09:02,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.69 vs. 
limit=15.0 2023-12-23 01:09:17,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=875413.3333333334, ans=0.125 2023-12-23 01:09:18,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=875413.3333333334, ans=0.125 2023-12-23 01:09:23,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=875480.0, ans=0.0 2023-12-23 01:09:30,269 INFO [train.py:886] (3/4) Epoch 28, batch 2650, loss[loss=0.01333, audio_tagging_loss=0.01333, over 25000.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4946071.94 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:09:32,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=875546.6666666666, ans=0.025 2023-12-23 01:09:56,413 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.754e+01 3.093e+01 3.250e+01 3.446e+01 3.869e+01, threshold=6.500e+01, percent-clipped=0.0 2023-12-23 01:09:59,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=875746.6666666666, ans=0.125 2023-12-23 01:10:12,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=875813.3333333334, ans=0.0 2023-12-23 01:10:21,854 INFO [train.py:886] (3/4) Epoch 28, batch 2700, loss[loss=0.0141, audio_tagging_loss=0.0141, over 25000.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4944346.32 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:10:36,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=875946.6666666666, ans=0.0 2023-12-23 01:10:40,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.84 vs. limit=15.0 2023-12-23 01:10:46,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=876013.3333333334, ans=0.2 2023-12-23 01:10:50,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=876013.3333333334, ans=0.125 2023-12-23 01:10:55,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=876080.0, ans=0.1 2023-12-23 01:10:58,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=876080.0, ans=0.04949747468305833 2023-12-23 01:11:10,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=876146.6666666666, ans=0.1 2023-12-23 01:11:12,603 INFO [train.py:886] (3/4) Epoch 28, batch 2750, loss[loss=0.0144, audio_tagging_loss=0.0144, over 25000.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4941007.28 frames. 
], batch size: 100, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:11:12,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=876213.3333333334, ans=0.125 2023-12-23 01:11:19,177 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.17 vs. limit=15.0 2023-12-23 01:11:20,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=876213.3333333334, ans=0.125 2023-12-23 01:11:33,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=876346.6666666666, ans=0.0 2023-12-23 01:11:39,279 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.779e+01 3.198e+01 3.346e+01 3.453e+01 3.797e+01, threshold=6.692e+01, percent-clipped=0.0 2023-12-23 01:11:41,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.78 vs. limit=22.5 2023-12-23 01:11:43,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=876413.3333333334, ans=0.1 2023-12-23 01:11:45,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=876413.3333333334, ans=0.125 2023-12-23 01:12:04,151 INFO [train.py:886] (3/4) Epoch 28, batch 2800, loss[loss=0.01279, audio_tagging_loss=0.01279, over 25000.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4939305.13 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:12:05,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.52 vs. limit=5.0 2023-12-23 01:12:06,624 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.51 vs. limit=22.5 2023-12-23 01:12:09,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=876546.6666666666, ans=0.0 2023-12-23 01:12:14,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=876613.3333333334, ans=0.5 2023-12-23 01:12:31,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=876680.0, ans=0.125 2023-12-23 01:12:34,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=15.0 2023-12-23 01:12:42,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=876746.6666666666, ans=0.125 2023-12-23 01:12:53,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=876813.3333333334, ans=0.0 2023-12-23 01:12:56,780 INFO [train.py:886] (3/4) Epoch 28, batch 2850, loss[loss=0.01203, audio_tagging_loss=0.01203, over 24750.00 frames. ], tot_loss[loss=0.01283, audio_tagging_loss=0.01283, over 4939024.04 frames. 
], batch size: 99, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:12:58,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=876880.0, ans=0.09899494936611666 2023-12-23 01:13:00,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=876880.0, ans=0.0 2023-12-23 01:13:15,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=877013.3333333334, ans=0.04949747468305833 2023-12-23 01:13:23,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=877013.3333333334, ans=0.125 2023-12-23 01:13:23,945 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.944e+01 3.205e+01 3.349e+01 3.516e+01 3.954e+01, threshold=6.699e+01, percent-clipped=0.0 2023-12-23 01:13:27,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=877080.0, ans=0.2 2023-12-23 01:13:40,219 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.47 vs. limit=15.0 2023-12-23 01:13:41,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=877146.6666666666, ans=0.2 2023-12-23 01:13:47,397 INFO [train.py:886] (3/4) Epoch 28, batch 2900, loss[loss=0.01222, audio_tagging_loss=0.01222, over 23997.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4937563.33 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:14:14,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=877346.6666666666, ans=0.125 2023-12-23 01:14:41,105 INFO [train.py:886] (3/4) Epoch 28, batch 2950, loss[loss=0.0128, audio_tagging_loss=0.0128, over 24750.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4938547.23 frames. ], batch size: 99, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:14:41,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=877546.6666666666, ans=0.0 2023-12-23 01:14:48,364 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.36 vs. limit=15.0 2023-12-23 01:14:55,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=877613.3333333334, ans=0.0 2023-12-23 01:15:08,593 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.855e+01 3.109e+01 3.285e+01 3.424e+01 3.829e+01, threshold=6.571e+01, percent-clipped=0.0 2023-12-23 01:15:16,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=877746.6666666666, ans=0.125 2023-12-23 01:15:19,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=877746.6666666666, ans=0.2 2023-12-23 01:15:33,433 INFO [train.py:886] (3/4) Epoch 28, batch 3000, loss[loss=0.01191, audio_tagging_loss=0.01191, over 25000.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4943057.87 frames. 
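], batch size: 100, lr: 3.85e-03, grad_scale: 64.0

Each train.py:886 record pairs the current batch's loss with tot_loss, a frame-weighted running average whose effective horizon sits near five million frames, while train.py:909-918 (immediately below) evaluate the fixed 3,737,520-frame validation set. A toy decayed accumulator shows how such a horizon can arise; the decay constant is an assumption chosen so that 25,000-frame batches settle near the ~4.94e6-frame counts printed here, not a value read out of train.py:

```python
# Toy frame-weighted running loss behind a "tot_loss[..., over N frames. ]"
# style of bookkeeping. With 25000-frame batches and decay=0.995 the frame
# count converges to 25000 / (1 - 0.995) = 5.0e6.
class RunningLoss:
    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.loss_sum = 0.0     # decayed sum of loss * frames
        self.frames = 0.0       # decayed sum of frames

    def update(self, loss: float, frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + loss * frames
        self.frames = self.decay * self.frames + frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

running = RunningLoss()
for _ in range(2000):                 # many 25k-frame batches at loss 0.0126
    running.update(0.0126, 25000.0)
print(f"tot_loss[loss={running.value:.5f}, over {running.frames:.2f} frames. ]")
# value stays at 0.0126; the frame count approaches 5.0e6, in the same range
# as the "over 494xxxx.xx frames." figures logged in this epoch
```
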
2023-12-23 01:15:33,434 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 01:15:54,361 INFO [train.py:917] (3/4) Epoch 28, validation: loss=0.03338, audio_tagging_loss=0.03338, over 3737520.00 frames. 2023-12-23 01:15:54,361 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 01:16:18,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=878013.3333333334, ans=0.0 2023-12-23 01:16:19,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=878013.3333333334, ans=0.2 2023-12-23 01:16:35,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=878146.6666666666, ans=0.125 2023-12-23 01:16:46,342 INFO [train.py:886] (3/4) Epoch 28, batch 3050, loss[loss=0.01197, audio_tagging_loss=0.01197, over 25000.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4947982.82 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:17:14,049 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.873e+01 3.142e+01 3.296e+01 3.491e+01 4.100e+01, threshold=6.591e+01, percent-clipped=0.0 2023-12-23 01:17:22,226 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.12 vs. limit=22.5 2023-12-23 01:17:24,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=878413.3333333334, ans=0.125 2023-12-23 01:17:34,144 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.13 vs. limit=15.0 2023-12-23 01:17:37,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=878546.6666666666, ans=0.2 2023-12-23 01:17:38,116 INFO [train.py:886] (3/4) Epoch 28, batch 3100, loss[loss=0.01492, audio_tagging_loss=0.01492, over 25000.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4952819.55 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:17:55,202 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=15.0 2023-12-23 01:18:03,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=878680.0, ans=0.2 2023-12-23 01:18:06,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=878680.0, ans=0.125 2023-12-23 01:18:10,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=878746.6666666666, ans=0.125 2023-12-23 01:18:29,893 INFO [train.py:886] (3/4) Epoch 28, batch 3150, loss[loss=0.0123, audio_tagging_loss=0.0123, over 24750.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4947683.27 frames. ], batch size: 99, lr: 3.85e-03, grad_scale: 64.0 2023-12-23 01:18:38,516 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.50 vs.
limit=12.0 2023-12-23 01:18:53,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.19 vs. limit=22.5 2023-12-23 01:18:57,079 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.894e+01 3.254e+01 3.356e+01 3.478e+01 4.076e+01, threshold=6.712e+01, percent-clipped=0.0 2023-12-23 01:19:00,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=879080.0, ans=0.125 2023-12-23 01:19:14,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=879146.6666666666, ans=0.125 2023-12-23 01:19:15,140 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.62 vs. limit=15.0 2023-12-23 01:19:22,609 INFO [train.py:886] (3/4) Epoch 28, batch 3200, loss[loss=0.01271, audio_tagging_loss=0.01271, over 24750.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4948031.68 frames. ], batch size: 99, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:19:22,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=879213.3333333334, ans=0.1 2023-12-23 01:19:24,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=879213.3333333334, ans=0.125 2023-12-23 01:19:25,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=879213.3333333334, ans=0.1 2023-12-23 01:19:29,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=879213.3333333334, ans=0.1 2023-12-23 01:19:29,861 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.71 vs. limit=10.0 2023-12-23 01:19:52,739 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.82 vs. limit=22.5 2023-12-23 01:20:13,633 INFO [train.py:886] (3/4) Epoch 28, batch 3250, loss[loss=0.0125, audio_tagging_loss=0.0125, over 25000.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4953665.19 frames. 
], batch size: 100, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:20:16,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=879546.6666666666, ans=0.125 2023-12-23 01:20:27,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=879613.3333333334, ans=0.07 2023-12-23 01:20:37,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=879680.0, ans=0.09899494936611666 2023-12-23 01:20:40,474 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.820e+01 3.146e+01 3.294e+01 3.408e+01 4.034e+01, threshold=6.589e+01, percent-clipped=0.0 2023-12-23 01:20:40,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=879680.0, ans=0.0 2023-12-23 01:20:44,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=879746.6666666666, ans=0.1 2023-12-23 01:20:56,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=879813.3333333334, ans=0.125 2023-12-23 01:21:01,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=879813.3333333334, ans=0.0 2023-12-23 01:21:05,287 INFO [train.py:886] (3/4) Epoch 28, batch 3300, loss[loss=0.01145, audio_tagging_loss=0.01145, over 25000.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4949601.17 frames. ], batch size: 100, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:21:21,342 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0 2023-12-23 01:21:22,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.94 vs. limit=15.0 2023-12-23 01:21:27,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=880013.3333333334, ans=0.2 2023-12-23 01:21:28,305 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.94 vs. limit=12.0 2023-12-23 01:21:31,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=880013.3333333334, ans=0.125 2023-12-23 01:21:40,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=880080.0, ans=0.1 2023-12-23 01:21:43,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=880080.0, ans=0.125 2023-12-23 01:21:43,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=880080.0, ans=0.125 2023-12-23 01:21:48,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=880146.6666666666, ans=0.125 2023-12-23 01:21:59,416 INFO [train.py:886] (3/4) Epoch 28, batch 3350, loss[loss=0.01303, audio_tagging_loss=0.01303, over 24750.00 frames. ], tot_loss[loss=0.01271, audio_tagging_loss=0.01271, over 4954505.24 frames. 
], batch size: 99, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:22:00,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=880213.3333333334, ans=0.125 2023-12-23 01:22:10,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=880280.0, ans=0.0 2023-12-23 01:22:18,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=880280.0, ans=0.125 2023-12-23 01:22:18,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=880346.6666666666, ans=0.125 2023-12-23 01:22:26,187 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.722e+01 3.184e+01 3.328e+01 3.464e+01 4.025e+01, threshold=6.657e+01, percent-clipped=0.0 2023-12-23 01:22:40,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=880480.0, ans=0.125 2023-12-23 01:22:49,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=880480.0, ans=0.0 2023-12-23 01:22:50,983 INFO [train.py:886] (3/4) Epoch 28, batch 3400, loss[loss=0.01509, audio_tagging_loss=0.01509, over 25000.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4956131.50 frames. ], batch size: 100, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:23:03,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=880613.3333333334, ans=0.125 2023-12-23 01:23:09,878 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.56 vs. limit=12.0 2023-12-23 01:23:18,094 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.79 vs. limit=15.0 2023-12-23 01:23:19,864 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.96 vs. limit=15.0 2023-12-23 01:23:41,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=880813.3333333334, ans=15.0 2023-12-23 01:23:43,682 INFO [train.py:886] (3/4) Epoch 28, batch 3450, loss[loss=0.01234, audio_tagging_loss=0.01234, over 21769.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4947636.43 frames. ], batch size: 107, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:23:49,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=880880.0, ans=0.035 2023-12-23 01:23:52,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=880946.6666666666, ans=0.1 2023-12-23 01:24:10,419 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.974e+01 3.236e+01 3.395e+01 3.566e+01 3.957e+01, threshold=6.789e+01, percent-clipped=0.0 2023-12-23 01:24:12,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=881013.3333333334, ans=0.2 2023-12-23 01:24:35,513 INFO [train.py:886] (3/4) Epoch 28, batch 3500, loss[loss=0.01238, audio_tagging_loss=0.01238, over 24750.00 frames. 
], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4942797.07 frames. ], batch size: 99, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:24:41,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=881213.3333333334, ans=0.125 2023-12-23 01:24:52,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=881280.0, ans=0.125 2023-12-23 01:25:03,033 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.52 vs. limit=15.0 2023-12-23 01:25:03,263 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.32 vs. limit=15.0 2023-12-23 01:25:13,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=881413.3333333334, ans=0.125 2023-12-23 01:25:27,303 INFO [train.py:886] (3/4) Epoch 28, batch 3550, loss[loss=0.01224, audio_tagging_loss=0.01224, over 24750.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4939927.84 frames. ], batch size: 99, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:25:28,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=881546.6666666666, ans=0.125 2023-12-23 01:25:41,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=881613.3333333334, ans=0.2 2023-12-23 01:25:54,591 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.851e+01 3.124e+01 3.262e+01 3.428e+01 4.045e+01, threshold=6.525e+01, percent-clipped=0.0 2023-12-23 01:25:54,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=881680.0, ans=0.125 2023-12-23 01:26:04,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=881746.6666666666, ans=0.2 2023-12-23 01:26:19,133 INFO [train.py:886] (3/4) Epoch 28, batch 3600, loss[loss=0.01122, audio_tagging_loss=0.01122, over 25000.00 frames. ], tot_loss[loss=0.01271, audio_tagging_loss=0.01271, over 4946800.99 frames. ], batch size: 100, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:26:20,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2023-12-23 01:26:37,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.21 vs. limit=15.0 2023-12-23 01:26:43,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.57 vs. limit=5.0 2023-12-23 01:26:57,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=882080.0, ans=0.2 2023-12-23 01:27:05,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.14 vs. 
limit=22.5 2023-12-23 01:27:08,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=882146.6666666666, ans=0.125 2023-12-23 01:27:10,015 INFO [train.py:886] (3/4) Epoch 28, batch 3650, loss[loss=0.01191, audio_tagging_loss=0.01191, over 24750.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4948941.50 frames. ], batch size: 99, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:27:19,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=882213.3333333334, ans=0.2 2023-12-23 01:27:22,699 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 01:27:25,007 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2023-12-23 01:27:36,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=882346.6666666666, ans=0.125 2023-12-23 01:27:37,443 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.830e+01 3.117e+01 3.248e+01 3.424e+01 3.950e+01, threshold=6.496e+01, percent-clipped=0.0 2023-12-23 01:27:39,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=882346.6666666666, ans=0.125 2023-12-23 01:27:39,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=882346.6666666666, ans=0.125 2023-12-23 01:27:43,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=882413.3333333334, ans=0.1 2023-12-23 01:27:46,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=882413.3333333334, ans=0.1 2023-12-23 01:27:49,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=882413.3333333334, ans=0.05 2023-12-23 01:27:55,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=882480.0, ans=0.0 2023-12-23 01:28:01,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=882546.6666666666, ans=0.0 2023-12-23 01:28:02,355 INFO [train.py:886] (3/4) Epoch 28, batch 3700, loss[loss=0.01655, audio_tagging_loss=0.01655, over 25000.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4952633.42 frames. ], batch size: 100, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:28:10,216 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.88 vs. 
limit=15.0 2023-12-23 01:28:16,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=882613.3333333334, ans=0.125 2023-12-23 01:28:19,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=882613.3333333334, ans=0.2 2023-12-23 01:28:23,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=882680.0, ans=0.125 2023-12-23 01:28:45,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=882813.3333333334, ans=0.125 2023-12-23 01:28:51,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=882813.3333333334, ans=0.125 2023-12-23 01:28:54,360 INFO [train.py:886] (3/4) Epoch 28, batch 3750, loss[loss=0.01353, audio_tagging_loss=0.01353, over 21050.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4948483.63 frames. ], batch size: 107, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:28:59,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=882880.0, ans=0.125 2023-12-23 01:29:06,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=882946.6666666666, ans=10.0 2023-12-23 01:29:21,069 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.970e+01 3.212e+01 3.329e+01 3.465e+01 4.032e+01, threshold=6.658e+01, percent-clipped=0.0 2023-12-23 01:29:22,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=883013.3333333334, ans=0.125 2023-12-23 01:29:29,850 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.03 vs. limit=15.0 2023-12-23 01:29:39,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=883146.6666666666, ans=0.07 2023-12-23 01:29:45,892 INFO [train.py:886] (3/4) Epoch 28, batch 3800, loss[loss=0.01297, audio_tagging_loss=0.01297, over 24750.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4950497.22 frames. ], batch size: 99, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:30:13,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=883346.6666666666, ans=0.0 2023-12-23 01:30:13,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=883346.6666666666, ans=0.125 2023-12-23 01:30:37,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=883546.6666666666, ans=0.025 2023-12-23 01:30:38,183 INFO [train.py:886] (3/4) Epoch 28, batch 3850, loss[loss=0.01141, audio_tagging_loss=0.01141, over 25000.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4947000.14 frames. 
], batch size: 100, lr: 3.84e-03, grad_scale: 64.0 2023-12-23 01:30:43,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=883546.6666666666, ans=0.1 2023-12-23 01:30:58,239 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.21 vs. limit=15.0 2023-12-23 01:30:59,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=883680.0, ans=0.125 2023-12-23 01:31:05,057 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.788e+01 3.155e+01 3.335e+01 3.521e+01 4.328e+01, threshold=6.670e+01, percent-clipped=0.0 2023-12-23 01:31:06,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=883680.0, ans=0.0 2023-12-23 01:31:06,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=15.0 2023-12-23 01:31:11,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=883746.6666666666, ans=0.125 2023-12-23 01:31:13,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=883746.6666666666, ans=0.125 2023-12-23 01:31:29,956 INFO [train.py:886] (3/4) Epoch 28, batch 3900, loss[loss=0.01205, audio_tagging_loss=0.01205, over 24750.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4949173.66 frames. ], batch size: 99, lr: 3.83e-03, grad_scale: 64.0 2023-12-23 01:31:34,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=883880.0, ans=0.2 2023-12-23 01:31:47,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=883946.6666666666, ans=10.0 2023-12-23 01:31:50,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=884013.3333333334, ans=0.125 2023-12-23 01:32:10,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=884080.0, ans=0.1 2023-12-23 01:32:21,903 INFO [train.py:886] (3/4) Epoch 28, batch 3950, loss[loss=0.01189, audio_tagging_loss=0.01189, over 25000.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4945210.72 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 64.0 2023-12-23 01:32:49,270 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.873e+01 3.204e+01 3.330e+01 3.435e+01 3.816e+01, threshold=6.660e+01, percent-clipped=0.0 2023-12-23 01:32:49,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=884346.6666666666, ans=0.2 2023-12-23 01:32:50,821 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.45 vs. limit=6.0 2023-12-23 01:33:12,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=884480.0, ans=0.125 2023-12-23 01:33:13,909 INFO [train.py:886] (3/4) Epoch 28, batch 4000, loss[loss=0.0118, audio_tagging_loss=0.0118, over 25000.00 frames. 
], tot_loss[loss=0.01266, audio_tagging_loss=0.01266, over 4952059.93 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 128.0 2023-12-23 01:33:25,350 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.48 vs. limit=6.0 2023-12-23 01:33:38,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=884680.0, ans=0.0 2023-12-23 01:34:03,576 INFO [train.py:886] (3/4) Epoch 28, batch 4050, loss[loss=0.01221, audio_tagging_loss=0.01221, over 25000.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4955163.58 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 64.0 2023-12-23 01:34:11,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=884880.0, ans=0.07 2023-12-23 01:34:24,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=885013.3333333334, ans=0.2 2023-12-23 01:34:24,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=885013.3333333334, ans=0.04949747468305833 2023-12-23 01:34:27,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=885013.3333333334, ans=0.125 2023-12-23 01:34:30,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=885013.3333333334, ans=0.0 2023-12-23 01:34:31,372 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.870e+01 3.168e+01 3.309e+01 3.446e+01 3.884e+01, threshold=6.618e+01, percent-clipped=0.0 2023-12-23 01:34:55,293 INFO [train.py:886] (3/4) Epoch 28, batch 4100, loss[loss=0.01567, audio_tagging_loss=0.01567, over 24750.00 frames. ], tot_loss[loss=0.01286, audio_tagging_loss=0.01286, over 4951832.72 frames. ], batch size: 99, lr: 3.83e-03, grad_scale: 64.0 2023-12-23 01:35:16,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=885346.6666666666, ans=0.0 2023-12-23 01:35:18,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=12.0 2023-12-23 01:35:27,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=885413.3333333334, ans=0.0 2023-12-23 01:35:33,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.72 vs. limit=10.0 2023-12-23 01:35:33,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=885413.3333333334, ans=0.125 2023-12-23 01:35:39,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.24 vs. limit=15.0 2023-12-23 01:35:46,238 INFO [train.py:886] (3/4) Epoch 28, batch 4150, loss[loss=0.01054, audio_tagging_loss=0.01054, over 24058.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 4953917.21 frames. 
], batch size: 100, lr: 3.83e-03, grad_scale: 64.0 2023-12-23 01:35:46,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=885546.6666666666, ans=0.2 2023-12-23 01:35:47,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=885546.6666666666, ans=0.1 2023-12-23 01:36:12,414 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.816e+01 3.213e+01 3.310e+01 3.521e+01 4.267e+01, threshold=6.621e+01, percent-clipped=0.0 2023-12-23 01:36:15,466 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.19 vs. limit=22.5 2023-12-23 01:36:37,006 INFO [train.py:886] (3/4) Epoch 28, batch 4200, loss[loss=0.01191, audio_tagging_loss=0.01191, over 24750.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4955826.58 frames. ], batch size: 99, lr: 3.83e-03, grad_scale: 64.0 2023-12-23 01:36:54,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=885946.6666666666, ans=0.125 2023-12-23 01:37:02,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=886013.3333333334, ans=0.125 2023-12-23 01:37:04,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=886013.3333333334, ans=0.0 2023-12-23 01:37:10,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=886080.0, ans=0.125 2023-12-23 01:37:10,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=886080.0, ans=0.125 2023-12-23 01:37:13,428 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.75 vs. limit=15.0 2023-12-23 01:37:24,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=886146.6666666666, ans=0.125 2023-12-23 01:37:26,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=886146.6666666666, ans=0.0 2023-12-23 01:37:29,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=886213.3333333334, ans=0.0 2023-12-23 01:37:29,995 INFO [train.py:886] (3/4) Epoch 28, batch 4250, loss[loss=0.01179, audio_tagging_loss=0.01179, over 25000.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4960634.47 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 64.0 2023-12-23 01:37:37,126 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.01 vs. 
limit=22.5 2023-12-23 01:37:43,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=886280.0, ans=0.0 2023-12-23 01:37:44,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=886280.0, ans=0.125 2023-12-23 01:37:48,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=886346.6666666666, ans=0.125 2023-12-23 01:37:57,037 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.864e+01 3.165e+01 3.374e+01 3.516e+01 4.346e+01, threshold=6.749e+01, percent-clipped=0.0 2023-12-23 01:37:59,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=886413.3333333334, ans=0.1 2023-12-23 01:38:03,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0 2023-12-23 01:38:13,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=886480.0, ans=0.2 2023-12-23 01:38:20,007 INFO [train.py:886] (3/4) Epoch 28, batch 4300, loss[loss=0.01387, audio_tagging_loss=0.01387, over 25000.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4961693.32 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 64.0 2023-12-23 01:38:37,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=886613.3333333334, ans=0.0 2023-12-23 01:38:45,379 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.89 vs. limit=15.0 2023-12-23 01:38:58,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=886746.6666666666, ans=0.1 2023-12-23 01:38:59,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=886746.6666666666, ans=0.125 2023-12-23 01:39:13,172 INFO [train.py:886] (3/4) Epoch 28, batch 4350, loss[loss=0.01417, audio_tagging_loss=0.01417, over 25000.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4966868.86 frames. 
], batch size: 100, lr: 3.83e-03, grad_scale: 64.0 2023-12-23 01:39:22,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=886946.6666666666, ans=0.0 2023-12-23 01:39:41,273 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.916e+01 3.191e+01 3.325e+01 3.447e+01 4.229e+01, threshold=6.650e+01, percent-clipped=0.0 2023-12-23 01:39:44,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=887080.0, ans=0.125 2023-12-23 01:39:44,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=887080.0, ans=0.125 2023-12-23 01:39:45,286 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 01:39:54,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=887146.6666666666, ans=0.2 2023-12-23 01:39:58,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=887146.6666666666, ans=0.125 2023-12-23 01:40:03,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=887213.3333333334, ans=0.125 2023-12-23 01:40:04,481 INFO [train.py:886] (3/4) Epoch 28, batch 4400, loss[loss=0.01305, audio_tagging_loss=0.01305, over 24750.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4958111.12 frames. ], batch size: 99, lr: 3.83e-03, grad_scale: 64.0 2023-12-23 01:40:13,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=887280.0, ans=0.07 2023-12-23 01:40:14,161 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.54 vs. limit=6.0 2023-12-23 01:40:16,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=887280.0, ans=0.2 2023-12-23 01:40:23,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=887346.6666666666, ans=0.125 2023-12-23 01:40:23,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=887346.6666666666, ans=0.125 2023-12-23 01:40:37,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=887413.3333333334, ans=0.1 2023-12-23 01:40:55,223 INFO [train.py:886] (3/4) Epoch 28, batch 4450, loss[loss=0.01168, audio_tagging_loss=0.01168, over 25000.00 frames. ], tot_loss[loss=0.01286, audio_tagging_loss=0.01286, over 4955665.90 frames. 
], batch size: 100, lr: 3.83e-03, grad_scale: 64.0 2023-12-23 01:40:55,399 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 01:41:19,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=887680.0, ans=0.0 2023-12-23 01:41:23,619 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.801e+01 3.227e+01 3.347e+01 3.540e+01 3.940e+01, threshold=6.695e+01, percent-clipped=0.0 2023-12-23 01:41:26,888 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.50 vs. limit=22.5 2023-12-23 01:41:48,286 INFO [train.py:886] (3/4) Epoch 28, batch 4500, loss[loss=0.01423, audio_tagging_loss=0.01423, over 24750.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4958198.70 frames. ], batch size: 99, lr: 3.83e-03, grad_scale: 64.0 2023-12-23 01:41:50,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=887880.0, ans=0.0 2023-12-23 01:41:51,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=887880.0, ans=0.0 2023-12-23 01:41:52,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=887880.0, ans=0.2 2023-12-23 01:42:04,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=887946.6666666666, ans=0.0 2023-12-23 01:42:26,424 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.788e-03 2023-12-23 01:42:31,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=888146.6666666666, ans=0.125 2023-12-23 01:42:38,510 INFO [train.py:886] (3/4) Epoch 28, batch 4550, loss[loss=0.01076, audio_tagging_loss=0.01076, over 25000.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4956294.42 frames. ], batch size: 100, lr: 3.83e-03, grad_scale: 64.0 2023-12-23 01:42:38,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=888213.3333333334, ans=0.0 2023-12-23 01:42:47,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=888213.3333333334, ans=0.125 2023-12-23 01:43:06,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=888346.6666666666, ans=0.125 2023-12-23 01:43:06,886 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.744e+01 3.196e+01 3.308e+01 3.514e+01 4.018e+01, threshold=6.616e+01, percent-clipped=0.0 2023-12-23 01:43:19,193 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.12 vs. limit=15.0 2023-12-23 01:43:27,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=888480.0, ans=0.0 2023-12-23 01:43:31,507 INFO [train.py:886] (3/4) Epoch 28, batch 4600, loss[loss=0.01322, audio_tagging_loss=0.01322, over 24907.00 frames. ], tot_loss[loss=0.01283, audio_tagging_loss=0.01283, over 4956913.63 frames. 
], batch size: 100, lr: 3.82e-03, grad_scale: 64.0 2023-12-23 01:43:31,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=888546.6666666666, ans=0.1 2023-12-23 01:43:34,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=888546.6666666666, ans=0.1 2023-12-23 01:43:39,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=22.5 2023-12-23 01:43:49,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=888613.3333333334, ans=0.0 2023-12-23 01:44:01,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=888746.6666666666, ans=0.0 2023-12-23 01:44:03,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=888746.6666666666, ans=0.125 2023-12-23 01:44:13,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=888813.3333333334, ans=0.125 2023-12-23 01:44:16,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=888813.3333333334, ans=0.125 2023-12-23 01:44:23,030 INFO [train.py:886] (3/4) Epoch 28, batch 4650, loss[loss=0.0132, audio_tagging_loss=0.0132, over 25000.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4961090.72 frames. ], batch size: 100, lr: 3.82e-03, grad_scale: 64.0 2023-12-23 01:44:23,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=888880.0, ans=0.0 2023-12-23 01:44:23,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=888880.0, ans=0.125 2023-12-23 01:44:25,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=888880.0, ans=0.125 2023-12-23 01:44:50,547 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.857e+01 3.209e+01 3.315e+01 3.492e+01 4.117e+01, threshold=6.630e+01, percent-clipped=0.0 2023-12-23 01:44:56,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=889080.0, ans=0.1 2023-12-23 01:45:13,970 INFO [train.py:886] (3/4) Epoch 28, batch 4700, loss[loss=0.01775, audio_tagging_loss=0.01775, over 24942.00 frames. ], tot_loss[loss=0.0129, audio_tagging_loss=0.0129, over 4959615.12 frames. ], batch size: 100, lr: 3.82e-03, grad_scale: 64.0 2023-12-23 01:45:14,496 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.22 vs. 
limit=22.5 2023-12-23 01:45:19,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=889213.3333333334, ans=0.125 2023-12-23 01:45:21,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=889213.3333333334, ans=0.125 2023-12-23 01:45:22,610 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.94 vs. limit=15.0 2023-12-23 01:45:23,298 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 01:45:35,521 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0 2023-12-23 01:45:39,114 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2023-12-23 01:46:00,586 INFO [train.py:886] (3/4) Epoch 28, batch 4750, loss[loss=0.01336, audio_tagging_loss=0.01336, over 24750.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4951057.85 frames. ], batch size: 99, lr: 3.82e-03, grad_scale: 64.0 2023-12-23 01:46:04,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=889546.6666666666, ans=0.125 2023-12-23 01:46:36,938 INFO [train.py:886] (3/4) Epoch 29, batch 0, loss[loss=0.02515, audio_tagging_loss=0.02515, over 25000.00 frames. ], tot_loss[loss=0.02515, audio_tagging_loss=0.02515, over 25000.00 frames. ], batch size: 100, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:46:36,938 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 01:46:48,440 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.2533, 3.5040, 3.6636, 3.5310], device='cuda:3') 2023-12-23 01:46:49,761 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.7043, 3.3897, 3.8608, 3.8806], device='cuda:3') 2023-12-23 01:46:58,158 INFO [train.py:917] (3/4) Epoch 29, validation: loss=0.03319, audio_tagging_loss=0.03319, over 3737520.00 frames. 2023-12-23 01:46:58,159 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 01:47:10,288 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.924e+01 3.242e+01 3.406e+01 3.707e+01 9.005e+01, threshold=6.813e+01, percent-clipped=9.0 2023-12-23 01:47:20,244 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=15.0 2023-12-23 01:47:49,189 INFO [train.py:886] (3/4) Epoch 29, batch 50, loss[loss=0.01649, audio_tagging_loss=0.01649, over 25000.00 frames. ], tot_loss[loss=0.01979, audio_tagging_loss=0.01979, over 1118577.56 frames. 
], batch size: 100, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:47:54,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=889986.6666666666, ans=0.125 2023-12-23 01:48:11,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=890120.0, ans=0.0 2023-12-23 01:48:12,327 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.61 vs. limit=22.5 2023-12-23 01:48:17,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=890120.0, ans=0.0 2023-12-23 01:48:17,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=890120.0, ans=0.1 2023-12-23 01:48:20,771 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2023-12-23 01:48:37,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=890253.3333333334, ans=0.0 2023-12-23 01:48:41,283 INFO [train.py:886] (3/4) Epoch 29, batch 100, loss[loss=0.01357, audio_tagging_loss=0.01357, over 25000.00 frames. ], tot_loss[loss=0.01716, audio_tagging_loss=0.01716, over 1970359.64 frames. ], batch size: 100, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:48:41,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=890320.0, ans=0.0 2023-12-23 01:48:46,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=890320.0, ans=0.125 2023-12-23 01:48:53,292 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.271e+01 3.674e+01 3.939e+01 4.263e+01 5.538e+01, threshold=7.878e+01, percent-clipped=0.0 2023-12-23 01:49:05,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=890453.3333333334, ans=0.0 2023-12-23 01:49:13,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=890520.0, ans=0.125 2023-12-23 01:49:13,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.81 vs. limit=15.0 2023-12-23 01:49:15,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=890520.0, ans=0.0 2023-12-23 01:49:32,050 INFO [train.py:886] (3/4) Epoch 29, batch 150, loss[loss=0.01367, audio_tagging_loss=0.01367, over 24750.00 frames. ], tot_loss[loss=0.01561, audio_tagging_loss=0.01561, over 2633156.86 frames. ], batch size: 99, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:50:20,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=890920.0, ans=0.0 2023-12-23 01:50:24,568 INFO [train.py:886] (3/4) Epoch 29, batch 200, loss[loss=0.01076, audio_tagging_loss=0.01076, over 24750.00 frames. ], tot_loss[loss=0.01462, audio_tagging_loss=0.01462, over 3145334.20 frames. 
], batch size: 99, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:50:36,599 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.903e+01 3.220e+01 3.343e+01 3.508e+01 4.197e+01, threshold=6.685e+01, percent-clipped=0.0 2023-12-23 01:50:39,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=891053.3333333334, ans=0.0 2023-12-23 01:50:40,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=891053.3333333334, ans=0.125 2023-12-23 01:50:54,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=891186.6666666666, ans=0.1 2023-12-23 01:51:16,782 INFO [train.py:886] (3/4) Epoch 29, batch 250, loss[loss=0.01276, audio_tagging_loss=0.01276, over 25000.00 frames. ], tot_loss[loss=0.01412, audio_tagging_loss=0.01412, over 3548922.34 frames. ], batch size: 100, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:51:24,520 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.19 vs. limit=15.0 2023-12-23 01:51:28,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=891386.6666666666, ans=0.2 2023-12-23 01:51:34,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=891386.6666666666, ans=0.0 2023-12-23 01:51:45,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=891453.3333333334, ans=0.125 2023-12-23 01:51:54,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=891520.0, ans=0.2 2023-12-23 01:52:03,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=891586.6666666666, ans=0.125 2023-12-23 01:52:08,235 INFO [train.py:886] (3/4) Epoch 29, batch 300, loss[loss=0.01165, audio_tagging_loss=0.01165, over 24750.00 frames. ], tot_loss[loss=0.01394, audio_tagging_loss=0.01394, over 3856772.78 frames. ], batch size: 99, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:52:20,961 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.835e+01 3.236e+01 3.377e+01 3.518e+01 3.995e+01, threshold=6.753e+01, percent-clipped=0.0 2023-12-23 01:52:34,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=891786.6666666666, ans=0.04949747468305833 2023-12-23 01:52:34,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=891786.6666666666, ans=0.025 2023-12-23 01:52:35,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=891786.6666666666, ans=0.95 2023-12-23 01:52:39,090 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.72 vs. limit=15.0 2023-12-23 01:52:50,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=891920.0, ans=0.0 2023-12-23 01:53:00,220 INFO [train.py:886] (3/4) Epoch 29, batch 350, loss[loss=0.008893, audio_tagging_loss=0.008893, over 24750.00 frames. 
], tot_loss[loss=0.01367, audio_tagging_loss=0.01367, over 4097188.78 frames. ], batch size: 99, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:53:36,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=892186.6666666666, ans=0.0 2023-12-23 01:53:51,858 INFO [train.py:886] (3/4) Epoch 29, batch 400, loss[loss=0.0109, audio_tagging_loss=0.0109, over 22417.00 frames. ], tot_loss[loss=0.01336, audio_tagging_loss=0.01336, over 4285915.92 frames. ], batch size: 107, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:53:51,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=892320.0, ans=0.1 2023-12-23 01:54:00,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=892320.0, ans=0.1 2023-12-23 01:54:04,654 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.891e+01 3.197e+01 3.293e+01 3.447e+01 4.008e+01, threshold=6.587e+01, percent-clipped=0.0 2023-12-23 01:54:04,936 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 01:54:05,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=892386.6666666666, ans=0.125 2023-12-23 01:54:05,877 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 01:54:06,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=892386.6666666666, ans=0.125 2023-12-23 01:54:12,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=892453.3333333334, ans=0.0 2023-12-23 01:54:13,830 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.29 vs. limit=15.0 2023-12-23 01:54:17,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=892453.3333333334, ans=0.1 2023-12-23 01:54:21,244 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.40 vs. limit=15.0 2023-12-23 01:54:39,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=892586.6666666666, ans=0.2 2023-12-23 01:54:39,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=892586.6666666666, ans=0.0 2023-12-23 01:54:43,567 INFO [train.py:886] (3/4) Epoch 29, batch 450, loss[loss=0.01123, audio_tagging_loss=0.01123, over 25000.00 frames. ], tot_loss[loss=0.0131, audio_tagging_loss=0.0131, over 4431125.57 frames. ], batch size: 100, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:55:13,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=892853.3333333334, ans=0.0 2023-12-23 01:55:35,274 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 01:55:36,732 INFO [train.py:886] (3/4) Epoch 29, batch 500, loss[loss=0.01205, audio_tagging_loss=0.01205, over 25000.00 frames. 
], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4549947.83 frames. ], batch size: 100, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:55:42,990 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0 2023-12-23 01:55:48,272 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.779e+01 3.131e+01 3.260e+01 3.424e+01 4.258e+01, threshold=6.520e+01, percent-clipped=0.0 2023-12-23 01:56:03,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=893120.0, ans=0.125 2023-12-23 01:56:05,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=893120.0, ans=0.0 2023-12-23 01:56:06,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=893186.6666666666, ans=0.125 2023-12-23 01:56:10,544 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.58 vs. limit=15.0 2023-12-23 01:56:15,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=893186.6666666666, ans=0.125 2023-12-23 01:56:16,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=15.0 2023-12-23 01:56:27,081 INFO [train.py:886] (3/4) Epoch 29, batch 550, loss[loss=0.01145, audio_tagging_loss=0.01145, over 24053.00 frames. ], tot_loss[loss=0.01283, audio_tagging_loss=0.01283, over 4639287.59 frames. ], batch size: 100, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:56:37,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=893320.0, ans=0.125 2023-12-23 01:56:38,295 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 01:56:56,400 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.41 vs. limit=10.0 2023-12-23 01:56:58,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=893520.0, ans=0.2 2023-12-23 01:57:01,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=893520.0, ans=0.2 2023-12-23 01:57:19,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=893653.3333333334, ans=0.0 2023-12-23 01:57:20,498 INFO [train.py:886] (3/4) Epoch 29, batch 600, loss[loss=0.0151, audio_tagging_loss=0.0151, over 24750.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4707067.77 frames. 
], batch size: 99, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:57:24,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=893653.3333333334, ans=0.0 2023-12-23 01:57:31,798 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.900e+01 3.207e+01 3.346e+01 3.493e+01 4.560e+01, threshold=6.691e+01, percent-clipped=0.0 2023-12-23 01:57:45,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=893786.6666666666, ans=0.07 2023-12-23 01:57:59,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=893853.3333333334, ans=0.125 2023-12-23 01:58:12,567 INFO [train.py:886] (3/4) Epoch 29, batch 650, loss[loss=0.01276, audio_tagging_loss=0.01276, over 24057.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4754066.05 frames. ], batch size: 100, lr: 3.75e-03, grad_scale: 32.0 2023-12-23 01:58:34,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=894120.0, ans=0.2 2023-12-23 01:58:53,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=894253.3333333334, ans=0.125 2023-12-23 01:59:03,459 INFO [train.py:886] (3/4) Epoch 29, batch 700, loss[loss=0.01276, audio_tagging_loss=0.01276, over 25000.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4798146.64 frames. ], batch size: 100, lr: 3.74e-03, grad_scale: 32.0 2023-12-23 01:59:03,688 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 01:59:16,885 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.862e+01 3.181e+01 3.381e+01 3.506e+01 3.965e+01, threshold=6.761e+01, percent-clipped=0.0 2023-12-23 01:59:19,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=894386.6666666666, ans=0.1 2023-12-23 01:59:40,585 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.68 vs. limit=15.0 2023-12-23 01:59:45,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=894586.6666666666, ans=0.125 2023-12-23 01:59:48,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=894586.6666666666, ans=0.07 2023-12-23 01:59:56,269 INFO [train.py:886] (3/4) Epoch 29, batch 750, loss[loss=0.01048, audio_tagging_loss=0.01048, over 22311.00 frames. ], tot_loss[loss=0.01282, audio_tagging_loss=0.01282, over 4822940.26 frames. 
], batch size: 107, lr: 3.74e-03, grad_scale: 32.0 2023-12-23 02:00:01,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=894653.3333333334, ans=0.125 2023-12-23 02:00:10,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=894720.0, ans=0.0 2023-12-23 02:00:18,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=894786.6666666666, ans=0.1 2023-12-23 02:00:39,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=894920.0, ans=0.125 2023-12-23 02:00:44,696 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.77 vs. limit=12.0 2023-12-23 02:00:46,092 INFO [train.py:886] (3/4) Epoch 29, batch 800, loss[loss=0.01432, audio_tagging_loss=0.01432, over 25000.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4853769.80 frames. ], batch size: 100, lr: 3.74e-03, grad_scale: 32.0 2023-12-23 02:00:49,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=894986.6666666666, ans=0.125 2023-12-23 02:00:55,544 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2023-12-23 02:00:57,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=895053.3333333334, ans=0.125 2023-12-23 02:00:59,625 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.846e+01 3.158e+01 3.303e+01 3.490e+01 4.206e+01, threshold=6.605e+01, percent-clipped=0.0 2023-12-23 02:01:04,859 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.87 vs. limit=6.0 2023-12-23 02:01:38,396 INFO [train.py:886] (3/4) Epoch 29, batch 850, loss[loss=0.01215, audio_tagging_loss=0.01215, over 25000.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4881881.63 frames. ], batch size: 100, lr: 3.74e-03, grad_scale: 32.0 2023-12-23 02:01:41,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=895320.0, ans=0.95 2023-12-23 02:01:58,261 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.12 vs. limit=15.0 2023-12-23 02:02:17,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=895520.0, ans=0.125 2023-12-23 02:02:29,866 INFO [train.py:886] (3/4) Epoch 29, batch 900, loss[loss=0.01531, audio_tagging_loss=0.01531, over 25000.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4898805.67 frames. 
], batch size: 100, lr: 3.74e-03, grad_scale: 32.0 2023-12-23 02:02:36,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=895653.3333333334, ans=0.0 2023-12-23 02:02:42,539 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.824e+01 3.196e+01 3.316e+01 3.462e+01 4.110e+01, threshold=6.632e+01, percent-clipped=0.0 2023-12-23 02:02:42,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=895720.0, ans=0.2 2023-12-23 02:02:45,088 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.90 vs. limit=22.5 2023-12-23 02:03:06,866 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.93 vs. limit=6.0 2023-12-23 02:03:09,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=895853.3333333334, ans=0.0 2023-12-23 02:03:21,011 INFO [train.py:886] (3/4) Epoch 29, batch 950, loss[loss=0.01245, audio_tagging_loss=0.01245, over 24750.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4905403.67 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0 2023-12-23 02:03:21,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=895986.6666666666, ans=0.125 2023-12-23 02:03:31,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=896053.3333333334, ans=0.0 2023-12-23 02:04:00,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=896186.6666666666, ans=0.1 2023-12-23 02:04:01,790 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 02:04:03,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=896253.3333333334, ans=0.125 2023-12-23 02:04:03,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=896253.3333333334, ans=0.0 2023-12-23 02:04:09,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=896253.3333333334, ans=0.0 2023-12-23 02:04:10,340 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.08 vs. limit=12.0 2023-12-23 02:04:13,320 INFO [train.py:886] (3/4) Epoch 29, batch 1000, loss[loss=0.01213, audio_tagging_loss=0.01213, over 24750.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4911892.36 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0 2023-12-23 02:04:13,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=896320.0, ans=0.125 2023-12-23 02:04:24,550 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.901e+01 3.234e+01 3.376e+01 3.564e+01 4.018e+01, threshold=6.752e+01, percent-clipped=0.0 2023-12-23 02:04:26,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. 
limit=6.0 2023-12-23 02:04:42,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=896520.0, ans=0.125 2023-12-23 02:04:49,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=15.0 2023-12-23 02:04:56,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0 2023-12-23 02:04:59,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=896586.6666666666, ans=0.125 2023-12-23 02:05:03,425 INFO [train.py:886] (3/4) Epoch 29, batch 1050, loss[loss=0.01073, audio_tagging_loss=0.01073, over 25000.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4915857.62 frames. ], batch size: 100, lr: 3.74e-03, grad_scale: 32.0 2023-12-23 02:05:06,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=896653.3333333334, ans=0.125 2023-12-23 02:05:20,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.53 vs. limit=15.0 2023-12-23 02:05:30,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=896786.6666666666, ans=0.2 2023-12-23 02:05:33,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=896853.3333333334, ans=0.1 2023-12-23 02:05:39,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=896853.3333333334, ans=0.0 2023-12-23 02:05:44,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=896920.0, ans=0.1 2023-12-23 02:05:54,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=896986.6666666666, ans=0.125 2023-12-23 02:05:55,166 INFO [train.py:886] (3/4) Epoch 29, batch 1100, loss[loss=0.01313, audio_tagging_loss=0.01313, over 25000.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4923545.88 frames. ], batch size: 100, lr: 3.74e-03, grad_scale: 32.0 2023-12-23 02:06:07,973 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.822e+01 3.193e+01 3.319e+01 3.484e+01 4.077e+01, threshold=6.637e+01, percent-clipped=0.0 2023-12-23 02:06:08,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=897053.3333333334, ans=0.0 2023-12-23 02:06:18,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=897120.0, ans=0.0 2023-12-23 02:06:24,545 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.16 vs. 
limit=22.5 2023-12-23 02:06:29,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=897186.6666666666, ans=0.0 2023-12-23 02:06:31,798 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.577e-03 2023-12-23 02:06:32,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=897186.6666666666, ans=0.0 2023-12-23 02:06:32,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=897186.6666666666, ans=0.0 2023-12-23 02:06:33,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=897186.6666666666, ans=0.035 2023-12-23 02:06:40,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=897253.3333333334, ans=0.125 2023-12-23 02:06:41,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=897253.3333333334, ans=0.125 2023-12-23 02:06:42,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=897253.3333333334, ans=0.125 2023-12-23 02:06:42,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=897253.3333333334, ans=0.125 2023-12-23 02:06:46,464 INFO [train.py:886] (3/4) Epoch 29, batch 1150, loss[loss=0.01202, audio_tagging_loss=0.01202, over 25000.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4926793.72 frames. ], batch size: 100, lr: 3.74e-03, grad_scale: 32.0 2023-12-23 02:06:48,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=897320.0, ans=0.2 2023-12-23 02:06:48,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=897320.0, ans=0.125 2023-12-23 02:07:38,486 INFO [train.py:886] (3/4) Epoch 29, batch 1200, loss[loss=0.01034, audio_tagging_loss=0.01034, over 24750.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4926809.83 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0 2023-12-23 02:07:38,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=897653.3333333334, ans=0.125 2023-12-23 02:07:50,594 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.874e+01 3.246e+01 3.372e+01 3.513e+01 4.009e+01, threshold=6.745e+01, percent-clipped=0.0 2023-12-23 02:07:51,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=897720.0, ans=0.125 2023-12-23 02:08:05,585 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=15.0 2023-12-23 02:08:18,516 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.64 vs. limit=15.0 2023-12-23 02:08:20,124 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.69 vs. 
limit=15.0 2023-12-23 02:08:29,790 INFO [train.py:886] (3/4) Epoch 29, batch 1250, loss[loss=0.01301, audio_tagging_loss=0.01301, over 24750.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4927705.28 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0 2023-12-23 02:08:40,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=898053.3333333334, ans=0.1 2023-12-23 02:08:48,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=898053.3333333334, ans=0.09899494936611666 2023-12-23 02:08:52,362 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.25 vs. limit=15.0 2023-12-23 02:09:07,758 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.57 vs. limit=15.0 2023-12-23 02:09:20,138 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2023-12-23 02:09:21,418 INFO [train.py:886] (3/4) Epoch 29, batch 1300, loss[loss=0.01293, audio_tagging_loss=0.01293, over 24750.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4926331.84 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0 2023-12-23 02:09:28,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=898320.0, ans=0.125 2023-12-23 02:09:33,508 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.906e+01 3.213e+01 3.404e+01 3.516e+01 4.030e+01, threshold=6.807e+01, percent-clipped=0.0 2023-12-23 02:10:07,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=898586.6666666666, ans=0.0 2023-12-23 02:10:12,326 INFO [train.py:886] (3/4) Epoch 29, batch 1350, loss[loss=0.01307, audio_tagging_loss=0.01307, over 25000.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4935802.53 frames. ], batch size: 100, lr: 3.74e-03, grad_scale: 32.0 2023-12-23 02:10:15,983 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.32 vs. limit=15.0 2023-12-23 02:10:37,457 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.72 vs. limit=15.0 2023-12-23 02:10:39,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=898786.6666666666, ans=0.125 2023-12-23 02:11:00,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=898920.0, ans=0.0 2023-12-23 02:11:03,476 INFO [train.py:886] (3/4) Epoch 29, batch 1400, loss[loss=0.0135, audio_tagging_loss=0.0135, over 25000.00 frames. ], tot_loss[loss=0.01266, audio_tagging_loss=0.01266, over 4940964.84 frames. 
], batch size: 100, lr: 3.73e-03, grad_scale: 32.0 2023-12-23 02:11:14,757 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.846e+01 3.186e+01 3.286e+01 3.462e+01 3.963e+01, threshold=6.572e+01, percent-clipped=0.0 2023-12-23 02:11:36,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=899186.6666666666, ans=0.125 2023-12-23 02:11:53,857 INFO [train.py:886] (3/4) Epoch 29, batch 1450, loss[loss=0.0118, audio_tagging_loss=0.0118, over 25000.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4935903.05 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 32.0 2023-12-23 02:12:11,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=899386.6666666666, ans=0.0 2023-12-23 02:12:11,870 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.07 vs. limit=22.5 2023-12-23 02:12:16,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=899453.3333333334, ans=0.125 2023-12-23 02:12:20,547 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0 2023-12-23 02:12:21,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=899453.3333333334, ans=0.05 2023-12-23 02:12:26,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=899520.0, ans=0.1 2023-12-23 02:12:44,696 INFO [train.py:886] (3/4) Epoch 29, batch 1500, loss[loss=0.01538, audio_tagging_loss=0.01538, over 25000.00 frames. ], tot_loss[loss=0.01261, audio_tagging_loss=0.01261, over 4945270.55 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 32.0 2023-12-23 02:12:51,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=899653.3333333334, ans=0.0 2023-12-23 02:12:56,601 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.983e+01 3.233e+01 3.348e+01 3.462e+01 4.143e+01, threshold=6.696e+01, percent-clipped=0.0 2023-12-23 02:13:05,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=899786.6666666666, ans=0.2 2023-12-23 02:13:23,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=899920.0, ans=0.0 2023-12-23 02:13:25,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=899920.0, ans=0.125 2023-12-23 02:13:36,101 INFO [train.py:886] (3/4) Epoch 29, batch 1550, loss[loss=0.01228, audio_tagging_loss=0.01228, over 24750.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4948002.01 frames. 
], batch size: 99, lr: 3.73e-03, grad_scale: 32.0 2023-12-23 02:13:43,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=899986.6666666666, ans=0.0 2023-12-23 02:13:57,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=900120.0, ans=0.0 2023-12-23 02:14:07,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=900186.6666666666, ans=0.125 2023-12-23 02:14:27,189 INFO [train.py:886] (3/4) Epoch 29, batch 1600, loss[loss=0.01219, audio_tagging_loss=0.01219, over 25000.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4945494.56 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 32.0 2023-12-23 02:14:27,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=900320.0, ans=0.125 2023-12-23 02:14:40,713 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.938e+01 3.277e+01 3.394e+01 3.581e+01 4.487e+01, threshold=6.788e+01, percent-clipped=0.0 2023-12-23 02:14:41,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=900386.6666666666, ans=0.125 2023-12-23 02:14:46,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=900386.6666666666, ans=0.07 2023-12-23 02:14:50,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.06 vs. limit=12.0 2023-12-23 02:14:54,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=900453.3333333334, ans=0.125 2023-12-23 02:14:54,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=900453.3333333334, ans=0.125 2023-12-23 02:14:56,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=900453.3333333334, ans=0.0 2023-12-23 02:15:00,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=900520.0, ans=0.125 2023-12-23 02:15:02,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=900520.0, ans=0.1 2023-12-23 02:15:19,560 INFO [train.py:886] (3/4) Epoch 29, batch 1650, loss[loss=0.01204, audio_tagging_loss=0.01204, over 24750.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4944757.84 frames. ], batch size: 99, lr: 3.73e-03, grad_scale: 32.0 2023-12-23 02:15:42,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=900786.6666666666, ans=0.125 2023-12-23 02:15:44,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=900786.6666666666, ans=0.0 2023-12-23 02:16:10,146 INFO [train.py:886] (3/4) Epoch 29, batch 1700, loss[loss=0.01177, audio_tagging_loss=0.01177, over 24057.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4945734.10 frames. 
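
A note on the recurring "Clipping_scale=2.0, grad-norm quartiles ... threshold=..." warnings: the five numbers are the minimum, 25%, 50%, 75% and maximum of recent per-batch gradient norms, and in every entry above the reported threshold is exactly the clipping scale (2.0) times the middle quartile, e.g. 2.0 x 3.394e+01 = 6.788e+01 at batch 1600. The sketch below reproduces that bookkeeping; it is a simplified illustration with an assumed history window, not icefall's actual ScaledAdam clipping code.

    # Simplified sketch of quartile-based gradient clipping, matching the numbers
    # logged above. The history length (128) is an assumption for illustration.
    from collections import deque

    import torch

    class GradNormClipper:
        def __init__(self, clipping_scale=2.0, history=128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=history)  # recent per-batch gradient norms

        def clip_(self, parameters):
            grads = [p.grad for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads]))
            self.norms.append(norm.item())
            quartiles = torch.quantile(
                torch.tensor(list(self.norms)),
                torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
            )
            # threshold = clipping_scale x median, exactly as reported in the log
            threshold = self.clipping_scale * quartiles[2].item()
            if norm > threshold:
                for g in grads:
                    g.mul_(threshold / norm)
            return quartiles, threshold
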
], batch size: 100, lr: 3.73e-03, grad_scale: 32.0 2023-12-23 02:16:21,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=901053.3333333334, ans=0.0 2023-12-23 02:16:22,984 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.862e+01 3.200e+01 3.336e+01 3.521e+01 4.401e+01, threshold=6.671e+01, percent-clipped=0.0 2023-12-23 02:16:28,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=901053.3333333334, ans=0.0 2023-12-23 02:16:52,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=901253.3333333334, ans=0.2 2023-12-23 02:16:54,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=901253.3333333334, ans=0.125 2023-12-23 02:16:57,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=901253.3333333334, ans=0.0 2023-12-23 02:17:01,612 INFO [train.py:886] (3/4) Epoch 29, batch 1750, loss[loss=0.01192, audio_tagging_loss=0.01192, over 25000.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4952100.55 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 32.0 2023-12-23 02:17:07,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=901320.0, ans=0.125 2023-12-23 02:17:23,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=901453.3333333334, ans=0.2 2023-12-23 02:17:32,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=901520.0, ans=0.125 2023-12-23 02:17:43,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=901586.6666666666, ans=0.0 2023-12-23 02:17:46,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=901586.6666666666, ans=0.2 2023-12-23 02:17:53,491 INFO [train.py:886] (3/4) Epoch 29, batch 1800, loss[loss=0.01379, audio_tagging_loss=0.01379, over 25000.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4952974.06 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 32.0 2023-12-23 02:17:55,665 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 02:18:05,610 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.853e+01 3.204e+01 3.323e+01 3.491e+01 3.903e+01, threshold=6.647e+01, percent-clipped=0.0 2023-12-23 02:18:09,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=901720.0, ans=0.02 2023-12-23 02:18:21,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=901786.6666666666, ans=0.1 2023-12-23 02:18:25,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=901853.3333333334, ans=0.0 2023-12-23 02:18:26,046 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.86 vs. 
limit=22.5 2023-12-23 02:18:29,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=901853.3333333334, ans=0.125 2023-12-23 02:18:33,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=901853.3333333334, ans=0.125 2023-12-23 02:18:41,847 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.63 vs. limit=15.0 2023-12-23 02:18:44,423 INFO [train.py:886] (3/4) Epoch 29, batch 1850, loss[loss=0.01699, audio_tagging_loss=0.01699, over 24750.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4949648.81 frames. ], batch size: 99, lr: 3.73e-03, grad_scale: 32.0 2023-12-23 02:19:06,831 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.63 vs. limit=15.0 2023-12-23 02:19:08,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=12.0 2023-12-23 02:19:13,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=902120.0, ans=0.2 2023-12-23 02:19:15,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=902186.6666666666, ans=0.1 2023-12-23 02:19:37,444 INFO [train.py:886] (3/4) Epoch 29, batch 1900, loss[loss=0.01397, audio_tagging_loss=0.01397, over 24750.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4946708.15 frames. ], batch size: 99, lr: 3.73e-03, grad_scale: 32.0 2023-12-23 02:19:41,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=902320.0, ans=0.125 2023-12-23 02:19:41,375 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 02:19:48,703 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.018e+01 3.307e+01 3.435e+01 3.560e+01 3.951e+01, threshold=6.871e+01, percent-clipped=0.0 2023-12-23 02:19:58,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=902453.3333333334, ans=0.125 2023-12-23 02:19:59,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=902453.3333333334, ans=0.0 2023-12-23 02:20:11,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=902520.0, ans=0.125 2023-12-23 02:20:16,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=902520.0, ans=0.125 2023-12-23 02:20:28,813 INFO [train.py:886] (3/4) Epoch 29, batch 1950, loss[loss=0.01165, audio_tagging_loss=0.01165, over 25000.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4945720.71 frames. 
], batch size: 100, lr: 3.73e-03, grad_scale: 32.0 2023-12-23 02:20:51,317 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 02:21:02,463 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.43 vs. limit=15.0 2023-12-23 02:21:17,308 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.25 vs. limit=10.0 2023-12-23 02:21:19,634 INFO [train.py:886] (3/4) Epoch 29, batch 2000, loss[loss=0.01339, audio_tagging_loss=0.01339, over 24750.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4943767.83 frames. ], batch size: 99, lr: 3.73e-03, grad_scale: 64.0 2023-12-23 02:21:31,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=903053.3333333334, ans=0.125 2023-12-23 02:21:32,358 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.952e+01 3.186e+01 3.326e+01 3.515e+01 4.262e+01, threshold=6.651e+01, percent-clipped=0.0 2023-12-23 02:21:43,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=903120.0, ans=0.2 2023-12-23 02:21:52,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=903186.6666666666, ans=0.125 2023-12-23 02:21:57,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.84 vs. limit=22.5 2023-12-23 02:22:02,825 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 02:22:05,552 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 02:22:10,987 INFO [train.py:886] (3/4) Epoch 29, batch 2050, loss[loss=0.01331, audio_tagging_loss=0.01331, over 24750.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4950167.51 frames. ], batch size: 99, lr: 3.73e-03, grad_scale: 64.0 2023-12-23 02:22:11,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.73 vs. limit=22.5 2023-12-23 02:22:17,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=903320.0, ans=0.0 2023-12-23 02:22:18,513 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.17 vs. limit=15.0 2023-12-23 02:22:23,168 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.70 vs. 
limit=15.0 2023-12-23 02:22:33,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=903453.3333333334, ans=0.0 2023-12-23 02:22:38,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=903453.3333333334, ans=0.1 2023-12-23 02:22:40,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=903453.3333333334, ans=0.1 2023-12-23 02:22:45,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=903520.0, ans=0.125 2023-12-23 02:22:46,764 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 02:22:59,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=903586.6666666666, ans=0.0 2023-12-23 02:23:02,152 INFO [train.py:886] (3/4) Epoch 29, batch 2100, loss[loss=0.01428, audio_tagging_loss=0.01428, over 25000.00 frames. ], tot_loss[loss=0.01261, audio_tagging_loss=0.01261, over 4952741.70 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 64.0 2023-12-23 02:23:14,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=903720.0, ans=0.125 2023-12-23 02:23:14,809 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.861e+01 3.167e+01 3.386e+01 3.538e+01 3.863e+01, threshold=6.772e+01, percent-clipped=0.0 2023-12-23 02:23:26,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=903786.6666666666, ans=0.1 2023-12-23 02:23:29,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=903786.6666666666, ans=0.125 2023-12-23 02:23:36,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=903853.3333333334, ans=0.0 2023-12-23 02:23:37,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=903853.3333333334, ans=0.1 2023-12-23 02:23:52,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=903986.6666666666, ans=0.0 2023-12-23 02:23:54,323 INFO [train.py:886] (3/4) Epoch 29, batch 2150, loss[loss=0.0148, audio_tagging_loss=0.0148, over 21171.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4949282.63 frames. 
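
The grad_scale field in the batch summaries doubled from 32.0 to 64.0 at batch 2000 above, the signature of dynamic fp16 loss scaling: the scale doubles after a run of overflow-free steps and is cut back when gradients overflow. A minimal sketch of that policy follows; the growth interval of 2000 is an assumption chosen to match the log, not a value read from it.

    # Toy dynamic loss scaler for fp16 training (see torch.cuda.amp.GradScaler
    # for the real thing). growth_interval=2000 is an assumed setting.
    class LossScale:
        def __init__(self, scale=32.0, growth_interval=2000):
            self.scale = scale
            self.growth_interval = growth_interval
            self.good_steps = 0

        def update(self, found_inf: bool):
            if found_inf:
                self.scale *= 0.5  # back off when gradients overflowed
                self.good_steps = 0
            else:
                self.good_steps += 1
                if self.good_steps == self.growth_interval:
                    self.scale *= 2.0  # e.g. 32.0 -> 64.0, as at batch 2000
                    self.good_steps = 0
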
], batch size: 107, lr: 3.72e-03, grad_scale: 64.0 2023-12-23 02:23:59,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=903986.6666666666, ans=0.125 2023-12-23 02:24:14,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=904120.0, ans=0.125 2023-12-23 02:24:21,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=904120.0, ans=0.125 2023-12-23 02:24:24,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=904186.6666666666, ans=0.2 2023-12-23 02:24:38,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=904253.3333333334, ans=0.0 2023-12-23 02:24:38,427 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0 2023-12-23 02:24:42,617 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.25 vs. limit=15.0 2023-12-23 02:24:45,882 INFO [train.py:886] (3/4) Epoch 29, batch 2200, loss[loss=0.01367, audio_tagging_loss=0.01367, over 24750.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4948977.00 frames. ], batch size: 99, lr: 3.72e-03, grad_scale: 64.0 2023-12-23 02:24:46,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=904320.0, ans=0.2 2023-12-23 02:24:57,916 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.956e+01 3.244e+01 3.385e+01 3.592e+01 6.696e+01, threshold=6.770e+01, percent-clipped=0.0 2023-12-23 02:25:02,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=904386.6666666666, ans=0.0 2023-12-23 02:25:02,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=904386.6666666666, ans=0.125 2023-12-23 02:25:04,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=904386.6666666666, ans=0.125 2023-12-23 02:25:37,566 INFO [train.py:886] (3/4) Epoch 29, batch 2250, loss[loss=0.01486, audio_tagging_loss=0.01486, over 24941.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4949498.35 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0 2023-12-23 02:25:49,831 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.41 vs. 
limit=15.0 2023-12-23 02:25:50,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=904720.0, ans=0.125 2023-12-23 02:25:57,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=904720.0, ans=0.125 2023-12-23 02:26:15,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=904853.3333333334, ans=0.0 2023-12-23 02:26:24,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=904920.0, ans=0.1 2023-12-23 02:26:29,637 INFO [train.py:886] (3/4) Epoch 29, batch 2300, loss[loss=0.01438, audio_tagging_loss=0.01438, over 22696.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4948502.37 frames. ], batch size: 107, lr: 3.72e-03, grad_scale: 64.0 2023-12-23 02:26:41,538 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.860e+01 3.170e+01 3.367e+01 3.537e+01 3.962e+01, threshold=6.735e+01, percent-clipped=0.0 2023-12-23 02:26:55,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=905120.0, ans=0.125 2023-12-23 02:26:56,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=905120.0, ans=0.0 2023-12-23 02:26:57,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=905120.0, ans=0.0 2023-12-23 02:27:08,061 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0 2023-12-23 02:27:21,229 INFO [train.py:886] (3/4) Epoch 29, batch 2350, loss[loss=0.01397, audio_tagging_loss=0.01397, over 24750.00 frames. ], tot_loss[loss=0.01261, audio_tagging_loss=0.01261, over 4948478.24 frames. ], batch size: 99, lr: 3.72e-03, grad_scale: 64.0 2023-12-23 02:27:32,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=905386.6666666666, ans=0.125 2023-12-23 02:27:45,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=905453.3333333334, ans=0.125 2023-12-23 02:27:56,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=905520.0, ans=0.125 2023-12-23 02:28:08,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=905586.6666666666, ans=0.125 2023-12-23 02:28:12,493 INFO [train.py:886] (3/4) Epoch 29, batch 2400, loss[loss=0.01307, audio_tagging_loss=0.01307, over 25000.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4952439.24 frames. 
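
The dense "ScheduledFloat: name=..., batch_count=..., ans=..." lines report hyperparameters (skip rates, dropout probabilities, balancer probabilities) whose values are scheduled as a function of the global batch count; this deep into training (batch_count around 9.06e5) most skip rates have settled at their final value of 0.0. The sketch below shows a piecewise-linear schedule of the kind these lines suggest; the breakpoints are illustrative assumptions, since the actual schedules live in the model code and cannot be recovered from the log.

    # Piecewise-linear schedule over batch count, a sketch of the idea behind the
    # logged ScheduledFloat values. The breakpoints here are assumptions.
    def scheduled_float(batch_count, points):
        # points: sorted (batch_count, value) pairs, e.g. ((0.0, 0.2), (4000.0, 0.0))
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return points[-1][1]

    # A skip rate that decays to 0.0 early in training stays there, which is why
    # the log shows ans=0.0 for most *_skip_rate entries at batch_count ~ 9e5:
    assert scheduled_float(905653.33, ((0.0, 0.2), (4000.0, 0.0))) == 0.0
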
], batch size: 100, lr: 3.72e-03, grad_scale: 64.0 2023-12-23 02:28:13,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=905653.3333333334, ans=0.0 2023-12-23 02:28:18,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=905653.3333333334, ans=0.125 2023-12-23 02:28:19,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=905653.3333333334, ans=0.1 2023-12-23 02:28:25,399 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.824e+01 3.172e+01 3.352e+01 3.505e+01 4.147e+01, threshold=6.703e+01, percent-clipped=0.0 2023-12-23 02:28:43,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.26 vs. limit=15.0 2023-12-23 02:28:46,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=905853.3333333334, ans=0.0 2023-12-23 02:28:58,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=905920.0, ans=0.0 2023-12-23 02:29:03,269 INFO [train.py:886] (3/4) Epoch 29, batch 2450, loss[loss=0.01312, audio_tagging_loss=0.01312, over 25000.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4947967.56 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0 2023-12-23 02:29:16,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=906053.3333333334, ans=0.1 2023-12-23 02:29:21,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=906053.3333333334, ans=0.0 2023-12-23 02:29:25,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=906120.0, ans=0.1 2023-12-23 02:29:31,974 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.11 vs. limit=15.0 2023-12-23 02:29:54,560 INFO [train.py:886] (3/4) Epoch 29, batch 2500, loss[loss=0.01367, audio_tagging_loss=0.01367, over 24750.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4943745.31 frames. ], batch size: 99, lr: 3.72e-03, grad_scale: 64.0 2023-12-23 02:30:00,303 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.77 vs. limit=15.0 2023-12-23 02:30:07,119 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.911e+01 3.261e+01 3.366e+01 3.535e+01 4.148e+01, threshold=6.733e+01, percent-clipped=0.0 2023-12-23 02:30:18,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=906453.3333333334, ans=0.125 2023-12-23 02:30:19,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=906453.3333333334, ans=0.2 2023-12-23 02:30:31,516 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.11 vs. 
limit=12.0 2023-12-23 02:30:32,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=906520.0, ans=0.125 2023-12-23 02:30:46,633 INFO [train.py:886] (3/4) Epoch 29, batch 2550, loss[loss=0.01185, audio_tagging_loss=0.01185, over 25000.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4946605.94 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0 2023-12-23 02:30:56,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=906653.3333333334, ans=0.09899494936611666 2023-12-23 02:31:04,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=906720.0, ans=0.0 2023-12-23 02:31:41,240 INFO [train.py:886] (3/4) Epoch 29, batch 2600, loss[loss=0.01164, audio_tagging_loss=0.01164, over 24750.00 frames. ], tot_loss[loss=0.01283, audio_tagging_loss=0.01283, over 4946273.79 frames. ], batch size: 99, lr: 3.72e-03, grad_scale: 64.0 2023-12-23 02:31:53,149 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.908e+01 3.274e+01 3.410e+01 3.573e+01 4.224e+01, threshold=6.821e+01, percent-clipped=0.0 2023-12-23 02:32:06,776 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.72 vs. limit=6.0 2023-12-23 02:32:32,187 INFO [train.py:886] (3/4) Epoch 29, batch 2650, loss[loss=0.01284, audio_tagging_loss=0.01284, over 25000.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4949418.27 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0 2023-12-23 02:33:22,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=907586.6666666666, ans=0.125 2023-12-23 02:33:25,038 INFO [train.py:886] (3/4) Epoch 29, batch 2700, loss[loss=0.01269, audio_tagging_loss=0.01269, over 25000.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4949205.34 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 64.0 2023-12-23 02:33:30,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=907653.3333333334, ans=0.2 2023-12-23 02:33:32,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=907653.3333333334, ans=0.5 2023-12-23 02:33:36,308 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.874e+01 3.174e+01 3.334e+01 3.479e+01 4.050e+01, threshold=6.667e+01, percent-clipped=0.0 2023-12-23 02:34:06,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=907920.0, ans=0.125 2023-12-23 02:34:11,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=907920.0, ans=0.2 2023-12-23 02:34:13,303 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 02:34:14,945 INFO [train.py:886] (3/4) Epoch 29, batch 2750, loss[loss=0.0155, audio_tagging_loss=0.0155, over 25000.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4951120.97 frames. 
], batch size: 100, lr: 3.72e-03, grad_scale: 64.0 2023-12-23 02:34:26,598 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=15.0 2023-12-23 02:34:27,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=908053.3333333334, ans=0.125 2023-12-23 02:34:40,521 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.71 vs. limit=10.0 2023-12-23 02:34:44,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=908186.6666666666, ans=0.95 2023-12-23 02:34:56,256 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.18 vs. limit=15.0 2023-12-23 02:35:06,882 INFO [train.py:886] (3/4) Epoch 29, batch 2800, loss[loss=0.01455, audio_tagging_loss=0.01455, over 24750.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4956770.44 frames. ], batch size: 99, lr: 3.72e-03, grad_scale: 64.0 2023-12-23 02:35:09,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=908320.0, ans=0.025 2023-12-23 02:35:18,185 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.855e+01 3.185e+01 3.347e+01 3.494e+01 4.048e+01, threshold=6.695e+01, percent-clipped=0.0 2023-12-23 02:35:20,572 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.72 vs. limit=15.0 2023-12-23 02:35:23,911 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.36 vs. limit=10.0 2023-12-23 02:35:29,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=908453.3333333334, ans=0.125 2023-12-23 02:35:40,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=908520.0, ans=0.2 2023-12-23 02:35:58,182 INFO [train.py:886] (3/4) Epoch 29, batch 2850, loss[loss=0.01155, audio_tagging_loss=0.01155, over 24080.00 frames. ], tot_loss[loss=0.01287, audio_tagging_loss=0.01287, over 4953044.29 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:35:59,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=908653.3333333334, ans=0.2 2023-12-23 02:36:04,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=908653.3333333334, ans=0.125 2023-12-23 02:36:07,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.66 vs. 
limit=22.5 2023-12-23 02:36:11,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=908720.0, ans=0.2 2023-12-23 02:36:26,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=908786.6666666666, ans=0.2 2023-12-23 02:36:47,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=908920.0, ans=0.125 2023-12-23 02:36:48,906 INFO [train.py:886] (3/4) Epoch 29, batch 2900, loss[loss=0.01039, audio_tagging_loss=0.01039, over 25000.00 frames. ], tot_loss[loss=0.01281, audio_tagging_loss=0.01281, over 4951077.66 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:37:02,340 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.712e+01 3.255e+01 3.355e+01 3.547e+01 3.874e+01, threshold=6.710e+01, percent-clipped=0.0 2023-12-23 02:37:03,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=909053.3333333334, ans=0.0 2023-12-23 02:37:06,173 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 02:37:08,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.00 vs. limit=15.0 2023-12-23 02:37:38,725 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0 2023-12-23 02:37:39,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=909253.3333333334, ans=0.0 2023-12-23 02:37:41,186 INFO [train.py:886] (3/4) Epoch 29, batch 2950, loss[loss=0.01158, audio_tagging_loss=0.01158, over 24750.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4956582.69 frames. ], batch size: 99, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:37:50,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=909386.6666666666, ans=0.2 2023-12-23 02:38:08,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=909453.3333333334, ans=0.1 2023-12-23 02:38:11,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=909520.0, ans=0.0 2023-12-23 02:38:14,512 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.58 vs. limit=12.0 2023-12-23 02:38:17,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=909520.0, ans=0.0 2023-12-23 02:38:32,179 INFO [train.py:886] (3/4) Epoch 29, batch 3000, loss[loss=0.01264, audio_tagging_loss=0.01264, over 25000.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4956380.23 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:38:32,180 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 02:38:52,603 INFO [train.py:917] (3/4) Epoch 29, validation: loss=0.03351, audio_tagging_loss=0.03351, over 3737520.00 frames. 
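
A note on how these losses are aggregated: each batch entry reports a loss over its own frame count, while tot_loss and the validation figure just above (loss=0.03351 over 3737520.00 frames) are consistent with frame-weighted averaging, i.e. sum(loss_i * frames_i) / sum(frames_i), so larger batches contribute proportionally more. A minimal sketch, assuming that weighting:

    # Frame-weighted loss average, the aggregation the "over N frames" fields suggest.
    def frame_weighted_average(batches):
        # batches: iterable of (loss, num_frames) pairs
        tot_loss = sum(loss * frames for loss, frames in batches)
        tot_frames = sum(frames for _, frames in batches)
        return tot_loss / tot_frames, tot_frames

    # e.g. averaging two batch losses from the entries above:
    avg, frames = frame_weighted_average([(0.01264, 25000.0), (0.01248, 25000.0)])
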
2023-12-23 02:38:52,604 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 02:39:02,743 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.038e-02 2023-12-23 02:39:06,001 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.927e+01 3.194e+01 3.342e+01 3.489e+01 4.333e+01, threshold=6.683e+01, percent-clipped=0.0 2023-12-23 02:39:21,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0 2023-12-23 02:39:26,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.44 vs. limit=22.5 2023-12-23 02:39:43,235 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.97 vs. limit=12.0 2023-12-23 02:39:45,468 INFO [train.py:886] (3/4) Epoch 29, batch 3050, loss[loss=0.0126, audio_tagging_loss=0.0126, over 25000.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4961026.99 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:39:52,228 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 02:39:53,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=909986.6666666666, ans=0.125 2023-12-23 02:39:58,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=910053.3333333334, ans=0.125 2023-12-23 02:40:01,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=910053.3333333334, ans=0.125 2023-12-23 02:40:21,907 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.43 vs. limit=10.0 2023-12-23 02:40:23,870 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=12.0 2023-12-23 02:40:28,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=910253.3333333334, ans=0.0 2023-12-23 02:40:36,996 INFO [train.py:886] (3/4) Epoch 29, batch 3100, loss[loss=0.01248, audio_tagging_loss=0.01248, over 25000.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4959858.82 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:40:48,918 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.900e+01 3.214e+01 3.353e+01 3.491e+01 4.472e+01, threshold=6.707e+01, percent-clipped=0.0 2023-12-23 02:40:51,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=910386.6666666666, ans=10.0 2023-12-23 02:40:58,958 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.46 vs. 
limit=22.5 2023-12-23 02:41:07,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=910520.0, ans=0.125 2023-12-23 02:41:10,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=910520.0, ans=0.5 2023-12-23 02:41:13,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=910520.0, ans=0.125 2023-12-23 02:41:27,509 INFO [train.py:886] (3/4) Epoch 29, batch 3150, loss[loss=0.01119, audio_tagging_loss=0.01119, over 22508.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4951869.67 frames. ], batch size: 107, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:41:37,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=910720.0, ans=0.5 2023-12-23 02:41:51,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.25 vs. limit=15.0 2023-12-23 02:41:51,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=910786.6666666666, ans=0.0 2023-12-23 02:41:58,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=910853.3333333334, ans=0.0 2023-12-23 02:41:58,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=910853.3333333334, ans=0.125 2023-12-23 02:42:19,859 INFO [train.py:886] (3/4) Epoch 29, batch 3200, loss[loss=0.01381, audio_tagging_loss=0.01381, over 24750.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4946563.89 frames. ], batch size: 99, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:42:31,881 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.926e+01 3.238e+01 3.417e+01 3.574e+01 4.125e+01, threshold=6.834e+01, percent-clipped=0.0 2023-12-23 02:42:32,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=911053.3333333334, ans=0.04949747468305833 2023-12-23 02:42:52,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=911186.6666666666, ans=0.125 2023-12-23 02:43:11,986 INFO [train.py:886] (3/4) Epoch 29, batch 3250, loss[loss=0.01314, audio_tagging_loss=0.01314, over 25000.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4940668.89 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:43:23,695 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.95 vs. limit=10.0 2023-12-23 02:43:31,942 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.89 vs. limit=22.5 2023-12-23 02:43:32,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=911453.3333333334, ans=0.125 2023-12-23 02:43:33,715 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.26 vs. 
limit=15.0 2023-12-23 02:43:41,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=911520.0, ans=0.0 2023-12-23 02:43:45,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=911520.0, ans=0.0 2023-12-23 02:43:48,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=911520.0, ans=0.0 2023-12-23 02:43:52,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=911586.6666666666, ans=0.1 2023-12-23 02:44:02,795 INFO [train.py:886] (3/4) Epoch 29, batch 3300, loss[loss=0.01454, audio_tagging_loss=0.01454, over 25000.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4941383.20 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:44:02,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=911653.3333333334, ans=0.125 2023-12-23 02:44:14,885 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.877e+01 3.159e+01 3.341e+01 3.490e+01 4.241e+01, threshold=6.682e+01, percent-clipped=0.0 2023-12-23 02:44:22,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=911786.6666666666, ans=22.5 2023-12-23 02:44:27,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=911786.6666666666, ans=0.125 2023-12-23 02:44:31,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=911786.6666666666, ans=0.1 2023-12-23 02:44:35,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=911853.3333333334, ans=0.0 2023-12-23 02:44:40,038 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.62 vs. limit=12.0 2023-12-23 02:44:40,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=911853.3333333334, ans=0.0 2023-12-23 02:44:44,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=12.0 2023-12-23 02:44:53,556 INFO [train.py:886] (3/4) Epoch 29, batch 3350, loss[loss=0.01395, audio_tagging_loss=0.01395, over 25000.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4945505.66 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:44:53,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=911986.6666666666, ans=0.2 2023-12-23 02:45:04,119 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.11 vs. 
limit=15.0 2023-12-23 02:45:19,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=912120.0, ans=0.09899494936611666 2023-12-23 02:45:41,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=912253.3333333334, ans=0.0 2023-12-23 02:45:45,264 INFO [train.py:886] (3/4) Epoch 29, batch 3400, loss[loss=0.01445, audio_tagging_loss=0.01445, over 25000.00 frames. ], tot_loss[loss=0.01266, audio_tagging_loss=0.01266, over 4948204.47 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:45:48,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=912320.0, ans=0.125 2023-12-23 02:45:57,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=912386.6666666666, ans=0.125 2023-12-23 02:45:57,954 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.777e+01 3.196e+01 3.334e+01 3.498e+01 4.044e+01, threshold=6.669e+01, percent-clipped=0.0 2023-12-23 02:46:27,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0 2023-12-23 02:46:27,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=912586.6666666666, ans=0.0 2023-12-23 02:46:36,485 INFO [train.py:886] (3/4) Epoch 29, batch 3450, loss[loss=0.01239, audio_tagging_loss=0.01239, over 25000.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4941888.68 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:46:54,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=912720.0, ans=0.0 2023-12-23 02:46:55,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=912720.0, ans=0.0 2023-12-23 02:46:58,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=912786.6666666666, ans=0.2 2023-12-23 02:47:02,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=912786.6666666666, ans=0.1 2023-12-23 02:47:16,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=912920.0, ans=0.0 2023-12-23 02:47:18,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=912920.0, ans=0.0 2023-12-23 02:47:28,180 INFO [train.py:886] (3/4) Epoch 29, batch 3500, loss[loss=0.01202, audio_tagging_loss=0.01202, over 24750.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4940218.28 frames. 
], batch size: 99, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:47:32,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=912986.6666666666, ans=0.0 2023-12-23 02:47:39,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=913053.3333333334, ans=0.125 2023-12-23 02:47:40,257 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.822e+01 3.263e+01 3.374e+01 3.511e+01 3.785e+01, threshold=6.748e+01, percent-clipped=0.0 2023-12-23 02:47:58,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.67 vs. limit=15.0 2023-12-23 02:47:58,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=913186.6666666666, ans=0.125 2023-12-23 02:48:06,427 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0 2023-12-23 02:48:18,278 INFO [train.py:886] (3/4) Epoch 29, batch 3550, loss[loss=0.01186, audio_tagging_loss=0.01186, over 25000.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4938865.70 frames. ], batch size: 100, lr: 3.71e-03, grad_scale: 64.0 2023-12-23 02:48:35,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=913386.6666666666, ans=0.0 2023-12-23 02:48:42,677 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=12.0 2023-12-23 02:48:48,269 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.76 vs. limit=22.5 2023-12-23 02:48:48,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=913520.0, ans=0.0 2023-12-23 02:48:51,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=913520.0, ans=0.125 2023-12-23 02:49:08,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=913586.6666666666, ans=0.125 2023-12-23 02:49:11,424 INFO [train.py:886] (3/4) Epoch 29, batch 3600, loss[loss=0.01115, audio_tagging_loss=0.01115, over 25000.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4939415.67 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:49:18,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=913653.3333333334, ans=0.1 2023-12-23 02:49:22,664 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.831e+01 3.175e+01 3.361e+01 3.491e+01 3.935e+01, threshold=6.721e+01, percent-clipped=0.0 2023-12-23 02:49:40,172 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 02:50:02,205 INFO [train.py:886] (3/4) Epoch 29, batch 3650, loss[loss=0.01121, audio_tagging_loss=0.01121, over 25000.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4942752.75 frames. 
], batch size: 100, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:50:02,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=913986.6666666666, ans=0.125 2023-12-23 02:50:13,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=914053.3333333334, ans=0.5 2023-12-23 02:50:23,515 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.23 vs. limit=15.0 2023-12-23 02:50:54,398 INFO [train.py:886] (3/4) Epoch 29, batch 3700, loss[loss=0.009949, audio_tagging_loss=0.009949, over 25000.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4945239.16 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:50:56,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=914320.0, ans=0.125 2023-12-23 02:50:57,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=914320.0, ans=0.125 2023-12-23 02:50:59,640 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0 2023-12-23 02:51:05,788 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.954e+01 3.187e+01 3.336e+01 3.523e+01 3.935e+01, threshold=6.671e+01, percent-clipped=0.0 2023-12-23 02:51:05,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=914386.6666666666, ans=0.125 2023-12-23 02:51:25,446 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0 2023-12-23 02:51:29,111 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.95 vs. limit=22.5 2023-12-23 02:51:31,753 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.05 vs. limit=22.5 2023-12-23 02:51:39,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=914586.6666666666, ans=0.0 2023-12-23 02:51:44,567 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=15.0 2023-12-23 02:51:45,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=914653.3333333334, ans=0.2 2023-12-23 02:51:46,766 INFO [train.py:886] (3/4) Epoch 29, batch 3750, loss[loss=0.01058, audio_tagging_loss=0.01058, over 24750.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4947362.85 frames. 
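
The "Whitening: name=..., metric=X vs. limit=Y" lines track, per module, how far a group of activations is from having a white (identity-like) covariance; when the metric exceeds the limit, the Whiten module in scaling.py nudges the activations back toward whiteness through the backward pass. The proxy below (eigenvalue dispersion E[lambda^2] / E[lambda]^2, which equals 1.0 for perfectly whitened features and grows as the covariance becomes lopsided) captures the idea; the exact formula used in scaling.py may differ.

    # Hedged proxy for the logged whitening metric: per-group covariance eigenvalue
    # dispersion. Assumes num_channels is divisible by num_groups.
    import torch

    def whitening_metric(x, num_groups):
        # x: (num_frames, num_channels) activations
        frames, channels = x.shape
        x = x.reshape(frames, num_groups, channels // num_groups)
        x = x - x.mean(dim=0, keepdim=True)
        cov = torch.einsum("fgi,fgj->gij", x, x) / frames  # per-group covariance
        eigs = torch.linalg.eigvalsh(cov)  # (num_groups, group_size), ascending
        disp = (eigs ** 2).mean(dim=1) / eigs.mean(dim=1).clamp(min=1e-20) ** 2
        return disp.mean()  # 1.0 when every group is perfectly whitened
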
], batch size: 99, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:51:50,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=914653.3333333334, ans=0.2 2023-12-23 02:51:56,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=914720.0, ans=0.0 2023-12-23 02:52:04,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=914720.0, ans=0.0 2023-12-23 02:52:37,224 INFO [train.py:886] (3/4) Epoch 29, batch 3800, loss[loss=0.01492, audio_tagging_loss=0.01492, over 24750.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4945710.43 frames. ], batch size: 99, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:52:39,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=914986.6666666666, ans=0.0 2023-12-23 02:52:50,530 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.921e+01 3.295e+01 3.394e+01 3.553e+01 4.650e+01, threshold=6.788e+01, percent-clipped=0.0 2023-12-23 02:52:55,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=915053.3333333334, ans=0.125 2023-12-23 02:53:02,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=915120.0, ans=0.0 2023-12-23 02:53:10,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=915186.6666666666, ans=0.95 2023-12-23 02:53:13,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=915186.6666666666, ans=0.125 2023-12-23 02:53:29,909 INFO [train.py:886] (3/4) Epoch 29, batch 3850, loss[loss=0.01171, audio_tagging_loss=0.01171, over 25000.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4942606.14 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:53:33,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.60 vs. 
limit=22.5 2023-12-23 02:53:41,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=915386.6666666666, ans=0.1 2023-12-23 02:53:42,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=915386.6666666666, ans=0.0 2023-12-23 02:53:46,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=915386.6666666666, ans=0.125 2023-12-23 02:53:57,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=915453.3333333334, ans=0.1 2023-12-23 02:54:10,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=915586.6666666666, ans=0.125 2023-12-23 02:54:10,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=915586.6666666666, ans=0.125 2023-12-23 02:54:13,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=915586.6666666666, ans=0.125 2023-12-23 02:54:21,535 INFO [train.py:886] (3/4) Epoch 29, batch 3900, loss[loss=0.01011, audio_tagging_loss=0.01011, over 24023.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4946054.26 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:54:34,394 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.806e+01 3.127e+01 3.290e+01 3.423e+01 3.887e+01, threshold=6.579e+01, percent-clipped=0.0 2023-12-23 02:54:55,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2023-12-23 02:55:00,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=915853.3333333334, ans=0.1 2023-12-23 02:55:13,358 INFO [train.py:886] (3/4) Epoch 29, batch 3950, loss[loss=0.0115, audio_tagging_loss=0.0115, over 23993.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4950966.56 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:55:18,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=915986.6666666666, ans=0.1 2023-12-23 02:55:20,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=915986.6666666666, ans=0.0 2023-12-23 02:55:33,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=916120.0, ans=0.0 2023-12-23 02:55:45,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=916186.6666666666, ans=0.0 2023-12-23 02:56:05,084 INFO [train.py:886] (3/4) Epoch 29, batch 4000, loss[loss=0.0137, audio_tagging_loss=0.0137, over 25000.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4957932.58 frames. 
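
Just below, grad_scale doubles from 64.0 to 128.0 at batch 4000 and is back at 64.0 by batch 4050. This doubling-and-halving pattern is characteristic of dynamic loss scaling in fp16 training: the scale grows after a run of overflow-free steps and is halved when an inf/nan gradient is hit. A sketch using PyTorch's GradScaler; the init_scale and growth_interval values here are illustrative, not taken from this run.

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=64.0,       # illustrative starting grad_scale
        growth_factor=2.0,     # 64.0 -> 128.0 on growth
        backoff_factor=0.5,    # 128.0 -> 64.0 after an overflow
        growth_interval=2000,  # overflow-free steps before growing
    )

    def training_step(model, optimizer, criterion, features, targets):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(features), targets)
        scaler.scale(loss).backward()  # backprop at the current scale
        scaler.step(optimizer)         # unscales grads; skips step on inf/nan
        scaler.update()                # grow or back off the scale
        return loss.detach()
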
], batch size: 100, lr: 3.70e-03, grad_scale: 128.0 2023-12-23 02:56:06,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=916320.0, ans=0.1 2023-12-23 02:56:06,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=916320.0, ans=0.125 2023-12-23 02:56:07,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=916320.0, ans=0.1 2023-12-23 02:56:17,055 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.016e+01 3.298e+01 3.392e+01 3.535e+01 4.607e+01, threshold=6.784e+01, percent-clipped=0.0 2023-12-23 02:56:17,626 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.20 vs. limit=22.5 2023-12-23 02:56:27,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=916453.3333333334, ans=0.2 2023-12-23 02:56:30,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=916453.3333333334, ans=0.0 2023-12-23 02:56:40,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=916520.0, ans=0.125 2023-12-23 02:56:48,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=916586.6666666666, ans=0.2 2023-12-23 02:56:54,936 INFO [train.py:886] (3/4) Epoch 29, batch 4050, loss[loss=0.0119, audio_tagging_loss=0.0119, over 24750.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4960398.31 frames. ], batch size: 99, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:57:16,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=916786.6666666666, ans=0.035 2023-12-23 02:57:23,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=916786.6666666666, ans=0.0 2023-12-23 02:57:47,901 INFO [train.py:886] (3/4) Epoch 29, batch 4100, loss[loss=0.01582, audio_tagging_loss=0.01582, over 21901.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4947134.07 frames. 
], batch size: 107, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:57:50,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=916986.6666666666, ans=0.125 2023-12-23 02:57:54,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=916986.6666666666, ans=0.125 2023-12-23 02:57:55,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=916986.6666666666, ans=0.125 2023-12-23 02:57:57,698 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 02:57:59,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=917053.3333333334, ans=0.015 2023-12-23 02:58:00,224 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.035e+01 3.272e+01 3.403e+01 3.607e+01 4.047e+01, threshold=6.806e+01, percent-clipped=0.0 2023-12-23 02:58:07,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=917120.0, ans=0.05 2023-12-23 02:58:19,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=917186.6666666666, ans=0.2 2023-12-23 02:58:22,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=917186.6666666666, ans=0.04949747468305833 2023-12-23 02:58:29,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=917253.3333333334, ans=0.0 2023-12-23 02:58:38,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=917320.0, ans=0.0 2023-12-23 02:58:39,289 INFO [train.py:886] (3/4) Epoch 29, batch 4150, loss[loss=0.01205, audio_tagging_loss=0.01205, over 24750.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4944482.35 frames. ], batch size: 99, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:59:08,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=917453.3333333334, ans=0.07 2023-12-23 02:59:10,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.49 vs. limit=22.5 2023-12-23 02:59:30,202 INFO [train.py:886] (3/4) Epoch 29, batch 4200, loss[loss=0.01454, audio_tagging_loss=0.01454, over 25000.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4948986.07 frames. 
], batch size: 100, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 02:59:38,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=917653.3333333334, ans=0.125 2023-12-23 02:59:42,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=917720.0, ans=0.2 2023-12-23 02:59:43,288 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.970e+01 3.179e+01 3.334e+01 3.481e+01 3.882e+01, threshold=6.668e+01, percent-clipped=0.0 2023-12-23 02:59:45,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=917720.0, ans=0.0 2023-12-23 02:59:50,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=917786.6666666666, ans=0.125 2023-12-23 03:00:06,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=917853.3333333334, ans=0.0 2023-12-23 03:00:12,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=917920.0, ans=0.125 2023-12-23 03:00:21,201 INFO [train.py:886] (3/4) Epoch 29, batch 4250, loss[loss=0.0104, audio_tagging_loss=0.0104, over 25000.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4949170.35 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 03:00:24,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=917986.6666666666, ans=0.125 2023-12-23 03:00:43,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=918120.0, ans=0.125 2023-12-23 03:00:56,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=918186.6666666666, ans=0.1 2023-12-23 03:00:58,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=918186.6666666666, ans=0.1 2023-12-23 03:01:10,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=918253.3333333334, ans=0.125 2023-12-23 03:01:11,959 INFO [train.py:886] (3/4) Epoch 29, batch 4300, loss[loss=0.01372, audio_tagging_loss=0.01372, over 25000.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4952168.37 frames. ], batch size: 100, lr: 3.70e-03, grad_scale: 64.0 2023-12-23 03:01:23,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=918386.6666666666, ans=0.2 2023-12-23 03:01:25,827 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.896e+01 3.260e+01 3.361e+01 3.479e+01 4.825e+01, threshold=6.722e+01, percent-clipped=0.0 2023-12-23 03:01:44,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=918520.0, ans=0.0 2023-12-23 03:01:51,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=918520.0, ans=0.125 2023-12-23 03:01:51,401 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.35 vs. 
limit=22.5 2023-12-23 03:01:56,092 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.02 vs. limit=15.0 2023-12-23 03:02:02,821 INFO [train.py:886] (3/4) Epoch 29, batch 4350, loss[loss=0.01315, audio_tagging_loss=0.01315, over 25000.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4951065.37 frames. ], batch size: 100, lr: 3.69e-03, grad_scale: 64.0 2023-12-23 03:02:06,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=918653.3333333334, ans=0.0 2023-12-23 03:02:12,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.33 vs. limit=22.5 2023-12-23 03:02:19,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=918720.0, ans=0.125 2023-12-23 03:02:28,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=918786.6666666666, ans=0.125 2023-12-23 03:02:48,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=918920.0, ans=0.1 2023-12-23 03:02:51,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=918920.0, ans=0.125 2023-12-23 03:02:54,084 INFO [train.py:886] (3/4) Epoch 29, batch 4400, loss[loss=0.01216, audio_tagging_loss=0.01216, over 24750.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4949655.77 frames. ], batch size: 99, lr: 3.69e-03, grad_scale: 64.0 2023-12-23 03:03:05,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.97 vs. limit=22.5 2023-12-23 03:03:06,951 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.926e+01 3.260e+01 3.393e+01 3.638e+01 4.099e+01, threshold=6.787e+01, percent-clipped=0.0 2023-12-23 03:03:12,125 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.88 vs. limit=22.5 2023-12-23 03:03:38,940 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.29 vs. limit=15.0 2023-12-23 03:03:45,708 INFO [train.py:886] (3/4) Epoch 29, batch 4450, loss[loss=0.01245, audio_tagging_loss=0.01245, over 24750.00 frames. ], tot_loss[loss=0.01271, audio_tagging_loss=0.01271, over 4942095.30 frames. ], batch size: 99, lr: 3.69e-03, grad_scale: 64.0 2023-12-23 03:03:51,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=919320.0, ans=0.0 2023-12-23 03:03:53,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=919320.0, ans=0.025 2023-12-23 03:04:08,124 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.42 vs. 
limit=15.0 2023-12-23 03:04:15,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=919453.3333333334, ans=0.0 2023-12-23 03:04:38,027 INFO [train.py:886] (3/4) Epoch 29, batch 4500, loss[loss=0.01105, audio_tagging_loss=0.01105, over 24000.00 frames. ], tot_loss[loss=0.01261, audio_tagging_loss=0.01261, over 4940901.41 frames. ], batch size: 100, lr: 3.69e-03, grad_scale: 64.0 2023-12-23 03:04:39,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=919653.3333333334, ans=0.0 2023-12-23 03:04:51,064 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.905e+01 3.193e+01 3.342e+01 3.524e+01 4.106e+01, threshold=6.683e+01, percent-clipped=0.0 2023-12-23 03:05:12,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=919853.3333333334, ans=0.125 2023-12-23 03:05:20,266 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.43 vs. limit=10.0 2023-12-23 03:05:29,608 INFO [train.py:886] (3/4) Epoch 29, batch 4550, loss[loss=0.01273, audio_tagging_loss=0.01273, over 24750.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4944899.14 frames. ], batch size: 99, lr: 3.69e-03, grad_scale: 64.0 2023-12-23 03:05:37,695 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.45 vs. limit=22.5 2023-12-23 03:06:08,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.79 vs. limit=10.0 2023-12-23 03:06:09,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=920186.6666666666, ans=0.0 2023-12-23 03:06:21,700 INFO [train.py:886] (3/4) Epoch 29, batch 4600, loss[loss=0.009891, audio_tagging_loss=0.009891, over 24029.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4950784.87 frames. ], batch size: 100, lr: 3.69e-03, grad_scale: 64.0 2023-12-23 03:06:35,366 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.907e+01 3.235e+01 3.344e+01 3.450e+01 3.985e+01, threshold=6.687e+01, percent-clipped=0.0 2023-12-23 03:06:43,802 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.76 vs. limit=22.5 2023-12-23 03:07:11,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=920586.6666666666, ans=0.1 2023-12-23 03:07:13,863 INFO [train.py:886] (3/4) Epoch 29, batch 4650, loss[loss=0.01392, audio_tagging_loss=0.01392, over 25000.00 frames. ], tot_loss[loss=0.01266, audio_tagging_loss=0.01266, over 4953003.07 frames. 
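
Each Whitening line compares a measured statistic against a limit (above, encoder.encoders.3.encoder.layers.3.self_attn2.whiten reaches metric=22.76 vs. limit=22.5); crossing the limit is the point at which a whitening penalty would push the activations' covariance back toward a multiple of the identity. One way to define such a metric is sketched below: a ratio that equals 1.0 for perfectly white features and grows with channel correlation. This is an assumed formulation, not necessarily the exact scaling.py computation.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
        """x: (num_frames, num_channels). Returns a whiteness ratio that is
        1.0 when each group's covariance is a multiple of the identity and
        grows as channels become correlated or unequal in scale."""
        num_frames, num_channels = x.shape
        cpg = num_channels // num_groups                # channels per group
        xg = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
        cov = xg.transpose(1, 2) @ xg / num_frames      # (groups, cpg, cpg)
        mean_diag = cov.diagonal(dim1=1, dim2=2).mean()
        return ((cov ** 2).mean() * cpg / mean_diag ** 2).item()

    white = torch.randn(10000, 256)
    mixed = white @ torch.randn(256, 256)               # correlate channels
    print(whitening_metric(white, num_groups=1))        # close to 1.0
    print(whitening_metric(mixed, num_groups=1))        # far above 1.0
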
], batch size: 100, lr: 3.69e-03, grad_scale: 64.0 2023-12-23 03:07:20,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=920653.3333333334, ans=0.025 2023-12-23 03:07:23,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=920720.0, ans=0.125 2023-12-23 03:07:25,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=920720.0, ans=0.5 2023-12-23 03:07:26,203 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.56 vs. limit=12.0 2023-12-23 03:07:43,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=920786.6666666666, ans=0.125 2023-12-23 03:07:46,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=920853.3333333334, ans=0.125 2023-12-23 03:07:52,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=920853.3333333334, ans=0.125 2023-12-23 03:07:53,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=920853.3333333334, ans=0.1 2023-12-23 03:08:04,169 INFO [train.py:886] (3/4) Epoch 29, batch 4700, loss[loss=0.01483, audio_tagging_loss=0.01483, over 24750.00 frames. ], tot_loss[loss=0.01285, audio_tagging_loss=0.01285, over 4952347.49 frames. ], batch size: 99, lr: 3.69e-03, grad_scale: 64.0 2023-12-23 03:08:07,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=920986.6666666666, ans=0.0 2023-12-23 03:08:12,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=921053.3333333334, ans=0.0 2023-12-23 03:08:15,989 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.987e+01 3.319e+01 3.435e+01 3.584e+01 4.089e+01, threshold=6.869e+01, percent-clipped=0.0 2023-12-23 03:08:25,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=921120.0, ans=0.125 2023-12-23 03:08:43,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=921253.3333333334, ans=0.0 2023-12-23 03:08:46,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=921253.3333333334, ans=0.1 2023-12-23 03:08:51,954 INFO [train.py:886] (3/4) Epoch 29, batch 4750, loss[loss=0.01243, audio_tagging_loss=0.01243, over 24750.00 frames. ], tot_loss[loss=0.01294, audio_tagging_loss=0.01294, over 4948236.71 frames. ], batch size: 99, lr: 3.69e-03, grad_scale: 64.0 2023-12-23 03:09:01,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=921386.6666666666, ans=0.125 2023-12-23 03:09:26,349 INFO [train.py:886] (3/4) Epoch 30, batch 0, loss[loss=0.02799, audio_tagging_loss=0.02799, over 25000.00 frames. ], tot_loss[loss=0.02799, audio_tagging_loss=0.02799, over 25000.00 frames. 
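
At the epoch boundary just below, tot_loss restarts at the single-batch value (loss and tot_loss are both 0.02799 over 25000 frames at Epoch 30, batch 0) and relaxes back toward roughly 0.013 over the first several hundred batches, while mid-epoch it is reported over a fractional frame count near 4.9e6. Both behaviours are consistent with an exponentially decayed, frame-weighted running average that is reset when a new epoch starts. A sketch, assuming a forgetting horizon of 200 batches of about 25000 frames; the constants are assumptions, not read from the code.

    class DecayedLoss:
        # Assumed mechanism: decay the accumulated (loss, frames) pair each
        # batch, so the frame count saturates near horizon * frames_per_batch.
        def __init__(self, horizon: int = 200):
            self.decay = 1.0 - 1.0 / horizon
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> float:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames
            return self.loss_sum / self.frames  # the value printed as tot_loss

    tracker = DecayedLoss()
    for _ in range(1000):
        tracker.update(batch_loss=0.0127, batch_frames=25000.0)
    print(tracker.frames)  # ~4.97e6: fractional, as in the log lines above
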
], batch size: 100, lr: 3.63e-03, grad_scale: 32.0 2023-12-23 03:09:26,349 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 03:09:33,892 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7787, 5.9302, 5.2656, 5.7162], device='cuda:3') 2023-12-23 03:09:47,398 INFO [train.py:917] (3/4) Epoch 30, validation: loss=0.03363, audio_tagging_loss=0.03363, over 3737520.00 frames. 2023-12-23 03:09:47,399 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 03:09:49,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.89 vs. limit=22.5 2023-12-23 03:10:03,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=921493.3333333334, ans=0.2 2023-12-23 03:10:21,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=921626.6666666666, ans=0.125 2023-12-23 03:10:28,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.87 vs. limit=15.0 2023-12-23 03:10:35,649 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.955e+01 3.394e+01 3.725e+01 4.735e+01 9.451e+01, threshold=7.450e+01, percent-clipped=7.0 2023-12-23 03:10:37,596 INFO [train.py:886] (3/4) Epoch 30, batch 50, loss[loss=0.01838, audio_tagging_loss=0.01838, over 25000.00 frames. ], tot_loss[loss=0.01994, audio_tagging_loss=0.01994, over 1116672.34 frames. ], batch size: 100, lr: 3.63e-03, grad_scale: 32.0 2023-12-23 03:10:38,958 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.07 vs. limit=22.5 2023-12-23 03:10:42,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.52 vs. limit=15.0 2023-12-23 03:11:03,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=921893.3333333334, ans=0.0 2023-12-23 03:11:03,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=921893.3333333334, ans=0.1 2023-12-23 03:11:20,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=922026.6666666666, ans=0.125 2023-12-23 03:11:30,719 INFO [train.py:886] (3/4) Epoch 30, batch 100, loss[loss=0.01373, audio_tagging_loss=0.01373, over 25000.00 frames. ], tot_loss[loss=0.01722, audio_tagging_loss=0.01722, over 1966795.60 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:11:34,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=922093.3333333334, ans=0.2 2023-12-23 03:11:41,409 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 03:11:41,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=922160.0, ans=0.0 2023-12-23 03:12:10,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. 
limit=15.0 2023-12-23 03:12:16,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=922360.0, ans=0.2 2023-12-23 03:12:18,427 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.004e+01 3.542e+01 3.730e+01 3.937e+01 4.567e+01, threshold=7.459e+01, percent-clipped=0.0 2023-12-23 03:12:20,991 INFO [train.py:886] (3/4) Epoch 30, batch 150, loss[loss=0.01234, audio_tagging_loss=0.01234, over 25000.00 frames. ], tot_loss[loss=0.0158, audio_tagging_loss=0.0158, over 2629147.33 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:12:26,700 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.43 vs. limit=15.0 2023-12-23 03:12:31,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=922493.3333333334, ans=0.05 2023-12-23 03:12:45,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=922560.0, ans=0.125 2023-12-23 03:12:53,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=922626.6666666666, ans=0.125 2023-12-23 03:13:05,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=922693.3333333334, ans=0.125 2023-12-23 03:13:09,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=922693.3333333334, ans=0.0 2023-12-23 03:13:10,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=922693.3333333334, ans=0.2 2023-12-23 03:13:12,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=922760.0, ans=0.0 2023-12-23 03:13:13,250 INFO [train.py:886] (3/4) Epoch 30, batch 200, loss[loss=0.01268, audio_tagging_loss=0.01268, over 25000.00 frames. ], tot_loss[loss=0.0149, audio_tagging_loss=0.0149, over 3146485.89 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:13:14,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=922760.0, ans=0.07 2023-12-23 03:13:19,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=922760.0, ans=0.2 2023-12-23 03:13:21,258 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.89 vs. limit=15.0 2023-12-23 03:13:28,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=922826.6666666666, ans=0.125 2023-12-23 03:14:02,672 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.057e+01 3.258e+01 3.386e+01 3.497e+01 4.082e+01, threshold=6.772e+01, percent-clipped=0.0 2023-12-23 03:14:05,237 INFO [train.py:886] (3/4) Epoch 30, batch 250, loss[loss=0.01387, audio_tagging_loss=0.01387, over 25000.00 frames. ], tot_loss[loss=0.01427, audio_tagging_loss=0.01427, over 3550060.67 frames. 
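
During the validation pass above, zipformer.py printed attn_weights_entropy = tensor([5.7787, 5.9302, 5.2656, 5.7162]), plausibly one value per attention head of the logged self_attn_weights module. Read as the Shannon entropy of each head's attention distribution averaged over queries, values near log(num_keys) mean diffuse attention and small values mean peaked attention. A sketch under that interpretation; the shapes and the per-head reading are assumptions.

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        """attn: (num_heads, num_queries, num_keys), each row summing to 1.
        Returns the mean attention entropy per head, in nats."""
        ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, queries)
        return ent.mean(dim=-1)                           # (heads,)

    # Entropy around 5.7 nats corresponds to an effective support of roughly
    # e**5.7 ~ 300 keys; a uniform distribution over N keys gives log(N).
    attn = torch.softmax(torch.randn(4, 50, 400), dim=-1)
    print(attn_weights_entropy(attn))
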
], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:14:06,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=923093.3333333334, ans=0.125 2023-12-23 03:14:08,623 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.74 vs. limit=6.0 2023-12-23 03:14:12,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.27 vs. limit=15.0 2023-12-23 03:14:21,718 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0 2023-12-23 03:14:29,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=923226.6666666666, ans=0.1 2023-12-23 03:14:31,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=923226.6666666666, ans=0.0 2023-12-23 03:14:36,366 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.28 vs. limit=15.0 2023-12-23 03:14:44,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=923293.3333333334, ans=0.125 2023-12-23 03:14:52,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=923360.0, ans=0.5 2023-12-23 03:14:56,017 INFO [train.py:886] (3/4) Epoch 30, batch 300, loss[loss=0.01179, audio_tagging_loss=0.01179, over 24033.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 3858225.94 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:15:09,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=923493.3333333334, ans=0.0 2023-12-23 03:15:09,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=923493.3333333334, ans=0.0 2023-12-23 03:15:12,223 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 03:15:13,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=923493.3333333334, ans=0.1 2023-12-23 03:15:22,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=923560.0, ans=0.0 2023-12-23 03:15:46,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=923693.3333333334, ans=0.0 2023-12-23 03:15:46,800 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.964e+01 3.195e+01 3.394e+01 3.540e+01 4.201e+01, threshold=6.789e+01, percent-clipped=0.0 2023-12-23 03:15:48,768 INFO [train.py:886] (3/4) Epoch 30, batch 350, loss[loss=0.0107, audio_tagging_loss=0.0107, over 25000.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4101046.67 frames. 
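
Nearly every scaling.py:213 line in this log reports a ScheduledFloat: a hyperparameter (dropout_p, balancer prob, skip rate, bypass scale_min, ...) whose current value, printed as ans, depends on batch_count. A minimal sketch of such a schedule as piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are invented for illustration and are not the real schedules.

    def scheduled_float(batch_count: float, points) -> float:
        """points: sorted (batch_count, value) pairs; linear in between,
        clamped to the end values outside the range."""
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return points[-1][1]

    # E.g. a balancer prob annealing 0.5 -> 0.125 and then held constant: by
    # batch_count ~9.2e5, as in this log, it reads 0.125.
    schedule = [(0.0, 0.5), (4000.0, 0.25), (8000.0, 0.125)]
    print(scheduled_float(922093.3333333334, schedule))  # 0.125
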
], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:15:48,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=923760.0, ans=0.1 2023-12-23 03:15:58,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.29 vs. limit=10.0 2023-12-23 03:16:04,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=923826.6666666666, ans=0.125 2023-12-23 03:16:10,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=923893.3333333334, ans=0.125 2023-12-23 03:16:20,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=923960.0, ans=0.125 2023-12-23 03:16:25,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=923960.0, ans=0.0 2023-12-23 03:16:32,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=924026.6666666666, ans=0.0 2023-12-23 03:16:39,592 INFO [train.py:886] (3/4) Epoch 30, batch 400, loss[loss=0.01403, audio_tagging_loss=0.01403, over 25000.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4284400.45 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:16:57,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=924160.0, ans=0.0 2023-12-23 03:17:04,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=924226.6666666666, ans=0.0 2023-12-23 03:17:05,753 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.53 vs. limit=15.0 2023-12-23 03:17:16,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=924293.3333333334, ans=0.0 2023-12-23 03:17:23,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=924360.0, ans=0.125 2023-12-23 03:17:30,131 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.820e+01 3.158e+01 3.345e+01 3.496e+01 3.973e+01, threshold=6.691e+01, percent-clipped=0.0 2023-12-23 03:17:32,071 INFO [train.py:886] (3/4) Epoch 30, batch 450, loss[loss=0.01222, audio_tagging_loss=0.01222, over 25000.00 frames. ], tot_loss[loss=0.01313, audio_tagging_loss=0.01313, over 4430165.68 frames. 
], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:17:47,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=924493.3333333334, ans=0.0 2023-12-23 03:17:50,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=924493.3333333334, ans=0.0 2023-12-23 03:17:51,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=924493.3333333334, ans=0.125 2023-12-23 03:18:00,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=924560.0, ans=0.125 2023-12-23 03:18:16,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=924693.3333333334, ans=0.0 2023-12-23 03:18:23,571 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=21.71 vs. limit=22.5 2023-12-23 03:18:24,804 INFO [train.py:886] (3/4) Epoch 30, batch 500, loss[loss=0.01477, audio_tagging_loss=0.01477, over 25000.00 frames. ], tot_loss[loss=0.013, audio_tagging_loss=0.013, over 4545473.38 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:18:33,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=924760.0, ans=0.04949747468305833 2023-12-23 03:18:45,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=924893.3333333334, ans=0.0 2023-12-23 03:19:06,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=925026.6666666666, ans=0.125 2023-12-23 03:19:09,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=925026.6666666666, ans=0.1 2023-12-23 03:19:13,977 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.840e+01 3.274e+01 3.359e+01 3.528e+01 4.075e+01, threshold=6.718e+01, percent-clipped=0.0 2023-12-23 03:19:15,862 INFO [train.py:886] (3/4) Epoch 30, batch 550, loss[loss=0.009628, audio_tagging_loss=0.009628, over 23999.00 frames. ], tot_loss[loss=0.01296, audio_tagging_loss=0.01296, over 4634297.07 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:19:26,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=925160.0, ans=0.125 2023-12-23 03:19:42,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=925226.6666666666, ans=0.2 2023-12-23 03:19:57,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=925360.0, ans=0.125 2023-12-23 03:20:08,025 INFO [train.py:886] (3/4) Epoch 30, batch 600, loss[loss=0.01591, audio_tagging_loss=0.01591, over 24944.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 4705387.17 frames. 
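
The many *_skip_rate entries (attention_skip_rate, conv_skip_rate, ff2_skip_rate, ...) all read ans=0.0 at this depth into training, suggesting a layer-dropout style regularizer whose scheduled probability has annealed to zero: early on, a submodule's contribution is randomly dropped and only the residual/bypass path survives. A hedged sketch of that mechanism, not the actual zipformer code.

    import torch

    def maybe_skip(module, x: torch.Tensor, skip_rate: float,
                   training: bool) -> torch.Tensor:
        """With probability skip_rate during training, bypass the submodule;
        otherwise apply it as a residual branch."""
        if training and torch.rand(()).item() < skip_rate:
            return x            # skipped: only the bypass path survives
        return x + module(x)    # normal residual connection

    # Early in training skip_rate might be ~0.2; once the schedule reaches
    # 0.0 (as logged here) the branch always runs.
    ff = torch.nn.Linear(256, 256)
    y = maybe_skip(ff, torch.randn(8, 256), skip_rate=0.0, training=True)
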
], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:20:08,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=925426.6666666666, ans=0.125 2023-12-23 03:20:17,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=925493.3333333334, ans=0.1 2023-12-23 03:20:23,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=925493.3333333334, ans=0.0 2023-12-23 03:20:56,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=925693.3333333334, ans=0.125 2023-12-23 03:20:57,415 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.002e+01 3.230e+01 3.382e+01 3.569e+01 4.053e+01, threshold=6.765e+01, percent-clipped=0.0 2023-12-23 03:20:59,392 INFO [train.py:886] (3/4) Epoch 30, batch 650, loss[loss=0.01382, audio_tagging_loss=0.01382, over 24750.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4752218.14 frames. ], batch size: 99, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:21:17,734 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.66 vs. limit=22.5 2023-12-23 03:21:18,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=925826.6666666666, ans=0.125 2023-12-23 03:21:20,697 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.11 vs. limit=15.0 2023-12-23 03:21:26,061 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.96 vs. limit=10.0 2023-12-23 03:21:28,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.23 vs. limit=15.0 2023-12-23 03:21:40,665 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 03:21:50,665 INFO [train.py:886] (3/4) Epoch 30, batch 700, loss[loss=0.01267, audio_tagging_loss=0.01267, over 25000.00 frames. ], tot_loss[loss=0.01288, audio_tagging_loss=0.01288, over 4792569.43 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:22:04,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=926160.0, ans=0.125 2023-12-23 03:22:14,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=926226.6666666666, ans=0.0 2023-12-23 03:22:19,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=926226.6666666666, ans=0.125 2023-12-23 03:22:31,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=926360.0, ans=0.125 2023-12-23 03:22:32,528 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.05 vs. 
limit=22.5 2023-12-23 03:22:37,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=926360.0, ans=0.125 2023-12-23 03:22:41,096 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.868e+01 3.229e+01 3.388e+01 3.605e+01 3.882e+01, threshold=6.777e+01, percent-clipped=0.0 2023-12-23 03:22:43,039 INFO [train.py:886] (3/4) Epoch 30, batch 750, loss[loss=0.01269, audio_tagging_loss=0.01269, over 25000.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4826322.06 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:22:50,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=926426.6666666666, ans=0.1 2023-12-23 03:22:51,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=926426.6666666666, ans=0.125 2023-12-23 03:22:56,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=926493.3333333334, ans=0.125 2023-12-23 03:22:59,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=926493.3333333334, ans=0.0 2023-12-23 03:22:59,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=926493.3333333334, ans=0.125 2023-12-23 03:23:00,210 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0 2023-12-23 03:23:04,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=926560.0, ans=0.2 2023-12-23 03:23:30,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.00 vs. limit=12.0 2023-12-23 03:23:34,684 INFO [train.py:886] (3/4) Epoch 30, batch 800, loss[loss=0.0141, audio_tagging_loss=0.0141, over 25000.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4856702.08 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2023-12-23 03:23:39,754 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2023-12-23 03:24:00,062 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=12.0 2023-12-23 03:24:24,602 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.909e+01 3.234e+01 3.358e+01 3.542e+01 4.031e+01, threshold=6.717e+01, percent-clipped=0.0 2023-12-23 03:24:26,551 INFO [train.py:886] (3/4) Epoch 30, batch 850, loss[loss=0.01255, audio_tagging_loss=0.01255, over 25000.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4885496.81 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:24:39,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=927160.0, ans=0.5 2023-12-23 03:24:42,250 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.26 vs. 
limit=10.0 2023-12-23 03:24:52,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=927226.6666666666, ans=0.0 2023-12-23 03:24:54,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=927226.6666666666, ans=0.1 2023-12-23 03:25:18,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=927426.6666666666, ans=0.05 2023-12-23 03:25:19,746 INFO [train.py:886] (3/4) Epoch 30, batch 900, loss[loss=0.01078, audio_tagging_loss=0.01078, over 24750.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4901992.21 frames. ], batch size: 99, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:25:20,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=927426.6666666666, ans=0.04949747468305833 2023-12-23 03:25:21,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=927426.6666666666, ans=0.0 2023-12-23 03:25:23,089 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=15.0 2023-12-23 03:25:34,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=927493.3333333334, ans=0.0 2023-12-23 03:25:37,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=927493.3333333334, ans=0.09899494936611666 2023-12-23 03:25:44,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=927560.0, ans=0.0 2023-12-23 03:25:52,141 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 03:26:07,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=927693.3333333334, ans=0.1 2023-12-23 03:26:07,767 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.944e+01 3.263e+01 3.411e+01 3.563e+01 4.036e+01, threshold=6.823e+01, percent-clipped=0.0 2023-12-23 03:26:08,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=927760.0, ans=0.125 2023-12-23 03:26:10,366 INFO [train.py:886] (3/4) Epoch 30, batch 950, loss[loss=0.01347, audio_tagging_loss=0.01347, over 25000.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4912306.19 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:26:13,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=927760.0, ans=0.125 2023-12-23 03:26:16,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=927760.0, ans=0.125 2023-12-23 03:26:31,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=927893.3333333334, ans=10.0 2023-12-23 03:26:49,466 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. 
limit=15.0 2023-12-23 03:26:54,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=928026.6666666666, ans=0.2 2023-12-23 03:27:02,691 INFO [train.py:886] (3/4) Epoch 30, batch 1000, loss[loss=0.01178, audio_tagging_loss=0.01178, over 24750.00 frames. ], tot_loss[loss=0.01276, audio_tagging_loss=0.01276, over 4916370.30 frames. ], batch size: 99, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:27:06,923 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.86 vs. limit=15.0 2023-12-23 03:27:10,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=928093.3333333334, ans=0.0 2023-12-23 03:27:15,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=928160.0, ans=0.0 2023-12-23 03:27:19,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=928160.0, ans=0.125 2023-12-23 03:27:37,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=928293.3333333334, ans=0.1 2023-12-23 03:27:37,956 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.22 vs. limit=22.5 2023-12-23 03:27:52,138 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.864e+01 3.168e+01 3.324e+01 3.493e+01 4.167e+01, threshold=6.648e+01, percent-clipped=0.0 2023-12-23 03:27:54,028 INFO [train.py:886] (3/4) Epoch 30, batch 1050, loss[loss=0.01117, audio_tagging_loss=0.01117, over 25000.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4927291.06 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:28:10,938 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.54 vs. limit=6.0 2023-12-23 03:28:14,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=928560.0, ans=0.0 2023-12-23 03:28:44,577 INFO [train.py:886] (3/4) Epoch 30, batch 1100, loss[loss=0.01277, audio_tagging_loss=0.01277, over 25000.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4934005.09 frames. 
], batch size: 100, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:28:45,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=928760.0, ans=0.05 2023-12-23 03:29:06,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=928893.3333333334, ans=0.0 2023-12-23 03:29:12,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=928893.3333333334, ans=0.0 2023-12-23 03:29:16,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=928960.0, ans=0.2 2023-12-23 03:29:21,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=928960.0, ans=0.0 2023-12-23 03:29:34,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=929026.6666666666, ans=0.2 2023-12-23 03:29:35,375 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.868e+01 3.223e+01 3.331e+01 3.541e+01 4.042e+01, threshold=6.662e+01, percent-clipped=0.0 2023-12-23 03:29:37,296 INFO [train.py:886] (3/4) Epoch 30, batch 1150, loss[loss=0.01388, audio_tagging_loss=0.01388, over 25000.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4930725.75 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:29:41,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=929093.3333333334, ans=0.0 2023-12-23 03:29:49,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=929160.0, ans=0.125 2023-12-23 03:29:54,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=929160.0, ans=0.025 2023-12-23 03:30:27,922 INFO [train.py:886] (3/4) Epoch 30, batch 1200, loss[loss=0.01499, audio_tagging_loss=0.01499, over 24750.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4940812.59 frames. ], batch size: 99, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:30:39,153 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.78 vs. limit=15.0 2023-12-23 03:30:55,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=929560.0, ans=0.035 2023-12-23 03:31:18,622 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.002e+01 3.244e+01 3.373e+01 3.542e+01 4.199e+01, threshold=6.745e+01, percent-clipped=0.0 2023-12-23 03:31:20,509 INFO [train.py:886] (3/4) Epoch 30, batch 1250, loss[loss=0.01087, audio_tagging_loss=0.01087, over 24750.00 frames. ], tot_loss[loss=0.01274, audio_tagging_loss=0.01274, over 4938099.20 frames. 
], batch size: 99, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:31:24,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=929760.0, ans=0.0 2023-12-23 03:31:24,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=929760.0, ans=0.125 2023-12-23 03:31:26,694 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.90 vs. limit=15.0 2023-12-23 03:31:50,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=929893.3333333334, ans=0.1 2023-12-23 03:31:52,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=929960.0, ans=0.125 2023-12-23 03:31:52,447 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.88 vs. limit=12.0 2023-12-23 03:32:12,533 INFO [train.py:886] (3/4) Epoch 30, batch 1300, loss[loss=0.0125, audio_tagging_loss=0.0125, over 25000.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 4935961.95 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:32:31,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=930226.6666666666, ans=0.125 2023-12-23 03:32:44,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=930293.3333333334, ans=0.07 2023-12-23 03:33:00,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=930360.0, ans=0.05 2023-12-23 03:33:00,973 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=15.0 2023-12-23 03:33:01,544 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.974e+01 3.237e+01 3.393e+01 3.536e+01 4.080e+01, threshold=6.786e+01, percent-clipped=0.0 2023-12-23 03:33:03,484 INFO [train.py:886] (3/4) Epoch 30, batch 1350, loss[loss=0.01543, audio_tagging_loss=0.01543, over 24918.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4935612.02 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0 2023-12-23 03:33:09,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=930426.6666666666, ans=0.125 2023-12-23 03:33:16,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=930493.3333333334, ans=0.0 2023-12-23 03:33:30,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=930560.0, ans=0.125 2023-12-23 03:33:30,414 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2023-12-23 03:33:40,912 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.36 vs. 
2023-12-23 03:33:44,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=930693.3333333334, ans=0.1
2023-12-23 03:33:55,841 INFO [train.py:886] (3/4) Epoch 30, batch 1400, loss[loss=0.01272, audio_tagging_loss=0.01272, over 25000.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4936303.69 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0
2023-12-23 03:34:44,812 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.927e+01 3.190e+01 3.312e+01 3.498e+01 3.963e+01, threshold=6.624e+01, percent-clipped=0.0
2023-12-23 03:34:45,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=931093.3333333334, ans=0.125
2023-12-23 03:34:46,708 INFO [train.py:886] (3/4) Epoch 30, batch 1450, loss[loss=0.01143, audio_tagging_loss=0.01143, over 24750.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4943029.15 frames. ], batch size: 99, lr: 3.61e-03, grad_scale: 32.0
2023-12-23 03:34:52,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=931093.3333333334, ans=0.1
2023-12-23 03:34:58,053 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0
2023-12-23 03:35:04,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=931160.0, ans=0.1
2023-12-23 03:35:35,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0
2023-12-23 03:35:39,327 INFO [train.py:886] (3/4) Epoch 30, batch 1500, loss[loss=0.01358, audio_tagging_loss=0.01358, over 25000.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4947120.10 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0
2023-12-23 03:35:40,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=931426.6666666666, ans=0.125
2023-12-23 03:36:00,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=931560.0, ans=0.125
2023-12-23 03:36:05,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=931560.0, ans=0.0
2023-12-23 03:36:24,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0
2023-12-23 03:36:29,272 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.936e+01 3.252e+01 3.425e+01 3.586e+01 4.549e+01, threshold=6.850e+01, percent-clipped=0.0
2023-12-23 03:36:31,152 INFO [train.py:886] (3/4) Epoch 30, batch 1550, loss[loss=0.01449, audio_tagging_loss=0.01449, over 24945.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4949515.02 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0
2023-12-23 03:36:34,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=931760.0, ans=0.09899494936611666
2023-12-23 03:36:52,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=931893.3333333334, ans=0.0
2023-12-23 03:36:53,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=931893.3333333334, ans=0.0
2023-12-23 03:36:56,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=931893.3333333334, ans=0.0
2023-12-23 03:36:56,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=931893.3333333334, ans=0.1
2023-12-23 03:36:56,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=931893.3333333334, ans=0.1
2023-12-23 03:37:21,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=932026.6666666666, ans=0.1
2023-12-23 03:37:22,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=932093.3333333334, ans=0.5
2023-12-23 03:37:23,207 INFO [train.py:886] (3/4) Epoch 30, batch 1600, loss[loss=0.0135, audio_tagging_loss=0.0135, over 21921.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4947863.57 frames. ], batch size: 107, lr: 3.61e-03, grad_scale: 32.0
2023-12-23 03:37:36,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=932160.0, ans=0.2
2023-12-23 03:37:44,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=932226.6666666666, ans=0.0
2023-12-23 03:37:59,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=932293.3333333334, ans=0.2
2023-12-23 03:38:12,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=932360.0, ans=0.125
2023-12-23 03:38:13,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=932360.0, ans=0.0
2023-12-23 03:38:14,423 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.061e+01 3.234e+01 3.380e+01 3.507e+01 4.339e+01, threshold=6.761e+01, percent-clipped=0.0
2023-12-23 03:38:16,313 INFO [train.py:886] (3/4) Epoch 30, batch 1650, loss[loss=0.01258, audio_tagging_loss=0.01258, over 25000.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4945252.21 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 32.0
2023-12-23 03:38:16,974 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.15 vs. limit=12.0
2023-12-23 03:38:30,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=932493.3333333334, ans=0.0
2023-12-23 03:38:42,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=932560.0, ans=0.0
2023-12-23 03:38:45,439 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 03:38:48,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=932626.6666666666, ans=0.125
2023-12-23 03:39:07,497 INFO [train.py:886] (3/4) Epoch 30, batch 1700, loss[loss=0.01328, audio_tagging_loss=0.01328, over 25000.00 frames. ], tot_loss[loss=0.01261, audio_tagging_loss=0.01261, over 4943973.97 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 32.0
2023-12-23 03:39:28,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=932893.3333333334, ans=0.125
2023-12-23 03:39:31,917 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0
2023-12-23 03:39:42,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=932960.0, ans=0.1
2023-12-23 03:39:57,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=933026.6666666666, ans=0.125
2023-12-23 03:39:58,101 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.929e+01 3.264e+01 3.396e+01 3.507e+01 4.450e+01, threshold=6.793e+01, percent-clipped=0.0
2023-12-23 03:40:00,129 INFO [train.py:886] (3/4) Epoch 30, batch 1750, loss[loss=0.01261, audio_tagging_loss=0.01261, over 25000.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4946838.84 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 32.0
2023-12-23 03:40:05,712 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.54 vs. limit=22.5
2023-12-23 03:40:11,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=933160.0, ans=0.125
2023-12-23 03:40:20,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=933226.6666666666, ans=0.125
2023-12-23 03:40:21,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=933226.6666666666, ans=0.1
2023-12-23 03:40:31,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=933293.3333333334, ans=0.0
2023-12-23 03:40:40,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=933293.3333333334, ans=0.125
2023-12-23 03:40:44,673 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.22 vs. limit=22.5
2023-12-23 03:40:51,298 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.65 vs. limit=6.0
2023-12-23 03:40:55,020 INFO [train.py:886] (3/4) Epoch 30, batch 1800, loss[loss=0.01244, audio_tagging_loss=0.01244, over 25000.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4947677.08 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 32.0
2023-12-23 03:41:24,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=933626.6666666666, ans=0.125
2023-12-23 03:41:40,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=933693.3333333334, ans=0.1
2023-12-23 03:41:43,285 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0
2023-12-23 03:41:43,729 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.809e+01 3.246e+01 3.412e+01 3.538e+01 4.076e+01, threshold=6.825e+01, percent-clipped=0.0
2023-12-23 03:41:45,631 INFO [train.py:886] (3/4) Epoch 30, batch 1850, loss[loss=0.01334, audio_tagging_loss=0.01334, over 24750.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4953008.53 frames. ], batch size: 99, lr: 3.60e-03, grad_scale: 32.0
2023-12-23 03:41:59,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=933826.6666666666, ans=0.125
2023-12-23 03:42:14,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=933893.3333333334, ans=0.0
2023-12-23 03:42:37,475 INFO [train.py:886] (3/4) Epoch 30, batch 1900, loss[loss=0.01562, audio_tagging_loss=0.01562, over 24750.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4945199.74 frames. ], batch size: 99, lr: 3.60e-03, grad_scale: 32.0
2023-12-23 03:43:01,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=934226.6666666666, ans=0.125
2023-12-23 03:43:08,734 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 03:43:25,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=934360.0, ans=10.0
2023-12-23 03:43:26,844 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.989e+01 3.318e+01 3.464e+01 3.638e+01 4.261e+01, threshold=6.929e+01, percent-clipped=0.0
2023-12-23 03:43:29,480 INFO [train.py:886] (3/4) Epoch 30, batch 1950, loss[loss=0.01182, audio_tagging_loss=0.01182, over 25000.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4939466.12 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 32.0
2023-12-23 03:43:29,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0
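In each optim.py WARNING above, the reported threshold is consistently about twice the middle quartile value (e.g. threshold=6.786e+01 against a median of 3.393e+01 with Clipping_scale=2.0), which suggests the clipping threshold is derived as Clipping_scale times a running median of recent gradient norms. A hedged sketch of that bookkeeping, assuming the quartiles are taken over a window of recent per-step gradient norms (the function and variable names are assumptions, not optim.py's actual internals):

import torch

def clipping_report(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Summarize recent gradient norms and derive a clipping threshold
    # as clipping_scale * median, mirroring the WARNING lines above.
    q = torch.quantile(recent_grad_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]
    pct = 100.0 * (recent_grad_norms > threshold).float().mean()
    quartiles = " ".join(f"{v:.3e}" for v in q.tolist())
    print(f"Clipping_scale={clipping_scale}, grad-norm quartiles {quartiles}, "
          f"threshold={threshold:.3e}, percent-clipped={pct:.1f}")
    return threshold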
2023-12-23 03:43:43,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=934493.3333333334, ans=0.125
2023-12-23 03:43:48,251 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 03:43:56,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=934560.0, ans=0.125
2023-12-23 03:44:21,297 INFO [train.py:886] (3/4) Epoch 30, batch 2000, loss[loss=0.01222, audio_tagging_loss=0.01222, over 25000.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4942856.54 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 64.0
2023-12-23 03:44:30,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=934826.6666666666, ans=0.125
2023-12-23 03:44:39,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=934826.6666666666, ans=0.0
2023-12-23 03:44:43,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=934893.3333333334, ans=0.125
2023-12-23 03:44:56,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=934960.0, ans=0.125
2023-12-23 03:45:11,881 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.798e+01 3.193e+01 3.321e+01 3.490e+01 4.233e+01, threshold=6.642e+01, percent-clipped=0.0
2023-12-23 03:45:13,805 INFO [train.py:886] (3/4) Epoch 30, batch 2050, loss[loss=0.01038, audio_tagging_loss=0.01038, over 24750.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4941312.42 frames. ], batch size: 99, lr: 3.60e-03, grad_scale: 64.0
2023-12-23 03:45:19,041 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.21 vs. limit=15.0
2023-12-23 03:45:26,553 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.11 vs. limit=15.0
2023-12-23 03:45:38,312 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.96 vs. limit=10.0
2023-12-23 03:45:38,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=935226.6666666666, ans=0.2
2023-12-23 03:45:44,433 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 03:46:05,440 INFO [train.py:886] (3/4) Epoch 30, batch 2100, loss[loss=0.01145, audio_tagging_loss=0.01145, over 25000.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4944566.09 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 64.0
2023-12-23 03:46:08,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=935426.6666666666, ans=0.0
2023-12-23 03:46:25,332 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.51 vs. limit=22.5
2023-12-23 03:46:48,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=935693.3333333334, ans=0.125
2023-12-23 03:46:49,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=935693.3333333334, ans=0.0
2023-12-23 03:46:51,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=935693.3333333334, ans=0.125
2023-12-23 03:46:54,847 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.885e+01 3.237e+01 3.407e+01 3.545e+01 4.095e+01, threshold=6.814e+01, percent-clipped=0.0
2023-12-23 03:46:56,776 INFO [train.py:886] (3/4) Epoch 30, batch 2150, loss[loss=0.01029, audio_tagging_loss=0.01029, over 24021.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4944532.42 frames. ], batch size: 100, lr: 3.60e-03, grad_scale: 64.0
2023-12-23 03:47:06,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=935826.6666666666, ans=0.125
2023-12-23 03:47:21,609 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.04 vs. limit=6.0
2023-12-23 03:47:27,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.00 vs. limit=15.0
2023-12-23 03:47:38,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=936026.6666666666, ans=0.125
2023-12-23 03:47:42,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=936026.6666666666, ans=0.1
2023-12-23 03:47:46,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=936026.6666666666, ans=0.125
2023-12-23 03:47:49,781 INFO [train.py:886] (3/4) Epoch 30, batch 2200, loss[loss=0.01247, audio_tagging_loss=0.01247, over 24750.00 frames. ], tot_loss[loss=0.01259, audio_tagging_loss=0.01259, over 4946709.86 frames. ], batch size: 99, lr: 3.60e-03, grad_scale: 64.0
2023-12-23 03:48:02,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=936160.0, ans=0.1
2023-12-23 03:48:19,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=936293.3333333334, ans=0.1
2023-12-23 03:48:37,952 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.929e+01 3.275e+01 3.471e+01 3.594e+01 4.076e+01, threshold=6.942e+01, percent-clipped=0.0
2023-12-23 03:48:39,961 INFO [train.py:886] (3/4) Epoch 30, batch 2250, loss[loss=0.01223, audio_tagging_loss=0.01223, over 24750.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4948187.78 frames. ], batch size: 99, lr: 3.60e-03, grad_scale: 64.0
2023-12-23 03:48:40,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=936426.6666666666, ans=0.125
2023-12-23 03:49:02,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=936560.0, ans=0.2
2023-12-23 03:49:06,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=936560.0, ans=0.0
2023-12-23 03:49:10,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=936626.6666666666, ans=0.0
2023-12-23 03:49:19,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=936626.6666666666, ans=0.125
2023-12-23 03:49:32,509 INFO [train.py:886] (3/4) Epoch 30, batch 2300, loss[loss=0.0126, audio_tagging_loss=0.0126, over 24750.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4944014.38 frames. ], batch size: 99, lr: 3.60e-03, grad_scale: 64.0
2023-12-23 03:49:41,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=936826.6666666666, ans=0.0
2023-12-23 03:49:44,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=15.0
2023-12-23 03:49:45,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=936826.6666666666, ans=0.125
2023-12-23 03:49:54,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=936893.3333333334, ans=0.125
2023-12-23 03:50:13,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0
2023-12-23 03:50:15,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=937026.6666666666, ans=0.04949747468305833
2023-12-23 03:50:17,022 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.19 vs. limit=15.0
2023-12-23 03:50:21,591 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.901e+01 3.201e+01 3.316e+01 3.475e+01 3.980e+01, threshold=6.632e+01, percent-clipped=0.0
2023-12-23 03:50:24,244 INFO [train.py:886] (3/4) Epoch 30, batch 2350, loss[loss=0.01228, audio_tagging_loss=0.01228, over 22689.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4943419.37 frames. ], batch size: 107, lr: 3.60e-03, grad_scale: 64.0
2023-12-23 03:50:32,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=937093.3333333334, ans=0.0
2023-12-23 03:50:38,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=937160.0, ans=0.125
2023-12-23 03:50:40,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=937160.0, ans=0.125
2023-12-23 03:50:45,346 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.42 vs. limit=15.0
2023-12-23 03:50:53,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=937293.3333333334, ans=0.125
2023-12-23 03:50:57,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=937293.3333333334, ans=0.1
2023-12-23 03:51:04,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=937293.3333333334, ans=0.125
2023-12-23 03:51:08,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=937360.0, ans=0.0
2023-12-23 03:51:13,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=937360.0, ans=0.125
2023-12-23 03:51:15,743 INFO [train.py:886] (3/4) Epoch 30, batch 2400, loss[loss=0.01425, audio_tagging_loss=0.01425, over 25000.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4947152.01 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0
2023-12-23 03:51:17,751 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 03:51:32,224 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 03:51:32,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=937493.3333333334, ans=0.1
2023-12-23 03:51:44,402 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.63 vs. limit=15.0
2023-12-23 03:51:48,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=937626.6666666666, ans=0.0
2023-12-23 03:51:54,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=937626.6666666666, ans=0.0
2023-12-23 03:51:59,632 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0
2023-12-23 03:52:05,952 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.939e+01 3.251e+01 3.352e+01 3.497e+01 5.027e+01, threshold=6.704e+01, percent-clipped=0.0
2023-12-23 03:52:08,559 INFO [train.py:886] (3/4) Epoch 30, batch 2450, loss[loss=0.01281, audio_tagging_loss=0.01281, over 25000.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4954106.90 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0
2023-12-23 03:52:16,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=937760.0, ans=0.0
2023-12-23 03:52:18,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=937826.6666666666, ans=0.1
2023-12-23 03:52:20,188 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.19 vs. limit=6.0
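The Whitening lines above compare a per-module metric against a limit; most entries sit below their limit, consistent with a penalty that activates only when the metric exceeds it. One plausible reading, stated here as an assumption rather than scaling.py's exact formula: the metric measures how far the channel covariance is from a multiple of the identity, equaling 1.0 for perfectly whitened features and growing with the eigenvalue spread. A sketch:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels). For each channel group, compute
    # the covariance C and return m * tr(C @ C) / tr(C)**2, which is
    # 1.0 iff C is proportional to the identity (m = channels per group)
    # and grows as the eigenvalues of C spread apart.
    n, c = x.shape
    assert c % num_groups == 0
    metrics = []
    for g in x.chunk(num_groups, dim=1):
        g = g - g.mean(dim=0, keepdim=True)
        cov = (g.t() @ g) / n
        metrics.append(g.shape[1] * (cov @ cov).trace() / cov.trace() ** 2)
    return torch.stack(metrics).mean().item()

# Under this reading, "metric=5.19 vs. limit=6.0" above would mean
# no whitening penalty was applied to that module on that batch.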
2023-12-23 03:52:23,886 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 03:52:32,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=937893.3333333334, ans=0.1
2023-12-23 03:52:45,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=937960.0, ans=0.2
2023-12-23 03:52:57,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=938026.6666666666, ans=0.125
2023-12-23 03:52:58,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=938093.3333333334, ans=0.0
2023-12-23 03:52:58,801 INFO [train.py:886] (3/4) Epoch 30, batch 2500, loss[loss=0.01513, audio_tagging_loss=0.01513, over 24750.00 frames. ], tot_loss[loss=0.01259, audio_tagging_loss=0.01259, over 4948575.95 frames. ], batch size: 99, lr: 3.59e-03, grad_scale: 64.0
2023-12-23 03:53:05,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=938093.3333333334, ans=0.125
2023-12-23 03:53:08,334 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.81 vs. limit=15.0
2023-12-23 03:53:15,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=938160.0, ans=0.0
2023-12-23 03:53:40,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=938360.0, ans=0.125
2023-12-23 03:53:41,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=938360.0, ans=0.0
2023-12-23 03:53:49,313 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.034e+01 3.342e+01 3.462e+01 3.638e+01 4.220e+01, threshold=6.925e+01, percent-clipped=0.0
2023-12-23 03:53:51,303 INFO [train.py:886] (3/4) Epoch 30, batch 2550, loss[loss=0.01224, audio_tagging_loss=0.01224, over 24750.00 frames. ], tot_loss[loss=0.01266, audio_tagging_loss=0.01266, over 4945678.07 frames. ], batch size: 99, lr: 3.59e-03, grad_scale: 64.0
2023-12-23 03:54:05,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=938493.3333333334, ans=0.025
2023-12-23 03:54:29,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=938626.6666666666, ans=0.0
2023-12-23 03:54:31,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=938693.3333333334, ans=0.125
2023-12-23 03:54:42,917 INFO [train.py:886] (3/4) Epoch 30, batch 2600, loss[loss=0.01228, audio_tagging_loss=0.01228, over 24750.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4942776.56 frames. ], batch size: 99, lr: 3.59e-03, grad_scale: 64.0
2023-12-23 03:54:58,080 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=15.0
2023-12-23 03:55:13,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=938960.0, ans=0.1
2023-12-23 03:55:20,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=938960.0, ans=0.125
2023-12-23 03:55:21,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=938960.0, ans=0.1
2023-12-23 03:55:32,985 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.923e+01 3.221e+01 3.370e+01 3.530e+01 4.052e+01, threshold=6.740e+01, percent-clipped=0.0
2023-12-23 03:55:34,924 INFO [train.py:886] (3/4) Epoch 30, batch 2650, loss[loss=0.01415, audio_tagging_loss=0.01415, over 25000.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4943422.94 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0
2023-12-23 03:55:49,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=939160.0, ans=0.025
2023-12-23 03:55:59,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0
2023-12-23 03:56:02,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=939226.6666666666, ans=0.0
2023-12-23 03:56:10,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=939293.3333333334, ans=0.0
2023-12-23 03:56:16,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=939360.0, ans=0.2
2023-12-23 03:56:28,535 INFO [train.py:886] (3/4) Epoch 30, batch 2700, loss[loss=0.01034, audio_tagging_loss=0.01034, over 25000.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4946020.29 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0
2023-12-23 03:56:38,548 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.44 vs. limit=15.0
2023-12-23 03:56:52,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=939560.0, ans=0.07
2023-12-23 03:56:54,579 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 03:57:05,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=939626.6666666666, ans=0.95
2023-12-23 03:57:16,650 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.960e+01 3.256e+01 3.372e+01 3.519e+01 4.379e+01, threshold=6.744e+01, percent-clipped=0.0
2023-12-23 03:57:18,584 INFO [train.py:886] (3/4) Epoch 30, batch 2750, loss[loss=0.01277, audio_tagging_loss=0.01277, over 25000.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4946724.44 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0
2023-12-23 03:57:18,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=939760.0, ans=0.125
2023-12-23 03:57:33,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.49 vs. limit=15.0
2023-12-23 03:57:36,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=939826.6666666666, ans=0.125
2023-12-23 03:57:46,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=939893.3333333334, ans=0.1
2023-12-23 03:57:46,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=939893.3333333334, ans=0.125
2023-12-23 03:57:47,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=939893.3333333334, ans=0.125
2023-12-23 03:57:55,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=939960.0, ans=0.0
2023-12-23 03:58:03,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=940026.6666666666, ans=0.1
2023-12-23 03:58:11,691 INFO [train.py:886] (3/4) Epoch 30, batch 2800, loss[loss=0.01541, audio_tagging_loss=0.01541, over 24947.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4949975.57 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0
2023-12-23 03:58:22,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=940160.0, ans=0.125
2023-12-23 03:58:31,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=940226.6666666666, ans=0.0
2023-12-23 03:58:42,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=940293.3333333334, ans=0.125
2023-12-23 03:58:43,959 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.94 vs. limit=6.0
2023-12-23 03:59:01,824 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.885e+01 3.252e+01 3.401e+01 3.539e+01 4.080e+01, threshold=6.801e+01, percent-clipped=0.0
2023-12-23 03:59:03,684 INFO [train.py:886] (3/4) Epoch 30, batch 2850, loss[loss=0.01191, audio_tagging_loss=0.01191, over 23980.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4938844.48 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0
2023-12-23 03:59:08,884 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.48 vs. limit=15.0
2023-12-23 03:59:10,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=940426.6666666666, ans=0.1
2023-12-23 03:59:30,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.33 vs. limit=15.0
2023-12-23 03:59:33,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=940626.6666666666, ans=0.0
2023-12-23 03:59:43,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=940693.3333333334, ans=0.125
2023-12-23 03:59:54,499 INFO [train.py:886] (3/4) Epoch 30, batch 2900, loss[loss=0.01113, audio_tagging_loss=0.01113, over 25000.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4940117.75 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0
2023-12-23 04:00:07,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=940826.6666666666, ans=0.125
2023-12-23 04:00:14,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=940893.3333333334, ans=0.09899494936611666
2023-12-23 04:00:28,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=940960.0, ans=0.0
2023-12-23 04:00:43,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=941026.6666666666, ans=0.0
2023-12-23 04:00:43,899 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.852e+01 3.200e+01 3.351e+01 3.510e+01 3.990e+01, threshold=6.702e+01, percent-clipped=0.0
2023-12-23 04:00:44,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=941026.6666666666, ans=0.125
2023-12-23 04:00:45,810 INFO [train.py:886] (3/4) Epoch 30, batch 2950, loss[loss=0.01317, audio_tagging_loss=0.01317, over 25000.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4937980.82 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0
2023-12-23 04:00:53,604 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.62 vs. limit=22.5
2023-12-23 04:01:01,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=941160.0, ans=0.125
2023-12-23 04:01:04,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=941160.0, ans=0.125
2023-12-23 04:01:06,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=941226.6666666666, ans=10.0
2023-12-23 04:01:12,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=941226.6666666666, ans=0.0
2023-12-23 04:01:21,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=941293.3333333334, ans=0.1
2023-12-23 04:01:27,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=941360.0, ans=0.125
2023-12-23 04:01:29,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=941360.0, ans=0.0
2023-12-23 04:01:30,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=941360.0, ans=0.2
2023-12-23 04:01:32,709 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 04:01:34,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=941360.0, ans=0.0
2023-12-23 04:01:37,757 INFO [train.py:886] (3/4) Epoch 30, batch 3000, loss[loss=0.01242, audio_tagging_loss=0.01242, over 25000.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4941543.17 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0
2023-12-23 04:01:37,757 INFO [train.py:909] (3/4) Computing validation loss
2023-12-23 04:01:45,245 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.6791, 2.6430, 2.4417, 2.2226, 3.8497, 3.4367, 4.0602, 2.3324], device='cuda:3')
2023-12-23 04:01:59,243 INFO [train.py:917] (3/4) Epoch 30, validation: loss=0.03287, audio_tagging_loss=0.03287, over 3737520.00 frames.
2023-12-23 04:01:59,244 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-23 04:02:07,032 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.86 vs. limit=15.0
2023-12-23 04:02:26,900 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.67 vs. limit=15.0
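The batch 3000 block above interleaves a full dev-set pass with training ("Computing validation loss", then "validation: loss=0.03287 ... over 3737520.00 frames"). A minimal sketch of that pattern, with hypothetical batch keys and model signature; the real train.py also dumps attention-weight entropies (zipformer.py:1858) and aggregates statistics across the 4 DDP ranks:

import torch

def compute_validation_loss(model, valid_loader, device) -> float:
    # Switch to eval mode, average the audio-tagging loss over the
    # whole dev set (frame-weighted), then resume training mode.
    model.eval()
    total_loss, total_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            feats = batch["features"].to(device)     # hypothetical key
            labels = batch["labels"].to(device)      # hypothetical key
            loss, num_frames = model(feats, labels)  # assumed signature
            total_loss += loss.item() * num_frames
            total_frames += num_frames
    model.train()
    return total_loss / max(total_frames, 1.0)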
2023-12-23 04:02:36,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=941626.6666666666, ans=0.125
2023-12-23 04:02:38,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=941626.6666666666, ans=0.1
2023-12-23 04:02:47,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=941693.3333333334, ans=0.125
2023-12-23 04:02:48,227 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.895e+01 3.215e+01 3.346e+01 3.573e+01 4.041e+01, threshold=6.693e+01, percent-clipped=0.0
2023-12-23 04:02:49,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=941760.0, ans=0.0
2023-12-23 04:02:50,156 INFO [train.py:886] (3/4) Epoch 30, batch 3050, loss[loss=0.01279, audio_tagging_loss=0.01279, over 25000.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4946487.52 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0
2023-12-23 04:03:09,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=941826.6666666666, ans=0.125
2023-12-23 04:03:14,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=941893.3333333334, ans=0.1
2023-12-23 04:03:21,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=941960.0, ans=0.0
2023-12-23 04:03:29,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=942026.6666666666, ans=0.0
2023-12-23 04:03:41,696 INFO [train.py:886] (3/4) Epoch 30, batch 3100, loss[loss=0.01359, audio_tagging_loss=0.01359, over 25000.00 frames. ], tot_loss[loss=0.01261, audio_tagging_loss=0.01261, over 4950596.37 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 64.0
2023-12-23 04:03:46,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=942093.3333333334, ans=0.125
2023-12-23 04:03:49,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=942093.3333333334, ans=0.0
2023-12-23 04:03:58,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=942160.0, ans=0.125
2023-12-23 04:04:06,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=942226.6666666666, ans=0.0
2023-12-23 04:04:28,959 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 04:04:29,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=942360.0, ans=0.95
2023-12-23 04:04:30,570 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.053e+01 3.332e+01 3.451e+01 3.634e+01 4.318e+01, threshold=6.902e+01, percent-clipped=0.0
2023-12-23 04:04:32,498 INFO [train.py:886] (3/4) Epoch 30, batch 3150, loss[loss=0.01231, audio_tagging_loss=0.01231, over 24750.00 frames. ], tot_loss[loss=0.01259, audio_tagging_loss=0.01259, over 4944459.26 frames. ], batch size: 99, lr: 3.59e-03, grad_scale: 64.0
2023-12-23 04:04:42,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=942493.3333333334, ans=0.125
2023-12-23 04:05:10,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=942626.6666666666, ans=0.2
2023-12-23 04:05:24,528 INFO [train.py:886] (3/4) Epoch 30, batch 3200, loss[loss=0.01415, audio_tagging_loss=0.01415, over 24750.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4941259.56 frames. ], batch size: 99, lr: 3.58e-03, grad_scale: 64.0
2023-12-23 04:05:31,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=942760.0, ans=0.125
2023-12-23 04:05:33,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=942826.6666666666, ans=0.125
2023-12-23 04:05:36,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=942826.6666666666, ans=0.125
2023-12-23 04:05:44,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=942893.3333333334, ans=0.125
2023-12-23 04:05:54,423 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.10 vs. limit=15.0
2023-12-23 04:05:56,596 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=15.0
2023-12-23 04:06:00,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=942960.0, ans=10.0
2023-12-23 04:06:04,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.10 vs. limit=22.5
2023-12-23 04:06:04,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=943026.6666666666, ans=0.0
2023-12-23 04:06:12,109 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.760e+01 3.281e+01 3.417e+01 3.634e+01 4.167e+01, threshold=6.835e+01, percent-clipped=0.0
2023-12-23 04:06:14,702 INFO [train.py:886] (3/4) Epoch 30, batch 3250, loss[loss=0.01328, audio_tagging_loss=0.01328, over 21751.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4943434.07 frames. ], batch size: 107, lr: 3.58e-03, grad_scale: 64.0
2023-12-23 04:06:23,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=943093.3333333334, ans=0.0
2023-12-23 04:06:24,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=943093.3333333334, ans=0.1
2023-12-23 04:06:37,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=943226.6666666666, ans=0.125
2023-12-23 04:06:53,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.20 vs. limit=15.0
2023-12-23 04:06:54,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=943293.3333333334, ans=0.0
2023-12-23 04:07:06,658 INFO [train.py:886] (3/4) Epoch 30, batch 3300, loss[loss=0.01135, audio_tagging_loss=0.01135, over 25000.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4949073.22 frames. ], batch size: 100, lr: 3.58e-03, grad_scale: 64.0
2023-12-23 04:07:06,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=943426.6666666666, ans=0.125
2023-12-23 04:07:10,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=943426.6666666666, ans=0.125
2023-12-23 04:07:11,856 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.23 vs. limit=15.0
2023-12-23 04:07:22,271 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.70 vs. limit=15.0
2023-12-23 04:07:49,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=943693.3333333334, ans=0.125
2023-12-23 04:07:55,900 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.982e+01 3.270e+01 3.395e+01 3.544e+01 4.890e+01, threshold=6.790e+01, percent-clipped=0.0
2023-12-23 04:07:58,514 INFO [train.py:886] (3/4) Epoch 30, batch 3350, loss[loss=0.0123, audio_tagging_loss=0.0123, over 25000.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4945372.28 frames. ], batch size: 100, lr: 3.58e-03, grad_scale: 64.0
2023-12-23 04:08:00,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=943760.0, ans=0.0
2023-12-23 04:08:04,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=943760.0, ans=0.125
2023-12-23 04:08:04,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=943760.0, ans=0.125
2023-12-23 04:08:13,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=943826.6666666666, ans=0.125
2023-12-23 04:08:15,246 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.69 vs. limit=15.0
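Note that grad_scale in the train.py:886 lines rises from 32.0 to 64.0 exactly at batch 2000 and stays there, consistent with PyTorch's AMP gradient scaler (this run has use_fp16: True): by default the scale doubles after growth_interval=2000 consecutive overflow-free steps and is halved on overflow. A sketch of the standard fp16 step using that real API; the icefall loop differs in detail (notably its own ScaledAdam clipping shown in the optim.py WARNINGs):

import torch
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler(init_scale=32.0, growth_factor=2.0,
                    backoff_factor=0.5, growth_interval=2000)

def training_step(model, optimizer, feats, labels):
    # Standard fp16 pattern: run the forward pass under autocast,
    # scale the loss for backward, then step through the scaler.
    optimizer.zero_grad()
    with autocast():
        loss = model(feats, labels)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # skipped internally if inf/nan grads appear
    scaler.update()         # grows the scale, or backs off on overflow
    return loss.detach()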
limit=15.0 2023-12-23 04:08:20,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=943893.3333333334, ans=0.0 2023-12-23 04:08:28,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=943960.0, ans=0.1 2023-12-23 04:08:38,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=944026.6666666666, ans=0.2 2023-12-23 04:08:41,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=944026.6666666666, ans=0.125 2023-12-23 04:08:42,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=944026.6666666666, ans=0.0 2023-12-23 04:08:48,480 INFO [train.py:886] (3/4) Epoch 30, batch 3400, loss[loss=0.0125, audio_tagging_loss=0.0125, over 25000.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4955249.36 frames. ], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:08:53,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=944093.3333333334, ans=0.125 2023-12-23 04:08:54,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=944093.3333333334, ans=0.125 2023-12-23 04:09:00,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=944160.0, ans=0.125 2023-12-23 04:09:06,229 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:09:07,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=944160.0, ans=0.1 2023-12-23 04:09:18,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=944293.3333333334, ans=0.125 2023-12-23 04:09:26,887 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.75 vs. limit=15.0 2023-12-23 04:09:30,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=944360.0, ans=0.125 2023-12-23 04:09:37,165 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.991e+01 3.324e+01 3.423e+01 3.624e+01 4.414e+01, threshold=6.845e+01, percent-clipped=0.0 2023-12-23 04:09:39,082 INFO [train.py:886] (3/4) Epoch 30, batch 3450, loss[loss=0.01144, audio_tagging_loss=0.01144, over 25000.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4950317.57 frames. ], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:09:42,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=944426.6666666666, ans=0.1 2023-12-23 04:09:43,456 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.06 vs. 
limit=12.0 2023-12-23 04:09:44,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=944426.6666666666, ans=0.125 2023-12-23 04:09:57,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=944493.3333333334, ans=15.0 2023-12-23 04:10:03,608 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.50 vs. limit=15.0 2023-12-23 04:10:22,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=944693.3333333334, ans=0.1 2023-12-23 04:10:26,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=944693.3333333334, ans=0.2 2023-12-23 04:10:30,259 INFO [train.py:886] (3/4) Epoch 30, batch 3500, loss[loss=0.01097, audio_tagging_loss=0.01097, over 24750.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4943433.71 frames. ], batch size: 99, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:10:34,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=944760.0, ans=0.2 2023-12-23 04:10:49,202 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.63 vs. limit=15.0 2023-12-23 04:10:50,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=944893.3333333334, ans=0.0 2023-12-23 04:10:55,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=944893.3333333334, ans=0.5 2023-12-23 04:10:59,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=944960.0, ans=0.125 2023-12-23 04:11:13,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=945026.6666666666, ans=0.2 2023-12-23 04:11:17,577 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.24 vs. limit=15.0 2023-12-23 04:11:19,704 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.755e+01 3.226e+01 3.344e+01 3.619e+01 4.102e+01, threshold=6.688e+01, percent-clipped=0.0 2023-12-23 04:11:21,608 INFO [train.py:886] (3/4) Epoch 30, batch 3550, loss[loss=0.01397, audio_tagging_loss=0.01397, over 24750.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4941003.41 frames. ], batch size: 99, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:12:14,348 INFO [train.py:886] (3/4) Epoch 30, batch 3600, loss[loss=0.01292, audio_tagging_loss=0.01292, over 24750.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4947709.81 frames. ], batch size: 99, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:12:17,766 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.21 vs. limit=15.0 2023-12-23 04:12:19,788 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.89 vs. 
limit=15.0 2023-12-23 04:12:42,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=945560.0, ans=0.125 2023-12-23 04:12:50,530 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.42 vs. limit=6.0 2023-12-23 04:12:52,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.82 vs. limit=15.0 2023-12-23 04:12:58,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=945693.3333333334, ans=0.125 2023-12-23 04:13:02,253 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.867e+01 3.251e+01 3.392e+01 3.502e+01 4.191e+01, threshold=6.784e+01, percent-clipped=0.0 2023-12-23 04:13:04,186 INFO [train.py:886] (3/4) Epoch 30, batch 3650, loss[loss=0.01178, audio_tagging_loss=0.01178, over 25000.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4953104.47 frames. ], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:13:04,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=945760.0, ans=0.125 2023-12-23 04:13:11,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=945760.0, ans=0.5 2023-12-23 04:13:18,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=945826.6666666666, ans=0.0 2023-12-23 04:13:19,266 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.38 vs. limit=15.0 2023-12-23 04:13:25,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=945893.3333333334, ans=0.0 2023-12-23 04:13:28,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=945893.3333333334, ans=0.1 2023-12-23 04:13:37,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=945960.0, ans=0.125 2023-12-23 04:13:39,258 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.29 vs. limit=6.0 2023-12-23 04:13:41,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=945960.0, ans=0.1 2023-12-23 04:13:41,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=945960.0, ans=0.05 2023-12-23 04:13:48,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=946026.6666666666, ans=0.2 2023-12-23 04:13:48,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=946026.6666666666, ans=0.1 2023-12-23 04:13:57,396 INFO [train.py:886] (3/4) Epoch 30, batch 3700, loss[loss=0.01388, audio_tagging_loss=0.01388, over 25000.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4958255.98 frames. 
], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:14:05,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=946160.0, ans=0.125 2023-12-23 04:14:09,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.84 vs. limit=15.0 2023-12-23 04:14:11,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=946160.0, ans=0.2 2023-12-23 04:14:26,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=946293.3333333334, ans=0.1 2023-12-23 04:14:26,467 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:14:29,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=946293.3333333334, ans=0.125 2023-12-23 04:14:34,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=946293.3333333334, ans=0.1 2023-12-23 04:14:38,754 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0 2023-12-23 04:14:40,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=946360.0, ans=0.0 2023-12-23 04:14:45,607 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.920e+01 3.248e+01 3.376e+01 3.563e+01 4.172e+01, threshold=6.751e+01, percent-clipped=0.0 2023-12-23 04:14:47,531 INFO [train.py:886] (3/4) Epoch 30, batch 3750, loss[loss=0.01355, audio_tagging_loss=0.01355, over 24750.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4952656.98 frames. ], batch size: 99, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:14:53,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=946426.6666666666, ans=0.125 2023-12-23 04:15:05,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=946493.3333333334, ans=0.2 2023-12-23 04:15:06,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=946493.3333333334, ans=0.2 2023-12-23 04:15:16,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=946560.0, ans=0.0 2023-12-23 04:15:39,175 INFO [train.py:886] (3/4) Epoch 30, batch 3800, loss[loss=0.01232, audio_tagging_loss=0.01232, over 24750.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4943937.86 frames. ], batch size: 99, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:15:39,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=946760.0, ans=0.0 2023-12-23 04:16:02,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=946893.3333333334, ans=0.125 2023-12-23 04:16:18,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.33 vs. 
limit=15.0 2023-12-23 04:16:26,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=947026.6666666666, ans=0.1 2023-12-23 04:16:28,727 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:16:29,393 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.901e+01 3.289e+01 3.427e+01 3.573e+01 5.060e+01, threshold=6.854e+01, percent-clipped=0.0 2023-12-23 04:16:31,292 INFO [train.py:886] (3/4) Epoch 30, batch 3850, loss[loss=0.009425, audio_tagging_loss=0.009425, over 24750.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4943060.36 frames. ], batch size: 99, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:16:52,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. limit=6.0 2023-12-23 04:16:53,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=947226.6666666666, ans=0.0 2023-12-23 04:17:00,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=947226.6666666666, ans=0.0 2023-12-23 04:17:01,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=947293.3333333334, ans=0.1 2023-12-23 04:17:22,778 INFO [train.py:886] (3/4) Epoch 30, batch 3900, loss[loss=0.01194, audio_tagging_loss=0.01194, over 25000.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4947874.38 frames. ], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:17:29,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=947426.6666666666, ans=0.0 2023-12-23 04:17:32,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=947493.3333333334, ans=0.2 2023-12-23 04:17:46,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=947560.0, ans=0.5 2023-12-23 04:17:50,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=947560.0, ans=0.125 2023-12-23 04:17:50,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=947560.0, ans=0.0 2023-12-23 04:18:12,358 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.816e+01 3.238e+01 3.413e+01 3.556e+01 4.213e+01, threshold=6.826e+01, percent-clipped=0.0 2023-12-23 04:18:14,272 INFO [train.py:886] (3/4) Epoch 30, batch 3950, loss[loss=0.01308, audio_tagging_loss=0.01308, over 25000.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4948197.88 frames. 
], batch size: 100, lr: 3.58e-03, grad_scale: 64.0 2023-12-23 04:18:15,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=947760.0, ans=0.125 2023-12-23 04:18:39,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=947893.3333333334, ans=0.125 2023-12-23 04:18:47,170 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. limit=6.0 2023-12-23 04:18:56,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=948026.6666666666, ans=0.2 2023-12-23 04:19:07,782 INFO [train.py:886] (3/4) Epoch 30, batch 4000, loss[loss=0.01337, audio_tagging_loss=0.01337, over 25000.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4952785.17 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 128.0 2023-12-23 04:19:11,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=948093.3333333334, ans=0.125 2023-12-23 04:19:57,729 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.995e+01 3.306e+01 3.420e+01 3.603e+01 4.772e+01, threshold=6.841e+01, percent-clipped=0.0 2023-12-23 04:19:58,689 INFO [train.py:886] (3/4) Epoch 30, batch 4050, loss[loss=0.01015, audio_tagging_loss=0.01015, over 25000.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4945526.31 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:20:11,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=948493.3333333334, ans=0.0 2023-12-23 04:20:15,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=948493.3333333334, ans=0.0 2023-12-23 04:20:21,129 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.43 vs. limit=22.5 2023-12-23 04:20:31,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=948626.6666666666, ans=0.1 2023-12-23 04:20:49,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=948693.3333333334, ans=0.0 2023-12-23 04:20:51,100 INFO [train.py:886] (3/4) Epoch 30, batch 4100, loss[loss=0.01039, audio_tagging_loss=0.01039, over 24750.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4945146.14 frames. ], batch size: 99, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:20:57,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=948760.0, ans=0.125 2023-12-23 04:21:00,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=948826.6666666666, ans=0.125 2023-12-23 04:21:20,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=948893.3333333334, ans=0.125 2023-12-23 04:21:22,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.63 vs. 
limit=15.0 2023-12-23 04:21:27,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=948960.0, ans=0.125 2023-12-23 04:21:27,915 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.49 vs. limit=15.0 2023-12-23 04:21:31,415 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:21:34,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=949026.6666666666, ans=0.1 2023-12-23 04:21:35,842 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:21:35,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=949026.6666666666, ans=0.125 2023-12-23 04:21:41,983 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.936e+01 3.264e+01 3.402e+01 3.586e+01 4.088e+01, threshold=6.804e+01, percent-clipped=0.0 2023-12-23 04:21:43,646 INFO [train.py:886] (3/4) Epoch 30, batch 4150, loss[loss=0.01193, audio_tagging_loss=0.01193, over 25000.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4942782.52 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:21:54,879 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:21:59,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=949160.0, ans=0.0 2023-12-23 04:22:34,552 INFO [train.py:886] (3/4) Epoch 30, batch 4200, loss[loss=0.01672, audio_tagging_loss=0.01672, over 25000.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4943253.89 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:22:43,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=949426.6666666666, ans=0.07 2023-12-23 04:22:56,563 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:23:00,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=949560.0, ans=0.0 2023-12-23 04:23:09,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=949626.6666666666, ans=0.125 2023-12-23 04:23:21,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=949693.3333333334, ans=0.0 2023-12-23 04:23:25,797 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.859e+01 3.208e+01 3.393e+01 3.521e+01 4.162e+01, threshold=6.786e+01, percent-clipped=0.0 2023-12-23 04:23:26,749 INFO [train.py:886] (3/4) Epoch 30, batch 4250, loss[loss=0.01126, audio_tagging_loss=0.01126, over 25000.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4945561.74 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:23:33,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.68 vs. 
limit=15.0 2023-12-23 04:23:56,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=949893.3333333334, ans=0.125 2023-12-23 04:23:57,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=949960.0, ans=0.125 2023-12-23 04:24:11,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=950026.6666666666, ans=0.2 2023-12-23 04:24:13,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=950026.6666666666, ans=0.0 2023-12-23 04:24:17,244 INFO [train.py:886] (3/4) Epoch 30, batch 4300, loss[loss=0.01508, audio_tagging_loss=0.01508, over 25000.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4947267.32 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:24:31,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=950160.0, ans=0.025 2023-12-23 04:24:33,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=950160.0, ans=0.0 2023-12-23 04:24:33,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=950160.0, ans=0.0 2023-12-23 04:24:36,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.96 vs. limit=10.0 2023-12-23 04:25:08,203 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.009e+01 3.316e+01 3.415e+01 3.581e+01 5.246e+01, threshold=6.831e+01, percent-clipped=0.0 2023-12-23 04:25:09,186 INFO [train.py:886] (3/4) Epoch 30, batch 4350, loss[loss=0.01288, audio_tagging_loss=0.01288, over 25000.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4944717.20 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:25:16,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=950426.6666666666, ans=0.0 2023-12-23 04:25:19,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=950493.3333333334, ans=0.125 2023-12-23 04:25:35,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=950560.0, ans=0.1 2023-12-23 04:25:56,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=950693.3333333334, ans=0.0 2023-12-23 04:25:59,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=950693.3333333334, ans=0.125 2023-12-23 04:26:02,656 INFO [train.py:886] (3/4) Epoch 30, batch 4400, loss[loss=0.01152, audio_tagging_loss=0.01152, over 24750.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4943957.89 frames. 
], batch size: 99, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:26:08,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=950760.0, ans=0.1 2023-12-23 04:26:19,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=950826.6666666666, ans=0.125 2023-12-23 04:26:51,211 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.941e+01 3.268e+01 3.436e+01 3.608e+01 4.178e+01, threshold=6.872e+01, percent-clipped=0.0 2023-12-23 04:26:52,193 INFO [train.py:886] (3/4) Epoch 30, batch 4450, loss[loss=0.01218, audio_tagging_loss=0.01218, over 24750.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4938057.72 frames. ], batch size: 99, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:27:02,604 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0 2023-12-23 04:27:12,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.34 vs. limit=22.5 2023-12-23 04:27:20,150 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.59 vs. limit=6.0 2023-12-23 04:27:23,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=951293.3333333334, ans=0.1 2023-12-23 04:27:41,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=951360.0, ans=0.0 2023-12-23 04:27:43,332 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:27:45,073 INFO [train.py:886] (3/4) Epoch 30, batch 4500, loss[loss=0.01293, audio_tagging_loss=0.01293, over 25000.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4929662.60 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:27:53,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=951493.3333333334, ans=0.2 2023-12-23 04:27:59,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=951493.3333333334, ans=0.05 2023-12-23 04:28:10,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=951560.0, ans=0.2 2023-12-23 04:28:13,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=951560.0, ans=0.1 2023-12-23 04:28:23,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=951626.6666666666, ans=0.2 2023-12-23 04:28:34,497 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.987e+01 3.270e+01 3.347e+01 3.646e+01 4.061e+01, threshold=6.694e+01, percent-clipped=0.0 2023-12-23 04:28:35,443 INFO [train.py:886] (3/4) Epoch 30, batch 4550, loss[loss=0.01207, audio_tagging_loss=0.01207, over 25000.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4941467.17 frames. 
], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:28:49,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=951826.6666666666, ans=0.95 2023-12-23 04:28:52,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=951826.6666666666, ans=0.125 2023-12-23 04:28:53,269 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2023-12-23 04:29:01,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=951893.3333333334, ans=0.0 2023-12-23 04:29:14,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=951960.0, ans=0.125 2023-12-23 04:29:27,434 INFO [train.py:886] (3/4) Epoch 30, batch 4600, loss[loss=0.01309, audio_tagging_loss=0.01309, over 24750.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4945669.24 frames. ], batch size: 99, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:29:37,570 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.88 vs. limit=8.0 2023-12-23 04:29:39,993 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.87 vs. limit=22.5 2023-12-23 04:29:51,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=952226.6666666666, ans=0.1 2023-12-23 04:29:53,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=952226.6666666666, ans=0.125 2023-12-23 04:29:53,383 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.37 vs. limit=12.0 2023-12-23 04:30:09,990 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2023-12-23 04:30:19,065 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.884e+01 3.239e+01 3.388e+01 3.549e+01 4.736e+01, threshold=6.775e+01, percent-clipped=0.0 2023-12-23 04:30:20,032 INFO [train.py:886] (3/4) Epoch 30, batch 4650, loss[loss=0.01394, audio_tagging_loss=0.01394, over 24927.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4951911.28 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:30:28,684 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.15 vs. limit=15.0 2023-12-23 04:30:30,737 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.21 vs. 
limit=15.0 2023-12-23 04:30:40,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=952560.0, ans=0.125 2023-12-23 04:30:41,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=952560.0, ans=0.1 2023-12-23 04:31:07,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=952693.3333333334, ans=0.125 2023-12-23 04:31:10,386 INFO [train.py:886] (3/4) Epoch 30, batch 4700, loss[loss=0.0131, audio_tagging_loss=0.0131, over 24750.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4950327.72 frames. ], batch size: 99, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:31:15,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=952760.0, ans=0.125 2023-12-23 04:31:24,396 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.70 vs. limit=15.0 2023-12-23 04:31:25,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=952826.6666666666, ans=0.125 2023-12-23 04:31:26,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=952826.6666666666, ans=0.1 2023-12-23 04:31:29,937 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.62 vs. limit=10.0 2023-12-23 04:31:44,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=952960.0, ans=0.125 2023-12-23 04:31:56,730 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.041e+01 3.362e+01 3.489e+01 3.665e+01 4.317e+01, threshold=6.977e+01, percent-clipped=0.0 2023-12-23 04:31:56,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=953093.3333333334, ans=0.125 2023-12-23 04:31:57,661 INFO [train.py:886] (3/4) Epoch 30, batch 4750, loss[loss=0.01219, audio_tagging_loss=0.01219, over 24750.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4943585.82 frames. ], batch size: 99, lr: 3.57e-03, grad_scale: 64.0 2023-12-23 04:31:57,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=953093.3333333334, ans=0.2 2023-12-23 04:32:32,840 INFO [train.py:886] (3/4) Epoch 31, batch 0, loss[loss=0.03176, audio_tagging_loss=0.03176, over 21237.00 frames. ], tot_loss[loss=0.03176, audio_tagging_loss=0.03176, over 21237.00 frames. ], batch size: 107, lr: 3.51e-03, grad_scale: 32.0 2023-12-23 04:32:32,840 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 04:32:43,575 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.6900, 2.8309, 2.4947, 2.2597, 3.8315, 3.3959, 4.0483, 2.3664], device='cuda:3') 2023-12-23 04:32:44,201 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.5517, 3.4005, 3.9906, 4.1480], device='cuda:3') 2023-12-23 04:32:54,302 INFO [train.py:917] (3/4) Epoch 31, validation: loss=0.03297, audio_tagging_loss=0.03297, over 3737520.00 frames. 
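The learning rate printed at "Epoch 31, batch 0" above (3.51e-03) is a small step down from the 3.57e-03 seen through late epoch 30; both values are consistent with an Eden-style schedule in which the lr is a base rate times a batch-count factor and an epoch factor. A minimal sketch of that schedule, assuming base_lr=0.045, lr_batches=7500, lr_epochs=3.5 and ignoring Eden's warmup term; the ~138k total batch count below is back-solved from the logged lr, not a number printed in this log:

```python
def eden_factor(batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Each factor is ~1.0 early in training and decays roughly like
    # x ** -0.5 once batch >> lr_batches (resp. epoch >> lr_epochs).
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return batch_factor * epoch_factor

base_lr = 0.045
# ~138k total batches at the start of epoch 31 (estimated):
print(base_lr * eden_factor(batch=138_000, epoch=31))  # ~3.5e-03, cf. "lr: 3.51e-03"
```

On this sketch, the slow 3.58e-03 -> 3.57e-03 drift within epoch 30 comes from the batch factor alone, while the visible drop at the epoch boundary comes from the epoch factor ticking over.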
2023-12-23 04:32:54,302 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 04:32:56,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=953200.0, ans=0.0 2023-12-23 04:33:00,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=953200.0, ans=0.1 2023-12-23 04:33:24,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=953400.0, ans=0.125 2023-12-23 04:33:25,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=953400.0, ans=0.2 2023-12-23 04:33:28,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=953400.0, ans=0.2 2023-12-23 04:33:38,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=953466.6666666666, ans=0.125 2023-12-23 04:33:44,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=953533.3333333334, ans=0.2 2023-12-23 04:33:44,878 INFO [train.py:886] (3/4) Epoch 31, batch 50, loss[loss=0.01814, audio_tagging_loss=0.01814, over 25000.00 frames. ], tot_loss[loss=0.01991, audio_tagging_loss=0.01991, over 1120221.66 frames. ], batch size: 100, lr: 3.51e-03, grad_scale: 32.0 2023-12-23 04:33:59,753 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.33 vs. limit=15.0 2023-12-23 04:34:16,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=953733.3333333334, ans=0.07 2023-12-23 04:34:20,593 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.241e+01 3.845e+01 4.121e+01 4.670e+01 9.872e+01, threshold=8.242e+01, percent-clipped=8.0 2023-12-23 04:34:37,989 INFO [train.py:886] (3/4) Epoch 31, batch 100, loss[loss=0.01462, audio_tagging_loss=0.01462, over 24750.00 frames. ], tot_loss[loss=0.01719, audio_tagging_loss=0.01719, over 1976230.02 frames. ], batch size: 99, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:34:44,185 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.90 vs. limit=12.0 2023-12-23 04:34:46,073 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.05 vs. 
limit=15.0 2023-12-23 04:34:50,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=953933.3333333334, ans=0.125 2023-12-23 04:34:52,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=953933.3333333334, ans=0.125 2023-12-23 04:34:58,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=954000.0, ans=0.1 2023-12-23 04:35:17,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=954066.6666666666, ans=0.125 2023-12-23 04:35:21,366 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.54 vs. limit=10.0 2023-12-23 04:35:29,339 INFO [train.py:886] (3/4) Epoch 31, batch 150, loss[loss=0.01317, audio_tagging_loss=0.01317, over 25000.00 frames. ], tot_loss[loss=0.01564, audio_tagging_loss=0.01564, over 2638523.32 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:35:33,982 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:35:47,177 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.03 vs. limit=15.0 2023-12-23 04:36:00,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=954400.0, ans=0.125 2023-12-23 04:36:05,258 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.100e+01 3.397e+01 3.570e+01 3.705e+01 4.340e+01, threshold=7.141e+01, percent-clipped=0.0 2023-12-23 04:36:13,343 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=22.5 2023-12-23 04:36:14,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=954466.6666666666, ans=0.2 2023-12-23 04:36:22,015 INFO [train.py:886] (3/4) Epoch 31, batch 200, loss[loss=0.009507, audio_tagging_loss=0.009507, over 25000.00 frames. ], tot_loss[loss=0.01469, audio_tagging_loss=0.01469, over 3153737.12 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:36:33,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.25 vs. limit=15.0 2023-12-23 04:36:39,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=954600.0, ans=0.125 2023-12-23 04:36:39,934 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2023-12-23 04:36:48,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=954666.6666666666, ans=0.125 2023-12-23 04:37:14,759 INFO [train.py:886] (3/4) Epoch 31, batch 250, loss[loss=0.01086, audio_tagging_loss=0.01086, over 25000.00 frames. ], tot_loss[loss=0.01419, audio_tagging_loss=0.01419, over 3560482.13 frames. 
], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:37:23,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=954933.3333333334, ans=0.125 2023-12-23 04:37:50,812 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.801e+01 3.236e+01 3.404e+01 3.581e+01 4.137e+01, threshold=6.809e+01, percent-clipped=0.0 2023-12-23 04:38:03,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=955133.3333333334, ans=0.1 2023-12-23 04:38:06,961 INFO [train.py:886] (3/4) Epoch 31, batch 300, loss[loss=0.01304, audio_tagging_loss=0.01304, over 24750.00 frames. ], tot_loss[loss=0.01392, audio_tagging_loss=0.01392, over 3867088.83 frames. ], batch size: 99, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:38:14,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=955200.0, ans=0.125 2023-12-23 04:38:17,106 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:38:34,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=955333.3333333334, ans=0.125 2023-12-23 04:38:36,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0 2023-12-23 04:38:44,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=955400.0, ans=0.125 2023-12-23 04:38:56,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=955466.6666666666, ans=0.2 2023-12-23 04:38:59,465 INFO [train.py:886] (3/4) Epoch 31, batch 350, loss[loss=0.01143, audio_tagging_loss=0.01143, over 25000.00 frames. ], tot_loss[loss=0.01364, audio_tagging_loss=0.01364, over 4095706.45 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:39:28,908 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.56 vs. limit=6.0 2023-12-23 04:39:35,045 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.893e+01 3.321e+01 3.436e+01 3.601e+01 4.104e+01, threshold=6.873e+01, percent-clipped=0.0 2023-12-23 04:39:38,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=955733.3333333334, ans=0.0 2023-12-23 04:39:48,370 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2023-12-23 04:39:52,384 INFO [train.py:886] (3/4) Epoch 31, batch 400, loss[loss=0.01159, audio_tagging_loss=0.01159, over 25000.00 frames. ], tot_loss[loss=0.01334, audio_tagging_loss=0.01334, over 4282953.71 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:39:54,779 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.93 vs. limit=12.0 2023-12-23 04:39:56,340 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.25 vs. 
limit=15.0 2023-12-23 04:40:28,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=956066.6666666666, ans=0.2 2023-12-23 04:40:44,907 INFO [train.py:886] (3/4) Epoch 31, batch 450, loss[loss=0.01201, audio_tagging_loss=0.01201, over 25000.00 frames. ], tot_loss[loss=0.01303, audio_tagging_loss=0.01303, over 4433050.14 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:40:46,159 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.831e-02 2023-12-23 04:41:06,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=956333.3333333334, ans=0.125 2023-12-23 04:41:20,610 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.863e+01 3.195e+01 3.396e+01 3.625e+01 4.055e+01, threshold=6.792e+01, percent-clipped=0.0 2023-12-23 04:41:24,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=956400.0, ans=0.125 2023-12-23 04:41:24,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=956400.0, ans=0.125 2023-12-23 04:41:24,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=956400.0, ans=0.0 2023-12-23 04:41:25,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=956400.0, ans=0.5 2023-12-23 04:41:28,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=956466.6666666666, ans=0.125 2023-12-23 04:41:32,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=956466.6666666666, ans=0.0 2023-12-23 04:41:37,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=956533.3333333334, ans=0.125 2023-12-23 04:41:38,114 INFO [train.py:886] (3/4) Epoch 31, batch 500, loss[loss=0.01181, audio_tagging_loss=0.01181, over 25000.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4548817.25 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:41:42,120 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:41:48,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=956600.0, ans=0.0 2023-12-23 04:41:55,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=956600.0, ans=0.125 2023-12-23 04:42:15,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=956733.3333333334, ans=0.125 2023-12-23 04:42:24,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=956800.0, ans=0.125 2023-12-23 04:42:30,456 INFO [train.py:886] (3/4) Epoch 31, batch 550, loss[loss=0.01397, audio_tagging_loss=0.01397, over 25000.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4635946.56 frames. 
], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:42:36,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=956866.6666666666, ans=0.125 2023-12-23 04:42:54,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=957000.0, ans=0.1 2023-12-23 04:43:01,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=957066.6666666666, ans=0.125 2023-12-23 04:43:03,474 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.83 vs. limit=15.0 2023-12-23 04:43:05,866 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.926e+01 3.280e+01 3.449e+01 3.593e+01 5.125e+01, threshold=6.898e+01, percent-clipped=0.0 2023-12-23 04:43:07,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=957066.6666666666, ans=0.0 2023-12-23 04:43:17,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=957133.3333333334, ans=0.0 2023-12-23 04:43:20,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.18 vs. limit=15.0 2023-12-23 04:43:20,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=957133.3333333334, ans=0.07 2023-12-23 04:43:20,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=957133.3333333334, ans=0.2 2023-12-23 04:43:22,640 INFO [train.py:886] (3/4) Epoch 31, batch 600, loss[loss=0.01017, audio_tagging_loss=0.01017, over 21094.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4696659.27 frames. ], batch size: 107, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:43:43,383 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.15 vs. limit=15.0 2023-12-23 04:43:53,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=957400.0, ans=0.2 2023-12-23 04:44:00,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=957400.0, ans=0.125 2023-12-23 04:44:08,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=957466.6666666666, ans=0.07 2023-12-23 04:44:15,880 INFO [train.py:886] (3/4) Epoch 31, batch 650, loss[loss=0.01126, audio_tagging_loss=0.01126, over 24750.00 frames. ], tot_loss[loss=0.01295, audio_tagging_loss=0.01295, over 4749228.52 frames. ], batch size: 99, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:44:17,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=957533.3333333334, ans=0.2 2023-12-23 04:44:20,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.66 vs. 
limit=22.5 2023-12-23 04:44:37,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=957666.6666666666, ans=0.0 2023-12-23 04:44:45,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=957666.6666666666, ans=0.0 2023-12-23 04:44:51,803 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.011e+01 3.328e+01 3.481e+01 3.624e+01 4.466e+01, threshold=6.961e+01, percent-clipped=0.0 2023-12-23 04:45:02,793 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.19 vs. limit=22.5 2023-12-23 04:45:06,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=957866.6666666666, ans=0.0 2023-12-23 04:45:07,155 INFO [train.py:886] (3/4) Epoch 31, batch 700, loss[loss=0.01039, audio_tagging_loss=0.01039, over 24750.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4790183.30 frames. ], batch size: 99, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:45:25,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=957933.3333333334, ans=0.0 2023-12-23 04:45:26,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=957933.3333333334, ans=0.0 2023-12-23 04:45:28,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=958000.0, ans=0.125 2023-12-23 04:45:30,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=958000.0, ans=15.0 2023-12-23 04:45:43,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=958066.6666666666, ans=0.2 2023-12-23 04:45:45,000 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.71 vs. limit=22.5 2023-12-23 04:45:50,105 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.089e-03 2023-12-23 04:45:55,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=958133.3333333334, ans=0.09899494936611666 2023-12-23 04:45:55,598 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=15.0 2023-12-23 04:46:00,013 INFO [train.py:886] (3/4) Epoch 31, batch 750, loss[loss=0.01162, audio_tagging_loss=0.01162, over 24750.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 4816167.70 frames. 
], batch size: 99, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:46:04,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=958200.0, ans=0.125 2023-12-23 04:46:29,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=958333.3333333334, ans=0.125 2023-12-23 04:46:31,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=958400.0, ans=0.0 2023-12-23 04:46:33,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=958400.0, ans=0.125 2023-12-23 04:46:34,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=958400.0, ans=0.125 2023-12-23 04:46:35,652 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.016e+01 3.246e+01 3.400e+01 3.570e+01 4.142e+01, threshold=6.799e+01, percent-clipped=0.0 2023-12-23 04:46:35,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=958400.0, ans=0.125 2023-12-23 04:46:52,361 INFO [train.py:886] (3/4) Epoch 31, batch 800, loss[loss=0.01455, audio_tagging_loss=0.01455, over 25000.00 frames. ], tot_loss[loss=0.01266, audio_tagging_loss=0.01266, over 4842015.81 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:46:57,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=958533.3333333334, ans=0.1 2023-12-23 04:47:05,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=958600.0, ans=0.0 2023-12-23 04:47:07,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=958600.0, ans=0.125 2023-12-23 04:47:09,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=958600.0, ans=0.125 2023-12-23 04:47:11,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=958600.0, ans=0.0 2023-12-23 04:47:16,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=958666.6666666666, ans=0.0 2023-12-23 04:47:32,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=958733.3333333334, ans=0.1 2023-12-23 04:47:44,842 INFO [train.py:886] (3/4) Epoch 31, batch 850, loss[loss=0.01263, audio_tagging_loss=0.01263, over 25000.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4867999.41 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:47:54,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=958933.3333333334, ans=0.125 2023-12-23 04:47:54,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.70 vs. 
limit=15.0 2023-12-23 04:48:19,407 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.919e+01 3.282e+01 3.408e+01 3.536e+01 4.077e+01, threshold=6.816e+01, percent-clipped=0.0 2023-12-23 04:48:36,938 INFO [train.py:886] (3/4) Epoch 31, batch 900, loss[loss=0.01375, audio_tagging_loss=0.01375, over 25000.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4888116.89 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 32.0 2023-12-23 04:48:56,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=959333.3333333334, ans=0.1 2023-12-23 04:48:58,446 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=12.0 2023-12-23 04:49:09,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=959400.0, ans=0.07 2023-12-23 04:49:10,215 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.88 vs. limit=15.0 2023-12-23 04:49:26,584 INFO [train.py:886] (3/4) Epoch 31, batch 950, loss[loss=0.01317, audio_tagging_loss=0.01317, over 24750.00 frames. ], tot_loss[loss=0.01272, audio_tagging_loss=0.01272, over 4896329.26 frames. ], batch size: 99, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:49:28,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=959533.3333333334, ans=0.125 2023-12-23 04:49:43,402 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.18 vs. limit=22.5 2023-12-23 04:49:45,631 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.24 vs. limit=15.0 2023-12-23 04:49:50,114 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=15.0 2023-12-23 04:49:52,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=959666.6666666666, ans=0.125 2023-12-23 04:49:54,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=959666.6666666666, ans=0.1 2023-12-23 04:50:02,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=959733.3333333334, ans=0.1 2023-12-23 04:50:02,951 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.982e+01 3.282e+01 3.466e+01 3.627e+01 4.372e+01, threshold=6.931e+01, percent-clipped=0.0 2023-12-23 04:50:20,522 INFO [train.py:886] (3/4) Epoch 31, batch 1000, loss[loss=0.01458, audio_tagging_loss=0.01458, over 25000.00 frames. ], tot_loss[loss=0.01275, audio_tagging_loss=0.01275, over 4900998.65 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:50:37,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=959933.3333333334, ans=0.125 2023-12-23 04:50:45,031 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.02 vs. 
limit=15.0 2023-12-23 04:50:46,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=960000.0, ans=0.2 2023-12-23 04:51:11,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=12.0 2023-12-23 04:51:16,526 INFO [train.py:886] (3/4) Epoch 31, batch 1050, loss[loss=0.0116, audio_tagging_loss=0.0116, over 24750.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4909490.47 frames. ], batch size: 99, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:51:22,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.87 vs. limit=15.0 2023-12-23 04:51:23,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=960200.0, ans=0.125 2023-12-23 04:51:39,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=960333.3333333334, ans=0.2 2023-12-23 04:51:43,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=960333.3333333334, ans=0.125 2023-12-23 04:51:44,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=960333.3333333334, ans=0.125 2023-12-23 04:51:50,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=960400.0, ans=0.1 2023-12-23 04:51:52,736 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.955e+01 3.261e+01 3.393e+01 3.606e+01 4.316e+01, threshold=6.786e+01, percent-clipped=0.0 2023-12-23 04:51:54,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=960400.0, ans=0.125 2023-12-23 04:52:04,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=960466.6666666666, ans=0.125 2023-12-23 04:52:07,921 INFO [train.py:886] (3/4) Epoch 31, batch 1100, loss[loss=0.01226, audio_tagging_loss=0.01226, over 25000.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4917116.80 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:52:10,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=960533.3333333334, ans=0.125 2023-12-23 04:52:12,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.42 vs. 
2023-12-23 04:52:23,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=960600.0, ans=0.2 2023-12-23 04:52:25,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=960600.0, ans=0.0 2023-12-23 04:52:30,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=960666.6666666666, ans=0.09899494936611666 2023-12-23 04:52:42,759 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.080e-02 2023-12-23 04:52:58,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=960800.0, ans=0.125 2023-12-23 04:53:00,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=960866.6666666666, ans=0.125 2023-12-23 04:53:01,541 INFO [train.py:886] (3/4) Epoch 31, batch 1150, loss[loss=0.01248, audio_tagging_loss=0.01248, over 25000.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4926042.40 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:53:16,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=960933.3333333334, ans=0.125 2023-12-23 04:53:17,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=960933.3333333334, ans=0.125 2023-12-23 04:53:18,496 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.46 vs. limit=8.0 2023-12-23 04:53:36,113 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.954e+01 3.303e+01 3.399e+01 3.563e+01 3.907e+01, threshold=6.798e+01, percent-clipped=0.0 2023-12-23 04:53:38,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=961066.6666666666, ans=0.125 2023-12-23 04:53:51,370 INFO [train.py:886] (3/4) Epoch 31, batch 1200, loss[loss=0.0118, audio_tagging_loss=0.0118, over 25000.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4932586.07 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:53:56,840 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 04:54:12,588 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.66 vs. limit=10.0 2023-12-23 04:54:18,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=961333.3333333334, ans=0.2 2023-12-23 04:54:44,882 INFO [train.py:886] (3/4) Epoch 31, batch 1250, loss[loss=0.01257, audio_tagging_loss=0.01257, over 24750.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4929550.20 frames. ], batch size: 99, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:55:01,913 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.67 vs.
limit=22.5 2023-12-23 04:55:04,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=961666.6666666666, ans=15.0 2023-12-23 04:55:07,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=961666.6666666666, ans=0.1 2023-12-23 04:55:12,141 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.78 vs. limit=22.5 2023-12-23 04:55:16,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.18 vs. limit=15.0 2023-12-23 04:55:19,938 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.041e+01 3.396e+01 3.498e+01 3.625e+01 4.153e+01, threshold=6.995e+01, percent-clipped=0.0 2023-12-23 04:55:28,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=961800.0, ans=0.125 2023-12-23 04:55:36,755 INFO [train.py:886] (3/4) Epoch 31, batch 1300, loss[loss=0.01335, audio_tagging_loss=0.01335, over 24750.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4925927.93 frames. ], batch size: 99, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:55:41,708 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.78 vs. limit=6.0 2023-12-23 04:55:44,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=961866.6666666666, ans=0.125 2023-12-23 04:55:58,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.83 vs. limit=10.0 2023-12-23 04:56:00,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=962000.0, ans=0.125 2023-12-23 04:56:02,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=962000.0, ans=0.125 2023-12-23 04:56:06,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=962000.0, ans=0.0 2023-12-23 04:56:07,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=962066.6666666666, ans=0.125 2023-12-23 04:56:20,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=962133.3333333334, ans=0.125 2023-12-23 04:56:25,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=962133.3333333334, ans=0.07 2023-12-23 04:56:28,648 INFO [train.py:886] (3/4) Epoch 31, batch 1350, loss[loss=0.01351, audio_tagging_loss=0.01351, over 25000.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4931151.26 frames. 
], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:56:55,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=962333.3333333334, ans=0.125 2023-12-23 04:56:57,125 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.82 vs. limit=22.5 2023-12-23 04:56:59,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=962400.0, ans=0.0 2023-12-23 04:57:04,304 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.922e+01 3.247e+01 3.434e+01 3.583e+01 4.287e+01, threshold=6.867e+01, percent-clipped=0.0 2023-12-23 04:57:22,557 INFO [train.py:886] (3/4) Epoch 31, batch 1400, loss[loss=0.01352, audio_tagging_loss=0.01352, over 25000.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4936753.16 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:57:24,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=962533.3333333334, ans=0.0 2023-12-23 04:57:29,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=962533.3333333334, ans=0.0 2023-12-23 04:57:33,542 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2023-12-23 04:57:45,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=962666.6666666666, ans=0.0 2023-12-23 04:57:59,843 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.66 vs. limit=12.0 2023-12-23 04:58:03,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=962800.0, ans=0.95 2023-12-23 04:58:05,817 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.01 vs. limit=10.0 2023-12-23 04:58:09,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=962800.0, ans=0.125 2023-12-23 04:58:09,368 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.69 vs. limit=15.0 2023-12-23 04:58:14,710 INFO [train.py:886] (3/4) Epoch 31, batch 1450, loss[loss=0.01233, audio_tagging_loss=0.01233, over 25000.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4948304.52 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:58:15,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=962866.6666666666, ans=0.07 2023-12-23 04:58:19,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.54 vs. 
limit=15.0 2023-12-23 04:58:24,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=962933.3333333334, ans=0.2 2023-12-23 04:58:42,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=963000.0, ans=0.125 2023-12-23 04:58:43,235 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.43 vs. limit=6.0 2023-12-23 04:58:50,667 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.018e+01 3.252e+01 3.359e+01 3.465e+01 3.821e+01, threshold=6.718e+01, percent-clipped=0.0 2023-12-23 04:59:07,362 INFO [train.py:886] (3/4) Epoch 31, batch 1500, loss[loss=0.01254, audio_tagging_loss=0.01254, over 25000.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4953532.72 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 04:59:11,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=963200.0, ans=0.2 2023-12-23 04:59:25,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=963266.6666666666, ans=0.0 2023-12-23 04:59:46,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=963400.0, ans=10.0 2023-12-23 04:59:47,938 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.86 vs. limit=22.5 2023-12-23 04:59:59,932 INFO [train.py:886] (3/4) Epoch 31, batch 1550, loss[loss=0.01481, audio_tagging_loss=0.01481, over 24949.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4957244.75 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 05:00:11,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=963600.0, ans=0.0 2023-12-23 05:00:17,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=963600.0, ans=0.0 2023-12-23 05:00:18,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=963600.0, ans=0.125 2023-12-23 05:00:20,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=963666.6666666666, ans=0.0 2023-12-23 05:00:28,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=963666.6666666666, ans=0.0 2023-12-23 05:00:35,102 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.972e+01 3.345e+01 3.487e+01 3.654e+01 4.145e+01, threshold=6.974e+01, percent-clipped=0.0 2023-12-23 05:00:50,971 INFO [train.py:886] (3/4) Epoch 31, batch 1600, loss[loss=0.01324, audio_tagging_loss=0.01324, over 24071.00 frames. ], tot_loss[loss=0.01277, audio_tagging_loss=0.01277, over 4953226.67 frames. 
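], batch size: 100, lr: 3.49e-03, grad_scale: 32.0

In the `[train.py:886]` lines, `loss[...]` is the current batch while `tot_loss[...]` is a frame-weighted running average. The `over N frames` counter hovering near 5M while individual batches carry roughly 25k frames points to exponential smoothing with a time constant of about 200 batches (25,000 × 200 = 5,000,000); a sketch of that bookkeeping, with illustrative names:

```python
class SmoothedLoss:
    """Exponentially smoothed (loss, frames) pair, like the tot_loss[...] fields.

    decay = 1 - 1/200 gives a steady-state effective frame count of about
    200 * frames_per_batch, matching the ~4.9M frames printed above for
    ~25k-frame batches. The decay constant is inferred, not taken from source.
    """

    def __init__(self, decay: float = 1.0 - 1.0 / 200):
        self.decay = decay
        self.loss_sum = 0.0  # decayed sum of loss * frames
        self.frames = 0.0    # decayed effective frame count

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
        self.frames = self.frames * self.decay + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = SmoothedLoss()
for _ in range(2000):                 # long enough to reach steady state
    tracker.update(0.0127, 25000.0)
print(round(tracker.frames))          # ~5,000,000, as in the log lines
```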
2023-12-23 05:00:54,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=963866.6666666666, ans=0.2 2023-12-23 05:01:09,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=963933.3333333334, ans=0.125 2023-12-23 05:01:23,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=964066.6666666666, ans=0.0 2023-12-23 05:01:28,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=964066.6666666666, ans=0.125 2023-12-23 05:01:35,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=964133.3333333334, ans=0.125 2023-12-23 05:01:38,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=964133.3333333334, ans=0.125 2023-12-23 05:01:43,453 INFO [train.py:886] (3/4) Epoch 31, batch 1650, loss[loss=0.01297, audio_tagging_loss=0.01297, over 21687.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 4951096.60 frames. ], batch size: 107, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 05:02:18,488 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.893e+01 3.266e+01 3.423e+01 3.548e+01 4.336e+01, threshold=6.845e+01, percent-clipped=0.0 2023-12-23 05:02:35,989 INFO [train.py:886] (3/4) Epoch 31, batch 1700, loss[loss=0.01415, audio_tagging_loss=0.01415, over 25000.00 frames. ], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4951334.57 frames. ], batch size: 100, lr: 3.49e-03, grad_scale: 32.0 2023-12-23 05:02:43,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=964533.3333333334, ans=0.125 2023-12-23 05:02:51,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=964600.0, ans=0.0 2023-12-23 05:02:57,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=964666.6666666666, ans=0.125 2023-12-23 05:03:17,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=964800.0, ans=0.125 2023-12-23 05:03:18,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=964800.0, ans=0.0 2023-12-23 05:03:23,551 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.29 vs. limit=15.0 2023-12-23 05:03:27,698 INFO [train.py:886] (3/4) Epoch 31, batch 1750, loss[loss=0.01507, audio_tagging_loss=0.01507, over 25000.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4953754.36 frames.
], batch size: 100, lr: 3.48e-03, grad_scale: 32.0 2023-12-23 05:03:33,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=964866.6666666666, ans=0.0 2023-12-23 05:03:41,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=964933.3333333334, ans=0.125 2023-12-23 05:03:49,295 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.91 vs. limit=6.0 2023-12-23 05:03:50,364 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.76 vs. limit=6.0 2023-12-23 05:04:03,086 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.955e+01 3.266e+01 3.414e+01 3.599e+01 4.382e+01, threshold=6.827e+01, percent-clipped=0.0 2023-12-23 05:04:06,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=965066.6666666666, ans=0.1 2023-12-23 05:04:08,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=965133.3333333334, ans=0.0 2023-12-23 05:04:15,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=965133.3333333334, ans=0.2 2023-12-23 05:04:16,328 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.41 vs. limit=10.0 2023-12-23 05:04:17,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=965133.3333333334, ans=0.0 2023-12-23 05:04:20,474 INFO [train.py:886] (3/4) Epoch 31, batch 1800, loss[loss=0.01257, audio_tagging_loss=0.01257, over 25000.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4953601.48 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 32.0 2023-12-23 05:04:27,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=965200.0, ans=0.125 2023-12-23 05:04:28,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=965200.0, ans=0.125 2023-12-23 05:04:34,454 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.46 vs. limit=6.0 2023-12-23 05:04:44,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=965333.3333333334, ans=0.05 2023-12-23 05:04:52,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.06 vs. limit=15.0 2023-12-23 05:05:04,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=965466.6666666666, ans=0.2 2023-12-23 05:05:04,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=965466.6666666666, ans=0.0 2023-12-23 05:05:11,665 INFO [train.py:886] (3/4) Epoch 31, batch 1850, loss[loss=0.01234, audio_tagging_loss=0.01234, over 25000.00 frames. 
], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4953089.93 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 32.0 2023-12-23 05:05:19,270 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.88 vs. limit=22.5 2023-12-23 05:05:29,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=965600.0, ans=0.1 2023-12-23 05:05:35,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=965666.6666666666, ans=0.125 2023-12-23 05:05:38,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=965666.6666666666, ans=10.0 2023-12-23 05:05:47,722 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.028e+01 3.346e+01 3.530e+01 3.672e+01 4.173e+01, threshold=7.061e+01, percent-clipped=0.0 2023-12-23 05:06:04,202 INFO [train.py:886] (3/4) Epoch 31, batch 1900, loss[loss=0.01277, audio_tagging_loss=0.01277, over 24750.00 frames. ], tot_loss[loss=0.01262, audio_tagging_loss=0.01262, over 4949383.58 frames. ], batch size: 99, lr: 3.48e-03, grad_scale: 32.0 2023-12-23 05:06:05,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=965866.6666666666, ans=0.1 2023-12-23 05:06:16,969 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.70 vs. limit=15.0 2023-12-23 05:06:28,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=966000.0, ans=0.2 2023-12-23 05:06:54,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=966133.3333333334, ans=0.125 2023-12-23 05:06:57,414 INFO [train.py:886] (3/4) Epoch 31, batch 1950, loss[loss=0.01062, audio_tagging_loss=0.01062, over 24750.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4948304.56 frames. ], batch size: 99, lr: 3.48e-03, grad_scale: 32.0 2023-12-23 05:07:08,383 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2023-12-23 05:07:25,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=966333.3333333334, ans=0.05 2023-12-23 05:07:32,862 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.868e+01 3.281e+01 3.424e+01 3.602e+01 4.114e+01, threshold=6.849e+01, percent-clipped=0.0 2023-12-23 05:07:47,980 INFO [train.py:886] (3/4) Epoch 31, batch 2000, loss[loss=0.01615, audio_tagging_loss=0.01615, over 24750.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4943665.90 frames. 
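], batch size: 99, lr: 3.48e-03, grad_scale: 64.0

`grad_scale` doubles from 32 to 64 at this point, and falls back to 32 by batch 2550 below: with fp16 training the loss scale is grown after a stretch of overflow-free steps and halved whenever gradients overflow. A generic PyTorch AMP loop showing the mechanism (this is the standard `GradScaler`; the run's own scaler wrapper may differ in its exact cadence, and the model here is a stand-in):

```python
import torch

model = torch.nn.Linear(80, 527).cuda()  # toy stand-in, 527 = number of events
optimizer = torch.optim.AdamW(model.parameters(), lr=3.48e-3)
# growth_factor/backoff_factor govern the doubling and halving of the scale
# seen in the log's grad_scale field; growth_interval is how many clean steps
# are needed before the scale doubles.
scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000
)

for step in range(100):
    x = torch.randn(8, 80, device="cuda")
    y = torch.randint(0, 2, (8, 527), device="cuda").float()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = torch.nn.functional.binary_cross_entropy_with_logits(model(x), y)
    optimizer.zero_grad()
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales grads, skips the step on inf/nan
    scaler.update()                # grows or backs off the scale
```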
2023-12-23 05:08:08,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=966666.6666666666, ans=0.0 2023-12-23 05:08:09,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=966666.6666666666, ans=0.2 2023-12-23 05:08:11,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=966666.6666666666, ans=0.0 2023-12-23 05:08:13,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.84 vs. limit=15.0 2023-12-23 05:08:21,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=966733.3333333334, ans=0.125 2023-12-23 05:08:38,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=966800.0, ans=0.125 2023-12-23 05:08:41,028 INFO [train.py:886] (3/4) Epoch 31, batch 2050, loss[loss=0.01056, audio_tagging_loss=0.01056, over 22254.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4942195.51 frames. ], batch size: 107, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:08:48,211 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.74 vs. limit=22.5 2023-12-23 05:09:03,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=967000.0, ans=0.2 2023-12-23 05:09:06,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=967000.0, ans=0.2 2023-12-23 05:09:15,613 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.914e+01 3.257e+01 3.392e+01 3.574e+01 4.327e+01, threshold=6.783e+01, percent-clipped=0.0 2023-12-23 05:09:20,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=967133.3333333334, ans=0.125 2023-12-23 05:09:31,440 INFO [train.py:886] (3/4) Epoch 31, batch 2100, loss[loss=0.01187, audio_tagging_loss=0.01187, over 25000.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4947014.10 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:09:34,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=967200.0, ans=0.1 2023-12-23 05:09:38,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=967200.0, ans=0.2 2023-12-23 05:09:39,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=967200.0, ans=0.125 2023-12-23 05:09:46,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs.
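limit=15.0

The `Whitening:` lines compare a feature-anisotropy statistic against a limit. Up to implementation details, the metric behaves like E[λ²]/E[λ]² over the eigenvalues λ of the per-group channel covariance: 1.0 for perfectly white features, growing as variance concentrates in a few directions, with a gradient penalty applied once it exceeds `limit`. A sketch of the metric alone (the penalty side is omitted, and this is not the scaling.py source):

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (num_frames, num_channels). Per group, returns d * tr(C^2) / tr(C)^2,
    averaged over groups: 1.0 iff the covariance C is proportional to I."""
    num_frames, num_channels = x.shape
    cpg = num_channels // num_groups  # channels per group
    x = x - x.mean(dim=0)
    x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)  # (g, f, cpg)
    covar = torch.matmul(x.transpose(1, 2), x) / num_frames     # (g, cpg, cpg)
    tr = covar.diagonal(dim1=1, dim2=2).sum(-1)                 # tr(C)
    tr_sq = (covar * covar.transpose(1, 2)).sum(dim=(1, 2))     # tr(C^2)
    return (cpg * tr_sq / tr.clamp(min=1e-20) ** 2).mean().item()

white = torch.randn(10000, 384)
skewed = white * torch.linspace(0.1, 3.0, 384)  # uneven per-channel scales
print(whitening_metric(white))   # close to 1
print(whitening_metric(skewed))  # substantially larger
```

A value like the 14.42 vs. limit=15.0 just above is therefore an anisotropic but still-tolerated layer output; the line only becomes a corrective event once the metric crosses the limit.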
2023-12-23 05:10:16,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=967466.6666666666, ans=0.1 2023-12-23 05:10:24,041 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.47 vs. limit=15.0 2023-12-23 05:10:24,302 INFO [train.py:886] (3/4) Epoch 31, batch 2150, loss[loss=0.01405, audio_tagging_loss=0.01405, over 25000.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4948193.87 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:10:27,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=967533.3333333334, ans=0.125 2023-12-23 05:10:32,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.08 vs. limit=10.0 2023-12-23 05:10:34,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=967600.0, ans=0.125 2023-12-23 05:10:37,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=967600.0, ans=0.125 2023-12-23 05:10:43,053 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0 2023-12-23 05:10:59,512 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.971e+01 3.338e+01 3.483e+01 3.621e+01 4.579e+01, threshold=6.966e+01, percent-clipped=0.0 2023-12-23 05:11:02,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=967733.3333333334, ans=0.0 2023-12-23 05:11:03,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=967733.3333333334, ans=0.125 2023-12-23 05:11:13,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=967800.0, ans=0.2 2023-12-23 05:11:17,275 INFO [train.py:886] (3/4) Epoch 31, batch 2200, loss[loss=0.01153, audio_tagging_loss=0.01153, over 24750.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4943881.56 frames. ], batch size: 99, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:11:35,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=968000.0, ans=0.125 2023-12-23 05:11:43,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=968000.0, ans=0.0 2023-12-23 05:12:01,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=968133.3333333334, ans=0.2 2023-12-23 05:12:03,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=968133.3333333334, ans=0.5 2023-12-23 05:12:03,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=968133.3333333334, ans=0.125 2023-12-23 05:12:08,049 INFO [train.py:886] (3/4) Epoch 31, batch 2250, loss[loss=0.01124, audio_tagging_loss=0.01124, over 24750.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4940696.82 frames.
], batch size: 99, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:12:11,216 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.27 vs. limit=15.0 2023-12-23 05:12:24,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=968266.6666666666, ans=0.95 2023-12-23 05:12:35,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=968333.3333333334, ans=0.125 2023-12-23 05:12:37,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=968333.3333333334, ans=0.0 2023-12-23 05:12:44,289 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.002e+01 3.260e+01 3.452e+01 3.625e+01 3.919e+01, threshold=6.903e+01, percent-clipped=0.0 2023-12-23 05:12:56,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=968466.6666666666, ans=0.125 2023-12-23 05:13:01,709 INFO [train.py:886] (3/4) Epoch 31, batch 2300, loss[loss=0.01039, audio_tagging_loss=0.01039, over 24750.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4944215.68 frames. ], batch size: 99, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:13:03,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0 2023-12-23 05:13:14,655 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.15 vs. limit=15.0 2023-12-23 05:13:16,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=968600.0, ans=0.0 2023-12-23 05:13:16,249 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.523e-03 2023-12-23 05:13:18,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=968600.0, ans=0.125 2023-12-23 05:13:27,361 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.07 vs. limit=15.0 2023-12-23 05:13:36,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=968733.3333333334, ans=0.0 2023-12-23 05:13:38,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=968733.3333333334, ans=0.0 2023-12-23 05:13:47,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=968800.0, ans=0.125 2023-12-23 05:13:54,143 INFO [train.py:886] (3/4) Epoch 31, batch 2350, loss[loss=0.01248, audio_tagging_loss=0.01248, over 24750.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4944969.85 frames. 
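], batch size: 99, lr: 3.48e-03, grad_scale: 64.0

The recurring `balancer*.prob`, `min_positive` / `max_positive`, and `min_abs` / `max_abs` entries belong to balancer modules that keep activations healthy: they are the identity in the forward pass, but when a channel's fraction of positive outputs or its magnitude drifts outside the configured bounds, the backward pass adds a small corrective gradient. `prob` (0.125 in these lines) is read here as the probability that the check runs on a given batch. A heavily simplified autograd sketch of the idea; the actual correction is more elaborate than this:

```python
import random
import torch

class BalancerSketch(torch.autograd.Function):
    """Nudge gradients so each channel's positive fraction stays inside
    [min_positive, max_positive]. Identity in the forward pass."""

    @staticmethod
    def forward(ctx, x, min_positive=0.05, max_positive=0.95, strength=1e-4):
        ctx.save_for_backward(x)
        ctx.bounds = (min_positive, max_positive, strength)
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        lo, hi, strength = ctx.bounds
        pos_frac = (x > 0).float().mean(dim=0)  # per-channel, x: (batch, channels)
        # +1 pushes a too-negative channel up, -1 pushes a too-positive one down
        push = (pos_frac < lo).float() - (pos_frac > hi).float()
        return grad_out - strength * push, None, None, None

def balance(x, prob=0.125):
    # 'prob' mirrors the scheduled balancer .prob values in the log:
    # the correction is only applied on a random subset of batches.
    return BalancerSketch.apply(x) if random.random() < prob else x
```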
2023-12-23 05:14:11,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=968933.3333333334, ans=0.125 2023-12-23 05:14:23,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=969000.0, ans=0.0 2023-12-23 05:14:29,947 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.885e+01 3.284e+01 3.418e+01 3.549e+01 4.229e+01, threshold=6.835e+01, percent-clipped=0.0 2023-12-23 05:14:35,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=969133.3333333334, ans=0.125 2023-12-23 05:14:35,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=969133.3333333334, ans=0.1 2023-12-23 05:14:38,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=969133.3333333334, ans=0.125 2023-12-23 05:14:45,932 INFO [train.py:886] (3/4) Epoch 31, batch 2400, loss[loss=0.01141, audio_tagging_loss=0.01141, over 25000.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4947731.34 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:14:56,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=969266.6666666666, ans=0.035 2023-12-23 05:15:07,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.14 vs. limit=22.5 2023-12-23 05:15:16,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=969333.3333333334, ans=0.2 2023-12-23 05:15:17,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=969400.0, ans=0.125 2023-12-23 05:15:19,223 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0 2023-12-23 05:15:22,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.51 vs. limit=10.0 2023-12-23 05:15:27,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=969466.6666666666, ans=0.0 2023-12-23 05:15:31,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=969466.6666666666, ans=0.125 2023-12-23 05:15:39,261 INFO [train.py:886] (3/4) Epoch 31, batch 2450, loss[loss=0.01382, audio_tagging_loss=0.01382, over 24750.00 frames. ], tot_loss[loss=0.01247, audio_tagging_loss=0.01247, over 4952518.28 frames.
], batch size: 99, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:15:47,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=969533.3333333334, ans=0.0 2023-12-23 05:15:48,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=969600.0, ans=0.07 2023-12-23 05:15:49,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=969600.0, ans=0.0 2023-12-23 05:15:51,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=969600.0, ans=0.0 2023-12-23 05:15:55,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.70 vs. limit=6.0 2023-12-23 05:16:00,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=969666.6666666666, ans=0.1 2023-12-23 05:16:14,734 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.981e+01 3.300e+01 3.451e+01 3.618e+01 3.996e+01, threshold=6.901e+01, percent-clipped=0.0 2023-12-23 05:16:25,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=969800.0, ans=0.0 2023-12-23 05:16:29,109 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.94 vs. limit=15.0 2023-12-23 05:16:31,304 INFO [train.py:886] (3/4) Epoch 31, batch 2500, loss[loss=0.01151, audio_tagging_loss=0.01151, over 24750.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4951266.73 frames. ], batch size: 99, lr: 3.48e-03, grad_scale: 64.0 2023-12-23 05:16:32,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=969866.6666666666, ans=0.125 2023-12-23 05:16:46,478 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.23 vs. limit=10.0 2023-12-23 05:17:09,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=970066.6666666666, ans=0.0 2023-12-23 05:17:17,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=970133.3333333334, ans=0.0 2023-12-23 05:17:19,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=970133.3333333334, ans=0.5 2023-12-23 05:17:22,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=970200.0, ans=0.125 2023-12-23 05:17:23,729 INFO [train.py:886] (3/4) Epoch 31, batch 2550, loss[loss=0.01257, audio_tagging_loss=0.01257, over 25000.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4949175.48 frames. 
], batch size: 100, lr: 3.48e-03, grad_scale: 32.0 2023-12-23 05:17:31,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=970200.0, ans=0.2 2023-12-23 05:17:46,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=970333.3333333334, ans=0.125 2023-12-23 05:17:54,244 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.02 vs. limit=15.0 2023-12-23 05:17:59,854 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.010e+01 3.321e+01 3.428e+01 3.568e+01 4.183e+01, threshold=6.855e+01, percent-clipped=0.0 2023-12-23 05:18:07,504 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:18:15,542 INFO [train.py:886] (3/4) Epoch 31, batch 2600, loss[loss=0.01173, audio_tagging_loss=0.01173, over 25000.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4951998.45 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:18:15,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=970533.3333333334, ans=0.2 2023-12-23 05:18:34,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=970600.0, ans=0.0 2023-12-23 05:18:43,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=970666.6666666666, ans=0.125 2023-12-23 05:18:59,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=970800.0, ans=0.125 2023-12-23 05:18:59,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=970800.0, ans=0.0 2023-12-23 05:19:03,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=970800.0, ans=0.95 2023-12-23 05:19:07,515 INFO [train.py:886] (3/4) Epoch 31, batch 2650, loss[loss=0.01168, audio_tagging_loss=0.01168, over 25000.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4946830.33 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:19:16,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=970866.6666666666, ans=0.1 2023-12-23 05:19:44,614 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.992e+01 3.292e+01 3.427e+01 3.635e+01 4.044e+01, threshold=6.854e+01, percent-clipped=0.0 2023-12-23 05:19:55,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=971133.3333333334, ans=0.125 2023-12-23 05:19:55,776 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.71 vs. limit=15.0 2023-12-23 05:20:00,189 INFO [train.py:886] (3/4) Epoch 31, batch 2700, loss[loss=0.01195, audio_tagging_loss=0.01195, over 25000.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4951351.71 frames. 
], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:20:02,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=971200.0, ans=0.125 2023-12-23 05:20:35,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=971400.0, ans=0.1 2023-12-23 05:20:49,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=971466.6666666666, ans=0.125 2023-12-23 05:20:52,242 INFO [train.py:886] (3/4) Epoch 31, batch 2750, loss[loss=0.01286, audio_tagging_loss=0.01286, over 25000.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4952957.45 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:20:52,880 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.50 vs. limit=15.0 2023-12-23 05:21:15,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=971666.6666666666, ans=0.125 2023-12-23 05:21:18,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=971666.6666666666, ans=0.2 2023-12-23 05:21:28,307 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.862e+01 3.291e+01 3.432e+01 3.574e+01 3.962e+01, threshold=6.863e+01, percent-clipped=0.0 2023-12-23 05:21:37,766 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:21:44,119 INFO [train.py:886] (3/4) Epoch 31, batch 2800, loss[loss=0.01191, audio_tagging_loss=0.01191, over 24750.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4955741.64 frames. ], batch size: 99, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:21:49,081 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:21:49,236 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2023-12-23 05:22:04,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=971933.3333333334, ans=0.1 2023-12-23 05:22:06,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=972000.0, ans=0.125 2023-12-23 05:22:18,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=972066.6666666666, ans=0.125 2023-12-23 05:22:19,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=972066.6666666666, ans=0.09899494936611666 2023-12-23 05:22:20,707 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.54 vs. 
limit=15.0 2023-12-23 05:22:36,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=972200.0, ans=0.125 2023-12-23 05:22:36,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=972200.0, ans=0.125 2023-12-23 05:22:37,104 INFO [train.py:886] (3/4) Epoch 31, batch 2850, loss[loss=0.01177, audio_tagging_loss=0.01177, over 25000.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4951826.76 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:22:53,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=972266.6666666666, ans=0.125 2023-12-23 05:23:06,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=972333.3333333334, ans=0.125 2023-12-23 05:23:13,339 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.924e+01 3.280e+01 3.439e+01 3.597e+01 4.174e+01, threshold=6.878e+01, percent-clipped=0.0 2023-12-23 05:23:28,949 INFO [train.py:886] (3/4) Epoch 31, batch 2900, loss[loss=0.01082, audio_tagging_loss=0.01082, over 25000.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4945790.27 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:23:34,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=972533.3333333334, ans=10.0 2023-12-23 05:23:43,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=972600.0, ans=0.1 2023-12-23 05:23:44,028 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2023-12-23 05:24:02,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=972733.3333333334, ans=0.2 2023-12-23 05:24:05,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=972733.3333333334, ans=0.125 2023-12-23 05:24:15,275 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.88 vs. limit=22.5 2023-12-23 05:24:20,312 INFO [train.py:886] (3/4) Epoch 31, batch 2950, loss[loss=0.01227, audio_tagging_loss=0.01227, over 25000.00 frames. ], tot_loss[loss=0.01227, audio_tagging_loss=0.01227, over 4946484.33 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:24:28,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=972866.6666666666, ans=0.125 2023-12-23 05:24:53,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=973066.6666666666, ans=0.2 2023-12-23 05:24:54,206 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.28 vs. 
limit=15.0 2023-12-23 05:24:56,483 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.890e+01 3.283e+01 3.439e+01 3.603e+01 3.990e+01, threshold=6.877e+01, percent-clipped=0.0 2023-12-23 05:25:12,287 INFO [train.py:886] (3/4) Epoch 31, batch 3000, loss[loss=0.01242, audio_tagging_loss=0.01242, over 24750.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4945573.54 frames. ], batch size: 99, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:25:12,287 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 05:25:29,694 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.6210, 3.0593, 4.2379, 3.8873], device='cuda:3') 2023-12-23 05:25:33,496 INFO [train.py:917] (3/4) Epoch 31, validation: loss=0.03277, audio_tagging_loss=0.03277, over 3737520.00 frames. 2023-12-23 05:25:33,497 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 05:25:41,917 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.19 vs. limit=15.0 2023-12-23 05:25:54,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=973333.3333333334, ans=0.0 2023-12-23 05:25:55,941 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.38 vs. limit=22.5 2023-12-23 05:26:25,825 INFO [train.py:886] (3/4) Epoch 31, batch 3050, loss[loss=0.01379, audio_tagging_loss=0.01379, over 24750.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4942034.86 frames. ], batch size: 99, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:26:27,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=973533.3333333334, ans=0.2 2023-12-23 05:26:37,540 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.71 vs. limit=10.0 2023-12-23 05:26:47,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=973666.6666666666, ans=0.0 2023-12-23 05:26:47,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=973666.6666666666, ans=0.125 2023-12-23 05:27:01,281 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.55 vs. limit=15.0 2023-12-23 05:27:01,677 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.915e+01 3.290e+01 3.424e+01 3.581e+01 4.026e+01, threshold=6.848e+01, percent-clipped=0.0 2023-12-23 05:27:10,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=973800.0, ans=0.2 2023-12-23 05:27:18,186 INFO [train.py:886] (3/4) Epoch 31, batch 3100, loss[loss=0.01426, audio_tagging_loss=0.01426, over 25000.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4949256.34 frames. 
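], batch size: 100, lr: 3.47e-03, grad_scale: 32.0

Batch 3000 triggers the periodic validation pass seen above: the model is evaluated on the full dev set (the same 3,737,520 frames each time), and the `[zipformer.py:1858]` line dumps a per-head entropy of the attention weights as a cheap health check, with values near 0 indicating collapsed, one-hot heads and values near log(num_keys) near-uniform ones. A generic version of that diagnostic, not the project's exact code:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """attn: (num_heads, batch, query, key) attention probabilities.
    Returns the mean entropy per head in nats, like the tensor in the log."""
    entropy = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, batch, query)
    return entropy.mean(dim=(1, 2))                       # one value per head

heads, b, q, k = 4, 2, 50, 50
logits = torch.randn(heads, b, q, k)
print(attn_weights_entropy(logits.softmax(dim=-1)))
# a few nats for diffuse heads (log(50) ~ 3.9), near 0 for collapsed ones,
# the same order of magnitude as the [4.62, 3.06, 4.24, 3.89] tensor above
```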
2023-12-23 05:27:22,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=973866.6666666666, ans=0.125 2023-12-23 05:27:22,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=973866.6666666666, ans=0.125 2023-12-23 05:27:29,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0 2023-12-23 05:27:39,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=974000.0, ans=0.125 2023-12-23 05:27:59,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=974133.3333333334, ans=0.1 2023-12-23 05:28:09,274 INFO [train.py:886] (3/4) Epoch 31, batch 3150, loss[loss=0.0116, audio_tagging_loss=0.0116, over 24750.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4946204.23 frames. ], batch size: 99, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:28:44,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=974400.0, ans=0.05 2023-12-23 05:28:45,156 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.907e+01 3.356e+01 3.493e+01 3.608e+01 4.155e+01, threshold=6.985e+01, percent-clipped=0.0 2023-12-23 05:28:47,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=974400.0, ans=0.5 2023-12-23 05:28:52,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=974466.6666666666, ans=0.2 2023-12-23 05:28:52,905 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.54 vs. limit=15.0 2023-12-23 05:28:58,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=974466.6666666666, ans=0.125 2023-12-23 05:29:01,489 INFO [train.py:886] (3/4) Epoch 31, batch 3200, loss[loss=0.01283, audio_tagging_loss=0.01283, over 24750.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4945494.97 frames. ], batch size: 99, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:29:03,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=974533.3333333334, ans=0.125 2023-12-23 05:29:03,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.53 vs. limit=15.0 2023-12-23 05:29:18,610 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.73 vs.
limit=10.0 2023-12-23 05:29:19,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=974600.0, ans=0.2 2023-12-23 05:29:27,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=974666.6666666666, ans=0.07 2023-12-23 05:29:42,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=974800.0, ans=0.125 2023-12-23 05:29:48,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=974800.0, ans=0.125 2023-12-23 05:29:52,906 INFO [train.py:886] (3/4) Epoch 31, batch 3250, loss[loss=0.01316, audio_tagging_loss=0.01316, over 24750.00 frames. ], tot_loss[loss=0.01245, audio_tagging_loss=0.01245, over 4947830.17 frames. ], batch size: 99, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:30:02,076 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.54 vs. limit=10.0 2023-12-23 05:30:02,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=974866.6666666666, ans=0.125 2023-12-23 05:30:17,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=975000.0, ans=0.125 2023-12-23 05:30:19,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=975000.0, ans=0.0 2023-12-23 05:30:19,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=975000.0, ans=0.0 2023-12-23 05:30:30,252 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.928e+01 3.264e+01 3.408e+01 3.519e+01 4.216e+01, threshold=6.815e+01, percent-clipped=0.0 2023-12-23 05:30:37,297 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.19 vs. limit=15.0 2023-12-23 05:30:45,249 INFO [train.py:886] (3/4) Epoch 31, batch 3300, loss[loss=0.01126, audio_tagging_loss=0.01126, over 24750.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4952215.80 frames. ], batch size: 99, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:30:57,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=975266.6666666666, ans=0.2 2023-12-23 05:30:58,870 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.50 vs. limit=15.0 2023-12-23 05:31:02,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=975266.6666666666, ans=0.1 2023-12-23 05:31:25,414 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.80 vs. limit=12.0 2023-12-23 05:31:35,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=975466.6666666666, ans=0.1 2023-12-23 05:31:37,492 INFO [train.py:886] (3/4) Epoch 31, batch 3350, loss[loss=0.01057, audio_tagging_loss=0.01057, over 24750.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4951329.15 frames. 
], batch size: 99, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:31:41,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=975533.3333333334, ans=0.0 2023-12-23 05:31:43,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=975533.3333333334, ans=0.2 2023-12-23 05:31:55,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=975600.0, ans=0.2 2023-12-23 05:32:03,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=975666.6666666666, ans=0.0 2023-12-23 05:32:07,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=975733.3333333334, ans=0.0 2023-12-23 05:32:14,153 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.007e+01 3.277e+01 3.424e+01 3.609e+01 4.697e+01, threshold=6.848e+01, percent-clipped=0.0 2023-12-23 05:32:18,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=975800.0, ans=0.2 2023-12-23 05:32:24,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=975800.0, ans=0.1 2023-12-23 05:32:28,536 INFO [train.py:886] (3/4) Epoch 31, batch 3400, loss[loss=0.014, audio_tagging_loss=0.014, over 25000.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4950613.93 frames. ], batch size: 100, lr: 3.47e-03, grad_scale: 32.0 2023-12-23 05:32:49,805 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.33 vs. limit=22.5 2023-12-23 05:33:00,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=976066.6666666666, ans=0.125 2023-12-23 05:33:06,816 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0 2023-12-23 05:33:07,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=976066.6666666666, ans=10.0 2023-12-23 05:33:21,853 INFO [train.py:886] (3/4) Epoch 31, batch 3450, loss[loss=0.01029, audio_tagging_loss=0.01029, over 24750.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4948658.61 frames. ], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:33:24,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=976200.0, ans=0.125 2023-12-23 05:33:26,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=976200.0, ans=0.125 2023-12-23 05:33:42,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=976333.3333333334, ans=0.0 2023-12-23 05:33:45,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=976333.3333333334, ans=0.0 2023-12-23 05:33:54,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.40 vs. 
limit=15.0 2023-12-23 05:33:58,069 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.000e+01 3.343e+01 3.465e+01 3.665e+01 4.176e+01, threshold=6.930e+01, percent-clipped=0.0 2023-12-23 05:34:13,703 INFO [train.py:886] (3/4) Epoch 31, batch 3500, loss[loss=0.009054, audio_tagging_loss=0.009054, over 24750.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4947913.43 frames. ], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:34:24,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=976600.0, ans=0.125 2023-12-23 05:35:05,384 INFO [train.py:886] (3/4) Epoch 31, batch 3550, loss[loss=0.01414, audio_tagging_loss=0.01414, over 24938.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4947662.67 frames. ], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:35:06,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=976866.6666666666, ans=0.1 2023-12-23 05:35:41,267 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.791e+01 3.288e+01 3.473e+01 3.624e+01 4.174e+01, threshold=6.946e+01, percent-clipped=0.0 2023-12-23 05:35:53,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=977133.3333333334, ans=0.5 2023-12-23 05:35:57,794 INFO [train.py:886] (3/4) Epoch 31, batch 3600, loss[loss=0.01262, audio_tagging_loss=0.01262, over 25000.00 frames. ], tot_loss[loss=0.01235, audio_tagging_loss=0.01235, over 4953126.83 frames. ], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:36:08,719 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.93 vs. limit=12.0 2023-12-23 05:36:15,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=977266.6666666666, ans=0.125 2023-12-23 05:36:20,536 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.39 vs. limit=15.0 2023-12-23 05:36:30,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=977400.0, ans=0.0 2023-12-23 05:36:50,212 INFO [train.py:886] (3/4) Epoch 31, batch 3650, loss[loss=0.01126, audio_tagging_loss=0.01126, over 24750.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4949831.29 frames. ], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:37:04,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=977600.0, ans=0.0 2023-12-23 05:37:15,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=977666.6666666666, ans=0.125 2023-12-23 05:37:25,682 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.024e+01 3.220e+01 3.394e+01 3.585e+01 4.042e+01, threshold=6.789e+01, percent-clipped=0.0 2023-12-23 05:37:41,080 INFO [train.py:886] (3/4) Epoch 31, batch 3700, loss[loss=0.01512, audio_tagging_loss=0.01512, over 25000.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4949312.13 frames. 
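A pattern worth noting in the optim.py warnings above: the five "grad-norm quartiles" are apparently min/25%/median/75%/max of recently observed gradient norms, and in every warning in this excerpt the printed threshold equals Clipping_scale times the median, e.g. 6.930e+01 = 2.0 x 3.465e+01 in the nearest warning. A sketch of that bookkeeping; the window size of 100 and the function shape are assumptions, only the threshold arithmetic is taken from the log:

```python
import torch

# Median-based clipping threshold matching the arithmetic in the warnings
# above (threshold = clipping_scale * median grad-norm). The history window
# of 100 steps is an assumption for the sketch.
def clip_by_median(params, history, clipping_scale=2.0, window=100):
    params = [p for p in params if p.grad is not None]
    norm = torch.cat([p.grad.detach().flatten() for p in params]).norm().item()
    history.append(norm)
    del history[:-window]                     # keep a bounded history
    median = sorted(history)[len(history) // 2]
    threshold = clipping_scale * median
    if norm > threshold:                      # rescale oversized gradients
        for p in params:
            p.grad.mul_(threshold / norm)
    return norm, threshold

model = torch.nn.Linear(80, 527)
model(torch.randn(4, 80)).sum().backward()
history = []
print(clip_by_median(model.parameters(), history))
```

The percent-clipped field reports how often the threshold was actually exceeded recently; it stays at 0.0 for most of these warnings and ticks up to 7.0 right at the epoch-32 boundary further below.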
], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:37:42,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=977866.6666666666, ans=0.125 2023-12-23 05:38:08,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=978000.0, ans=0.2 2023-12-23 05:38:13,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=978066.6666666666, ans=0.1 2023-12-23 05:38:30,376 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=15.0 2023-12-23 05:38:33,225 INFO [train.py:886] (3/4) Epoch 31, batch 3750, loss[loss=0.01399, audio_tagging_loss=0.01399, over 25000.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4951568.54 frames. ], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:38:33,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=978200.0, ans=0.0 2023-12-23 05:38:34,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=978200.0, ans=0.1 2023-12-23 05:38:35,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=978200.0, ans=0.0 2023-12-23 05:38:37,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=978200.0, ans=0.0 2023-12-23 05:38:38,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=978200.0, ans=0.125 2023-12-23 05:38:40,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=978200.0, ans=0.0 2023-12-23 05:38:44,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=978266.6666666666, ans=0.125 2023-12-23 05:39:06,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=978400.0, ans=0.125 2023-12-23 05:39:09,092 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.069e+01 3.376e+01 3.545e+01 3.717e+01 4.224e+01, threshold=7.090e+01, percent-clipped=0.0 2023-12-23 05:39:10,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=978400.0, ans=0.1 2023-12-23 05:39:17,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=978466.6666666666, ans=0.0 2023-12-23 05:39:24,134 INFO [train.py:886] (3/4) Epoch 31, batch 3800, loss[loss=0.01307, audio_tagging_loss=0.01307, over 23997.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4947951.17 frames. 
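The bulk of this log is scaling.py tracing ScheduledFloat values: module hyper-parameters (balancer probabilities, skip rates, dropout_p) that are functions of the global batch_count rather than constants, which is why the same name keeps reappearing with its current `ans`. A piecewise-linear schedule is one simple form with that behavior; the sketch below assumes that form, and the breakpoints in the example are invented rather than read from this run:

```python
# Minimal piecewise-linear schedule keyed on batch_count, in the spirit of
# the ScheduledFloat values logged above. Breakpoints here are invented.
def scheduled_float(batch_count: float, points: list) -> float:
    """points: [(batch_count, value), ...] sorted by batch_count."""
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:  # linear interpolation inside the segment
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return points[-1][1]

# e.g. a dropout that anneals from 0.3 to 0.1 over the first 20k batches:
schedule = [(0.0, 0.3), (20000.0, 0.1)]
print(scheduled_float(974000.0, schedule))  # -> 0.1, past the last breakpoint
```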
], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:39:24,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=978533.3333333334, ans=0.0 2023-12-23 05:39:30,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=978533.3333333334, ans=0.0 2023-12-23 05:39:45,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=978666.6666666666, ans=0.125 2023-12-23 05:39:49,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=978666.6666666666, ans=0.07 2023-12-23 05:39:59,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=978733.3333333334, ans=0.125 2023-12-23 05:40:01,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=978733.3333333334, ans=0.1 2023-12-23 05:40:04,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=978800.0, ans=0.0 2023-12-23 05:40:04,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=978800.0, ans=0.1 2023-12-23 05:40:05,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=978800.0, ans=0.025 2023-12-23 05:40:15,971 INFO [train.py:886] (3/4) Epoch 31, batch 3850, loss[loss=0.01017, audio_tagging_loss=0.01017, over 25000.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4943751.44 frames. ], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:40:21,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=978866.6666666666, ans=0.0 2023-12-23 05:40:24,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=978933.3333333334, ans=0.07 2023-12-23 05:40:28,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=978933.3333333334, ans=0.2 2023-12-23 05:40:32,002 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.43 vs. limit=15.0 2023-12-23 05:40:42,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=979000.0, ans=0.125 2023-12-23 05:40:51,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=979066.6666666666, ans=0.1 2023-12-23 05:40:51,986 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.000e+01 3.321e+01 3.429e+01 3.579e+01 4.062e+01, threshold=6.857e+01, percent-clipped=0.0 2023-12-23 05:40:59,479 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:41:01,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=979133.3333333334, ans=0.0 2023-12-23 05:41:07,584 INFO [train.py:886] (3/4) Epoch 31, batch 3900, loss[loss=0.01214, audio_tagging_loss=0.01214, over 24750.00 frames. 
], tot_loss[loss=0.0124, audio_tagging_loss=0.0124, over 4949302.96 frames. ], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:41:17,801 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.69 vs. limit=15.0 2023-12-23 05:41:21,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=979266.6666666666, ans=0.0 2023-12-23 05:41:24,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=979266.6666666666, ans=0.125 2023-12-23 05:41:29,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=979333.3333333334, ans=0.1 2023-12-23 05:41:49,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.01 vs. limit=10.0 2023-12-23 05:41:54,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=979466.6666666666, ans=0.95 2023-12-23 05:41:54,890 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.94 vs. limit=22.5 2023-12-23 05:41:56,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=979466.6666666666, ans=0.125 2023-12-23 05:41:58,203 INFO [train.py:886] (3/4) Epoch 31, batch 3950, loss[loss=0.01213, audio_tagging_loss=0.01213, over 24750.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4951680.87 frames. ], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:42:01,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=979533.3333333334, ans=0.0 2023-12-23 05:42:06,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=979533.3333333334, ans=0.125 2023-12-23 05:42:33,389 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.895e+01 3.318e+01 3.420e+01 3.631e+01 6.052e+01, threshold=6.840e+01, percent-clipped=0.0 2023-12-23 05:42:36,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=979733.3333333334, ans=0.1 2023-12-23 05:42:38,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=979800.0, ans=0.1 2023-12-23 05:42:50,390 INFO [train.py:886] (3/4) Epoch 31, batch 4000, loss[loss=0.01227, audio_tagging_loss=0.01227, over 25000.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4955253.67 frames. 
], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:43:08,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=980000.0, ans=0.0 2023-12-23 05:43:14,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=980000.0, ans=0.125 2023-12-23 05:43:15,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=980000.0, ans=0.0 2023-12-23 05:43:29,285 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.66 vs. limit=10.0 2023-12-23 05:43:34,238 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.37 vs. limit=22.5 2023-12-23 05:43:38,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.06 vs. limit=15.0 2023-12-23 05:43:40,167 INFO [train.py:886] (3/4) Epoch 31, batch 4050, loss[loss=0.01208, audio_tagging_loss=0.01208, over 24750.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4953063.02 frames. ], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:43:44,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=980200.0, ans=0.125 2023-12-23 05:43:45,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=980200.0, ans=0.5 2023-12-23 05:44:00,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=980333.3333333334, ans=0.2 2023-12-23 05:44:06,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=980333.3333333334, ans=0.0 2023-12-23 05:44:15,585 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.086e+01 3.376e+01 3.511e+01 3.717e+01 4.360e+01, threshold=7.022e+01, percent-clipped=0.0 2023-12-23 05:44:31,104 INFO [train.py:886] (3/4) Epoch 31, batch 4100, loss[loss=0.01365, audio_tagging_loss=0.01365, over 24750.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4949681.35 frames. ], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:44:34,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=980533.3333333334, ans=0.125 2023-12-23 05:44:37,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=980533.3333333334, ans=0.125 2023-12-23 05:44:37,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=980533.3333333334, ans=0.2 2023-12-23 05:44:58,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=980666.6666666666, ans=0.1 2023-12-23 05:44:58,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=980666.6666666666, ans=0.125 2023-12-23 05:45:23,695 INFO [train.py:886] (3/4) Epoch 31, batch 4150, loss[loss=0.01323, audio_tagging_loss=0.01323, over 24750.00 frames. 
], tot_loss[loss=0.01265, audio_tagging_loss=0.01265, over 4951429.68 frames. ], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:45:39,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=980933.3333333334, ans=0.125 2023-12-23 05:45:39,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=980933.3333333334, ans=0.125 2023-12-23 05:45:43,821 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.91 vs. limit=10.0 2023-12-23 05:45:44,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=981000.0, ans=0.07 2023-12-23 05:45:59,586 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.785e+01 3.259e+01 3.391e+01 3.549e+01 4.171e+01, threshold=6.781e+01, percent-clipped=0.0 2023-12-23 05:46:06,664 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.84 vs. limit=22.5 2023-12-23 05:46:13,796 INFO [train.py:886] (3/4) Epoch 31, batch 4200, loss[loss=0.01376, audio_tagging_loss=0.01376, over 25000.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4956256.67 frames. ], batch size: 100, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:46:17,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=981200.0, ans=0.125 2023-12-23 05:46:18,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=981200.0, ans=0.025 2023-12-23 05:46:27,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=981266.6666666666, ans=0.09899494936611666 2023-12-23 05:46:29,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=981266.6666666666, ans=0.125 2023-12-23 05:46:30,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=981266.6666666666, ans=0.0 2023-12-23 05:46:33,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=981266.6666666666, ans=0.0 2023-12-23 05:46:41,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=981333.3333333334, ans=0.125 2023-12-23 05:46:43,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=981333.3333333334, ans=0.1 2023-12-23 05:46:52,924 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.94 vs. limit=6.0 2023-12-23 05:47:01,971 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.74 vs. limit=6.0 2023-12-23 05:47:06,145 INFO [train.py:886] (3/4) Epoch 31, batch 4250, loss[loss=0.01391, audio_tagging_loss=0.01391, over 24750.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4958445.92 frames. 
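The `Whitening: name=... metric=M vs. limit=L` lines are diagnostics from a regularizer that discourages a layer's channel covariance from collapsing onto a few directions; a penalty engages only when the metric exceeds the limit (most lines above stay under it, e.g. 20.84 vs. 22.5). One plausible statistic with the right range is the eigenvalue-concentration ratio sketched below: it is 1.0 for perfectly white features and approaches num_channels in the degenerate case. Whether scaling.py computes exactly this statistic is an assumption:

```python
import torch

# One plausible "whitening metric" (an assumption, not necessarily the exact
# statistic in scaling.py): the eigenvalue-concentration ratio of the channel
# covariance, 1.0 for white features, up to num_channels when collapsed.
def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)          # real eigenvalues, ascending
    return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()

x = torch.randn(1000, 384)      # near-white random features
print(whitening_metric(x))      # ~1.4, well below the limits logged above
# A penalty would only be applied when the metric exceeds the logged limit.
```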
], batch size: 99, lr: 3.46e-03, grad_scale: 32.0 2023-12-23 05:47:10,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=981533.3333333334, ans=0.125 2023-12-23 05:47:12,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=981533.3333333334, ans=0.0 2023-12-23 05:47:39,794 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:47:41,475 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.892e+01 3.291e+01 3.423e+01 3.545e+01 3.863e+01, threshold=6.845e+01, percent-clipped=0.0 2023-12-23 05:47:47,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=981800.0, ans=0.1 2023-12-23 05:47:55,767 INFO [train.py:886] (3/4) Epoch 31, batch 4300, loss[loss=0.01131, audio_tagging_loss=0.01131, over 24022.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4961730.99 frames. ], batch size: 100, lr: 3.45e-03, grad_scale: 32.0 2023-12-23 05:48:27,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=982066.6666666666, ans=0.125 2023-12-23 05:48:42,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=982133.3333333334, ans=0.5 2023-12-23 05:48:48,963 INFO [train.py:886] (3/4) Epoch 31, batch 4350, loss[loss=0.01148, audio_tagging_loss=0.01148, over 25000.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4963240.23 frames. ], batch size: 100, lr: 3.45e-03, grad_scale: 32.0 2023-12-23 05:48:51,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=982200.0, ans=0.125 2023-12-23 05:49:24,696 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.048e+01 3.354e+01 3.488e+01 3.602e+01 4.133e+01, threshold=6.977e+01, percent-clipped=0.0 2023-12-23 05:49:36,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=982466.6666666666, ans=0.0 2023-12-23 05:49:40,822 INFO [train.py:886] (3/4) Epoch 31, batch 4400, loss[loss=0.01322, audio_tagging_loss=0.01322, over 24750.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4955516.94 frames. ], batch size: 99, lr: 3.45e-03, grad_scale: 32.0 2023-12-23 05:49:45,472 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:49:47,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=982533.3333333334, ans=0.125 2023-12-23 05:50:06,603 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.57 vs. 
limit=15.0 2023-12-23 05:50:08,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=982666.6666666666, ans=0.125 2023-12-23 05:50:10,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=982733.3333333334, ans=0.125 2023-12-23 05:50:30,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=982800.0, ans=0.1 2023-12-23 05:50:31,894 INFO [train.py:886] (3/4) Epoch 31, batch 4450, loss[loss=0.01201, audio_tagging_loss=0.01201, over 24750.00 frames. ], tot_loss[loss=0.01259, audio_tagging_loss=0.01259, over 4953571.22 frames. ], batch size: 99, lr: 3.45e-03, grad_scale: 32.0 2023-12-23 05:50:34,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=982866.6666666666, ans=0.0 2023-12-23 05:50:42,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=982933.3333333334, ans=0.2 2023-12-23 05:50:45,449 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:51:07,905 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.813e+01 3.353e+01 3.451e+01 3.641e+01 4.281e+01, threshold=6.902e+01, percent-clipped=0.0 2023-12-23 05:51:08,524 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.17 vs. limit=15.0 2023-12-23 05:51:15,850 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.66 vs. limit=15.0 2023-12-23 05:51:25,191 INFO [train.py:886] (3/4) Epoch 31, batch 4500, loss[loss=0.01511, audio_tagging_loss=0.01511, over 25000.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4951586.44 frames. ], batch size: 100, lr: 3.45e-03, grad_scale: 32.0 2023-12-23 05:51:31,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=983200.0, ans=0.125 2023-12-23 05:51:43,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=983266.6666666666, ans=0.125 2023-12-23 05:51:43,249 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:51:47,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=983333.3333333334, ans=0.125 2023-12-23 05:52:16,181 INFO [train.py:886] (3/4) Epoch 31, batch 4550, loss[loss=0.0133, audio_tagging_loss=0.0133, over 25000.00 frames. ], tot_loss[loss=0.0124, audio_tagging_loss=0.0124, over 4955679.25 frames. 
], batch size: 100, lr: 3.45e-03, grad_scale: 64.0 2023-12-23 05:52:52,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=983733.3333333334, ans=0.125 2023-12-23 05:52:53,976 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.975e+01 3.298e+01 3.434e+01 3.578e+01 4.234e+01, threshold=6.868e+01, percent-clipped=0.0 2023-12-23 05:52:59,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=983800.0, ans=0.125 2023-12-23 05:53:03,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=983800.0, ans=15.0 2023-12-23 05:53:09,079 INFO [train.py:886] (3/4) Epoch 31, batch 4600, loss[loss=0.01119, audio_tagging_loss=0.01119, over 25000.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4962925.09 frames. ], batch size: 100, lr: 3.45e-03, grad_scale: 64.0 2023-12-23 05:53:12,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=983866.6666666666, ans=0.0 2023-12-23 05:53:15,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=983866.6666666666, ans=0.0 2023-12-23 05:53:25,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=983933.3333333334, ans=0.0 2023-12-23 05:53:31,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=984000.0, ans=0.1 2023-12-23 05:53:32,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=984000.0, ans=0.0 2023-12-23 05:53:34,502 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.75 vs. limit=22.5 2023-12-23 05:54:01,446 INFO [train.py:886] (3/4) Epoch 31, batch 4650, loss[loss=0.01115, audio_tagging_loss=0.01115, over 24750.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4960467.19 frames. ], batch size: 99, lr: 3.45e-03, grad_scale: 64.0 2023-12-23 05:54:02,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=984200.0, ans=0.125 2023-12-23 05:54:26,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=984333.3333333334, ans=0.125 2023-12-23 05:54:34,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=984400.0, ans=0.125 2023-12-23 05:54:36,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=984400.0, ans=0.125 2023-12-23 05:54:36,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=984400.0, ans=0.125 2023-12-23 05:54:37,329 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.989e+01 3.324e+01 3.438e+01 3.563e+01 4.326e+01, threshold=6.876e+01, percent-clipped=0.0 2023-12-23 05:54:52,090 INFO [train.py:886] (3/4) Epoch 31, batch 4700, loss[loss=0.01177, audio_tagging_loss=0.01177, over 24750.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4957231.49 frames. 
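Between batch 4500 and batch 4550 of epoch 31 the reported grad_scale doubles from 32.0 to 64.0. With fp16 training this is the signature of dynamic loss scaling: the scale grows after a long run of overflow-free steps and is cut back when an inf/nan gradient appears (it is back at 32.0 by the start of epoch 32 below, consistent with a backoff). The stock PyTorch scaler reproduces these mechanics; whether this run uses it or a custom equivalent is not visible in the log, and growth_interval below is an assumption:

```python
import torch

# Dynamic loss scaling with the same observable behavior as the grad_scale
# column above: doubles after `growth_interval` clean steps, halves on
# overflow. init_scale matches the log; growth_interval is an assumption.
scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,
    growth_factor=2.0,     # 32.0 -> 64.0 on growth
    backoff_factor=0.5,    # halved after an inf/nan step
    growth_interval=1000,  # clean steps required before growing
)

# Typical step (on a CUDA device, inside an fp16 autocast region):
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)   # unscales grads; skips the step on overflow
#   scaler.update()          # grows or backs off the scale
print(scaler.get_scale())  # 32.0 (1.0 if CUDA is unavailable and scaling is off)
```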
], batch size: 99, lr: 3.45e-03, grad_scale: 64.0 2023-12-23 05:55:10,073 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.78 vs. limit=15.0 2023-12-23 05:55:27,994 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.61 vs. limit=15.0 2023-12-23 05:55:32,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=984800.0, ans=0.125 2023-12-23 05:55:39,296 INFO [train.py:886] (3/4) Epoch 31, batch 4750, loss[loss=0.0147, audio_tagging_loss=0.0147, over 24750.00 frames. ], tot_loss[loss=0.01251, audio_tagging_loss=0.01251, over 4948936.28 frames. ], batch size: 99, lr: 3.45e-03, grad_scale: 64.0 2023-12-23 05:55:45,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.41 vs. limit=15.0 2023-12-23 05:55:48,672 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 05:56:15,678 INFO [train.py:886] (3/4) Epoch 32, batch 0, loss[loss=0.02608, audio_tagging_loss=0.02608, over 24045.00 frames. ], tot_loss[loss=0.02608, audio_tagging_loss=0.02608, over 24045.00 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 05:56:15,678 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 05:56:36,813 INFO [train.py:917] (3/4) Epoch 32, validation: loss=0.03288, audio_tagging_loss=0.03288, over 3737520.00 frames. 2023-12-23 05:56:36,814 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 05:56:42,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=984973.3333333334, ans=0.125 2023-12-23 05:56:46,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=985040.0, ans=0.0 2023-12-23 05:56:57,224 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.142e+01 3.370e+01 3.559e+01 3.900e+01 9.561e+01, threshold=7.118e+01, percent-clipped=7.0 2023-12-23 05:57:00,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=985106.6666666666, ans=0.0 2023-12-23 05:57:01,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=985106.6666666666, ans=0.1 2023-12-23 05:57:02,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=985106.6666666666, ans=0.125 2023-12-23 05:57:02,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=985106.6666666666, ans=0.0 2023-12-23 05:57:03,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=985106.6666666666, ans=0.125 2023-12-23 05:57:06,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=985173.3333333334, ans=0.0 2023-12-23 05:57:12,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=985173.3333333334, ans=0.125 2023-12-23 05:57:22,422 INFO [scaling.py:213] (3/4) 
ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=985240.0, ans=0.2 2023-12-23 05:57:26,996 INFO [train.py:886] (3/4) Epoch 32, batch 50, loss[loss=0.01555, audio_tagging_loss=0.01555, over 25000.00 frames. ], tot_loss[loss=0.01966, audio_tagging_loss=0.01966, over 1122022.79 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 05:57:34,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=985306.6666666666, ans=0.5 2023-12-23 05:57:41,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=985373.3333333334, ans=0.1 2023-12-23 05:57:52,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0 2023-12-23 05:57:58,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=985506.6666666666, ans=0.1 2023-12-23 05:58:18,049 INFO [train.py:886] (3/4) Epoch 32, batch 100, loss[loss=0.01693, audio_tagging_loss=0.01693, over 24750.00 frames. ], tot_loss[loss=0.01715, audio_tagging_loss=0.01715, over 1969134.40 frames. ], batch size: 99, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 05:58:20,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.23 vs. limit=15.0 2023-12-23 05:58:36,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=985706.6666666666, ans=0.125 2023-12-23 05:58:38,570 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.265e+01 3.723e+01 3.980e+01 4.376e+01 5.362e+01, threshold=7.961e+01, percent-clipped=0.0 2023-12-23 05:58:39,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=985773.3333333334, ans=0.125 2023-12-23 05:58:50,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=985840.0, ans=0.125 2023-12-23 05:59:08,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=985906.6666666666, ans=0.125 2023-12-23 05:59:08,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=985906.6666666666, ans=0.1 2023-12-23 05:59:09,719 INFO [train.py:886] (3/4) Epoch 32, batch 150, loss[loss=0.0105, audio_tagging_loss=0.0105, over 25000.00 frames. ], tot_loss[loss=0.01553, audio_tagging_loss=0.01553, over 2636515.17 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 05:59:22,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.15 vs. limit=10.0 2023-12-23 05:59:24,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=986040.0, ans=0.0 2023-12-23 05:59:36,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=986106.6666666666, ans=0.0 2023-12-23 05:59:40,737 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. 
limit=6.0 2023-12-23 06:00:01,206 INFO [train.py:886] (3/4) Epoch 32, batch 200, loss[loss=0.01409, audio_tagging_loss=0.01409, over 25000.00 frames. ], tot_loss[loss=0.01456, audio_tagging_loss=0.01456, over 3151431.98 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:00:02,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=986306.6666666666, ans=0.0 2023-12-23 06:00:15,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=986373.3333333334, ans=0.2 2023-12-23 06:00:16,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=986373.3333333334, ans=0.125 2023-12-23 06:00:19,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=986373.3333333334, ans=0.125 2023-12-23 06:00:21,576 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.980e+01 3.390e+01 3.500e+01 3.693e+01 4.218e+01, threshold=7.000e+01, percent-clipped=0.0 2023-12-23 06:00:26,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0 2023-12-23 06:00:36,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=986506.6666666666, ans=0.125 2023-12-23 06:00:39,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=986506.6666666666, ans=0.2 2023-12-23 06:00:40,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=986573.3333333334, ans=0.125 2023-12-23 06:00:50,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=986640.0, ans=0.125 2023-12-23 06:00:51,464 INFO [train.py:886] (3/4) Epoch 32, batch 250, loss[loss=0.01378, audio_tagging_loss=0.01378, over 25000.00 frames. ], tot_loss[loss=0.01405, audio_tagging_loss=0.01405, over 3551334.60 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:01:19,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=986773.3333333334, ans=0.0 2023-12-23 06:01:28,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=986840.0, ans=0.125 2023-12-23 06:01:34,687 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.57 vs. limit=15.0 2023-12-23 06:01:39,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=986906.6666666666, ans=0.0 2023-12-23 06:01:40,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=986906.6666666666, ans=0.125 2023-12-23 06:01:45,004 INFO [train.py:886] (3/4) Epoch 32, batch 300, loss[loss=0.01196, audio_tagging_loss=0.01196, over 24750.00 frames. ], tot_loss[loss=0.01378, audio_tagging_loss=0.01378, over 3858478.66 frames. 
], batch size: 99, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:02:06,007 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.926e+01 3.374e+01 3.482e+01 3.656e+01 5.710e+01, threshold=6.964e+01, percent-clipped=0.0 2023-12-23 06:02:15,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=987173.3333333334, ans=0.1 2023-12-23 06:02:16,686 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0 2023-12-23 06:02:23,130 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.68 vs. limit=15.0 2023-12-23 06:02:37,253 INFO [train.py:886] (3/4) Epoch 32, batch 350, loss[loss=0.01277, audio_tagging_loss=0.01277, over 24750.00 frames. ], tot_loss[loss=0.01354, audio_tagging_loss=0.01354, over 4100263.53 frames. ], batch size: 99, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:03:12,434 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:03:20,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=987573.3333333334, ans=0.0 2023-12-23 06:03:23,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=987573.3333333334, ans=0.0 2023-12-23 06:03:25,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=987573.3333333334, ans=0.1 2023-12-23 06:03:28,781 INFO [train.py:886] (3/4) Epoch 32, batch 400, loss[loss=0.01175, audio_tagging_loss=0.01175, over 25000.00 frames. ], tot_loss[loss=0.01316, audio_tagging_loss=0.01316, over 4288642.66 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:03:30,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=987640.0, ans=0.0 2023-12-23 06:03:47,737 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.84 vs. limit=22.5 2023-12-23 06:03:49,899 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.966e+01 3.292e+01 3.425e+01 3.603e+01 4.421e+01, threshold=6.851e+01, percent-clipped=0.0 2023-12-23 06:04:08,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=987840.0, ans=0.07 2023-12-23 06:04:18,068 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.22 vs. limit=15.0 2023-12-23 06:04:20,506 INFO [train.py:886] (3/4) Epoch 32, batch 450, loss[loss=0.01067, audio_tagging_loss=0.01067, over 25000.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4437747.19 frames. 
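The learning rate in these records decays on two clocks at once: it slides from 3.47e-03 to 3.45e-03 over a few thousand batches within epoch 31, then drops to 3.39e-03 as soon as epoch 32 begins. A scheduler of the inverse-power form below has that qualitative shape; the constants and exponents are illustrative assumptions and are not tuned to reproduce the exact values in this log:

```python
# Illustrative joint batch/epoch learning-rate decay; constants and
# exponents are assumptions, not the recipe's actual scheduler.
def lr_at(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Decays smoothly along the batch axis and steps down again at epoch turnover:
print(lr_at(0.045, 10_000, 31)
      > lr_at(0.045, 16_000, 31)
      > lr_at(0.045, 16_000, 32))  # True
```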
], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:04:25,463 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:04:26,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=987973.3333333334, ans=15.0 2023-12-23 06:04:35,912 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=22.5 2023-12-23 06:04:52,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=988173.3333333334, ans=0.1 2023-12-23 06:05:13,749 INFO [train.py:886] (3/4) Epoch 32, batch 500, loss[loss=0.01331, audio_tagging_loss=0.01331, over 25000.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4556027.94 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:05:27,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=988373.3333333334, ans=0.125 2023-12-23 06:05:34,071 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.920e+01 3.279e+01 3.431e+01 3.556e+01 4.457e+01, threshold=6.862e+01, percent-clipped=0.0 2023-12-23 06:05:46,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=988506.6666666666, ans=0.1 2023-12-23 06:05:49,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=988506.6666666666, ans=0.125 2023-12-23 06:05:51,948 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.28 vs. limit=15.0 2023-12-23 06:06:04,438 INFO [train.py:886] (3/4) Epoch 32, batch 550, loss[loss=0.01322, audio_tagging_loss=0.01322, over 25000.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4645048.60 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:06:09,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=988640.0, ans=0.125 2023-12-23 06:06:20,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=988706.6666666666, ans=0.125 2023-12-23 06:06:26,035 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.89 vs. limit=15.0 2023-12-23 06:06:51,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=988906.6666666666, ans=0.125 2023-12-23 06:06:53,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=988906.6666666666, ans=0.0 2023-12-23 06:06:56,705 INFO [train.py:886] (3/4) Epoch 32, batch 600, loss[loss=0.01383, audio_tagging_loss=0.01383, over 24750.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4713777.08 frames. 
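The scaling.py:1118 `WithLoss: name=..., loss-sum=...` lines report auxiliary penalties attached to intermediate tensors (here, attention weights): loss-sum=0.000e+00 means the penalty is currently inactive, while the occasional non-zero value (e.g. 2.785e-03 later in this excerpt) shows it firing. A hedged sketch of that pattern, where a wrapper records a penalty for a named tensor and passes the tensor through unchanged; the names and the hinge-style penalty form are assumptions:

```python
import torch

# Hedged sketch of auxiliary-loss bookkeeping like the WithLoss lines above:
# record a penalty for a named intermediate tensor and let the tensor flow on
# unchanged. The hinge penalty and the limit value are assumptions.
aux_losses = {}

def with_loss(name: str, x: torch.Tensor, limit: float = 40.0) -> torch.Tensor:
    excess = (x.pow(2).sum() - limit).clamp(min=0.0)  # hinge on magnitude
    aux_losses[name] = excess    # would be added to the training loss
    return x

attn = torch.softmax(torch.randn(4, 8, 8), dim=-1)
attn = with_loss("layers.0.self_attn_weights", attn)
print({k: round(float(v), 4) for k, v in aux_losses.items()})
# -> 0.0 here: under the limit, like the loss-sum=0.000e+00 records above.
```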
], batch size: 99, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:07:01,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=988973.3333333334, ans=0.1 2023-12-23 06:07:13,341 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.68 vs. limit=15.0 2023-12-23 06:07:16,465 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.995e+01 3.360e+01 3.500e+01 3.662e+01 4.640e+01, threshold=6.999e+01, percent-clipped=0.0 2023-12-23 06:07:31,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=989173.3333333334, ans=0.125 2023-12-23 06:07:32,251 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.58 vs. limit=22.5 2023-12-23 06:07:36,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=989240.0, ans=0.125 2023-12-23 06:07:42,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=989240.0, ans=0.0 2023-12-23 06:07:47,457 INFO [train.py:886] (3/4) Epoch 32, batch 650, loss[loss=0.01609, audio_tagging_loss=0.01609, over 24750.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4763772.92 frames. ], batch size: 99, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:07:47,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=989306.6666666666, ans=0.125 2023-12-23 06:07:50,388 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.48 vs. limit=10.0 2023-12-23 06:08:22,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=989506.6666666666, ans=0.125 2023-12-23 06:08:25,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=989506.6666666666, ans=0.0 2023-12-23 06:08:30,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=989573.3333333334, ans=0.0 2023-12-23 06:08:32,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=989573.3333333334, ans=10.0 2023-12-23 06:08:38,946 INFO [train.py:886] (3/4) Epoch 32, batch 700, loss[loss=0.0131, audio_tagging_loss=0.0131, over 25000.00 frames. ], tot_loss[loss=0.0127, audio_tagging_loss=0.0127, over 4808628.15 frames. ], batch size: 100, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:08:50,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.42 vs. 
limit=15.0 2023-12-23 06:08:53,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=989706.6666666666, ans=0.125 2023-12-23 06:08:56,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=989706.6666666666, ans=22.5 2023-12-23 06:09:00,791 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.021e+01 3.361e+01 3.483e+01 3.638e+01 4.133e+01, threshold=6.965e+01, percent-clipped=0.0 2023-12-23 06:09:05,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=989773.3333333334, ans=0.09899494936611666 2023-12-23 06:09:10,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=989840.0, ans=0.1 2023-12-23 06:09:17,244 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.90 vs. limit=12.0 2023-12-23 06:09:28,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=989906.6666666666, ans=0.0 2023-12-23 06:09:29,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=989906.6666666666, ans=0.05 2023-12-23 06:09:32,134 INFO [train.py:886] (3/4) Epoch 32, batch 750, loss[loss=0.01276, audio_tagging_loss=0.01276, over 24750.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4837506.42 frames. ], batch size: 99, lr: 3.39e-03, grad_scale: 32.0 2023-12-23 06:10:06,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=990173.3333333334, ans=0.1 2023-12-23 06:10:14,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=990240.0, ans=0.1 2023-12-23 06:10:21,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=990240.0, ans=0.02 2023-12-23 06:10:23,087 INFO [train.py:886] (3/4) Epoch 32, batch 800, loss[loss=0.01421, audio_tagging_loss=0.01421, over 25000.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4863601.09 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:10:27,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=990306.6666666666, ans=0.1 2023-12-23 06:10:39,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=990373.3333333334, ans=0.0 2023-12-23 06:10:41,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.67 vs. limit=15.0 2023-12-23 06:10:44,493 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.016e+01 3.268e+01 3.423e+01 3.577e+01 3.951e+01, threshold=6.846e+01, percent-clipped=0.0 2023-12-23 06:10:46,814 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.53 vs. 
limit=6.0 2023-12-23 06:10:50,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.27 vs. limit=22.5 2023-12-23 06:11:00,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=990506.6666666666, ans=10.0 2023-12-23 06:11:01,923 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.30 vs. limit=15.0 2023-12-23 06:11:05,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=990573.3333333334, ans=0.125 2023-12-23 06:11:16,034 INFO [train.py:886] (3/4) Epoch 32, batch 850, loss[loss=0.009355, audio_tagging_loss=0.009355, over 24750.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4880462.77 frames. ], batch size: 99, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:11:27,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=990706.6666666666, ans=0.125 2023-12-23 06:11:38,824 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.66 vs. limit=15.0 2023-12-23 06:11:43,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=990773.3333333334, ans=0.0 2023-12-23 06:11:45,376 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.39 vs. limit=5.0 2023-12-23 06:11:46,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.64 vs. limit=15.0 2023-12-23 06:12:01,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=990906.6666666666, ans=0.0 2023-12-23 06:12:07,821 INFO [train.py:886] (3/4) Epoch 32, batch 900, loss[loss=0.01258, audio_tagging_loss=0.01258, over 24750.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4895853.25 frames. ], batch size: 99, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:12:10,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=990973.3333333334, ans=0.1 2023-12-23 06:12:28,209 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.920e+01 3.389e+01 3.537e+01 3.671e+01 4.365e+01, threshold=7.073e+01, percent-clipped=0.0 2023-12-23 06:12:29,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=991106.6666666666, ans=0.2 2023-12-23 06:12:37,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=991173.3333333334, ans=0.1 2023-12-23 06:12:42,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=991173.3333333334, ans=0.125 2023-12-23 06:12:58,814 INFO [train.py:886] (3/4) Epoch 32, batch 950, loss[loss=0.01102, audio_tagging_loss=0.01102, over 24750.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4901736.16 frames. 
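The tot_loss trajectory at the top of epoch 32 (0.02608 at batch 0, 0.01966 at batch 50, 0.01715 at batch 100, back to ~0.0125 by batch 950) is mostly the running average refilling rather than a real regression: the frame counters show the aggregate restarts at the epoch boundary, covering about 1.12e6 frames at batch 50 and 1.97e6 at batch 100, versus ~4.9e6 once warmed up. Under the decay-0.995 reading sketched earlier those counters come out almost exactly; a quick check:

```python
# Reproduce the "over N frames" counters printed early in epoch 32, assuming
# the aggregate resets at the epoch boundary and decays by 0.995 per batch
# (both inferred from the log itself, not read from train.py).
decay, frames_per_batch = 0.995, 24750.0
frame_sum = 0.0
for step in range(951):
    frame_sum = decay * frame_sum + frames_per_batch
    if step in (50, 100, 950):
        print(step, round(frame_sum))
# -> roughly 1.12e6, 1.97e6 and 4.91e6, matching the logged counters
#    1122022.79, 1969134.40 and 4901736.16.
```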
], batch size: 99, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:13:10,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=991373.3333333334, ans=0.025 2023-12-23 06:13:19,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=991373.3333333334, ans=0.125 2023-12-23 06:13:22,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=991440.0, ans=0.125 2023-12-23 06:13:23,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=991440.0, ans=0.1 2023-12-23 06:13:34,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=991506.6666666666, ans=0.2 2023-12-23 06:13:38,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=991506.6666666666, ans=0.1 2023-12-23 06:13:51,868 INFO [train.py:886] (3/4) Epoch 32, batch 1000, loss[loss=0.01126, audio_tagging_loss=0.01126, over 24750.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4908288.00 frames. ], batch size: 99, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:13:54,139 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.46 vs. limit=6.0 2023-12-23 06:13:56,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=991640.0, ans=0.1 2023-12-23 06:13:58,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=991640.0, ans=0.125 2023-12-23 06:13:59,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=991640.0, ans=0.125 2023-12-23 06:14:11,542 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.978e+01 3.269e+01 3.394e+01 3.560e+01 3.959e+01, threshold=6.787e+01, percent-clipped=0.0 2023-12-23 06:14:14,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=991773.3333333334, ans=0.0 2023-12-23 06:14:16,382 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:14:34,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=991906.6666666666, ans=0.125 2023-12-23 06:14:36,151 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.01 vs. limit=22.5 2023-12-23 06:14:42,956 INFO [train.py:886] (3/4) Epoch 32, batch 1050, loss[loss=0.01137, audio_tagging_loss=0.01137, over 25000.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4913372.47 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:14:43,505 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. 
limit=15.0 2023-12-23 06:14:45,353 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0 2023-12-23 06:14:49,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=991973.3333333334, ans=0.0 2023-12-23 06:14:59,290 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.27 vs. limit=10.0 2023-12-23 06:15:01,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=992040.0, ans=0.1 2023-12-23 06:15:04,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=992106.6666666666, ans=0.125 2023-12-23 06:15:11,157 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:15:16,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=992173.3333333334, ans=0.125 2023-12-23 06:15:33,860 INFO [train.py:886] (3/4) Epoch 32, batch 1100, loss[loss=0.01326, audio_tagging_loss=0.01326, over 25000.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4923343.93 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:15:54,837 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.843e+01 3.284e+01 3.426e+01 3.635e+01 4.027e+01, threshold=6.852e+01, percent-clipped=0.0 2023-12-23 06:16:03,798 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2023-12-23 06:16:21,297 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.785e-03 2023-12-23 06:16:26,210 INFO [train.py:886] (3/4) Epoch 32, batch 1150, loss[loss=0.01277, audio_tagging_loss=0.01277, over 25000.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4933486.41 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:16:34,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=992640.0, ans=0.1 2023-12-23 06:16:39,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=992706.6666666666, ans=0.0 2023-12-23 06:16:41,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=992706.6666666666, ans=0.025 2023-12-23 06:16:55,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=992840.0, ans=0.0 2023-12-23 06:17:09,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=992906.6666666666, ans=0.125 2023-12-23 06:17:17,328 INFO [train.py:886] (3/4) Epoch 32, batch 1200, loss[loss=0.01006, audio_tagging_loss=0.01006, over 25000.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4943435.45 frames. 
], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:17:21,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=992973.3333333334, ans=0.0 2023-12-23 06:17:23,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=992973.3333333334, ans=0.1 2023-12-23 06:17:39,160 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.986e+01 3.358e+01 3.522e+01 3.691e+01 4.259e+01, threshold=7.044e+01, percent-clipped=0.0 2023-12-23 06:17:46,350 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.06 vs. limit=15.0 2023-12-23 06:18:07,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=993240.0, ans=0.125 2023-12-23 06:18:10,289 INFO [train.py:886] (3/4) Epoch 32, batch 1250, loss[loss=0.01157, audio_tagging_loss=0.01157, over 24750.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4944581.76 frames. ], batch size: 99, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:18:13,615 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2023-12-23 06:18:25,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=993373.3333333334, ans=0.125 2023-12-23 06:18:35,704 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.34 vs. limit=15.0 2023-12-23 06:18:37,843 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.42 vs. limit=5.0 2023-12-23 06:18:43,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=993506.6666666666, ans=0.1 2023-12-23 06:18:49,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.37 vs. limit=22.5 2023-12-23 06:18:58,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=993573.3333333334, ans=0.125 2023-12-23 06:19:01,457 INFO [train.py:886] (3/4) Epoch 32, batch 1300, loss[loss=0.01261, audio_tagging_loss=0.01261, over 23928.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 4935562.87 frames. 
], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:19:12,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=993706.6666666666, ans=10.0 2023-12-23 06:19:15,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=993706.6666666666, ans=0.125 2023-12-23 06:19:16,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=993706.6666666666, ans=0.2 2023-12-23 06:19:16,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=993706.6666666666, ans=0.2 2023-12-23 06:19:22,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=993773.3333333334, ans=0.1 2023-12-23 06:19:22,691 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.042e+01 3.376e+01 3.531e+01 3.672e+01 4.244e+01, threshold=7.062e+01, percent-clipped=0.0 2023-12-23 06:19:28,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=993773.3333333334, ans=0.0 2023-12-23 06:19:34,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=993840.0, ans=0.125 2023-12-23 06:19:53,910 INFO [train.py:886] (3/4) Epoch 32, batch 1350, loss[loss=0.01136, audio_tagging_loss=0.01136, over 25000.00 frames. ], tot_loss[loss=0.01258, audio_tagging_loss=0.01258, over 4935762.33 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:19:57,495 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.25 vs. limit=15.0 2023-12-23 06:20:19,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=994106.6666666666, ans=0.125 2023-12-23 06:20:25,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=994173.3333333334, ans=0.0 2023-12-23 06:20:27,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=994173.3333333334, ans=0.125 2023-12-23 06:20:30,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=994173.3333333334, ans=0.125 2023-12-23 06:20:45,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=994306.6666666666, ans=0.125 2023-12-23 06:20:46,240 INFO [train.py:886] (3/4) Epoch 32, batch 1400, loss[loss=0.01116, audio_tagging_loss=0.01116, over 25000.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4941656.04 frames. 
], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:20:48,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=994306.6666666666, ans=0.1 2023-12-23 06:20:51,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=994306.6666666666, ans=0.125 2023-12-23 06:21:01,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=994373.3333333334, ans=0.0 2023-12-23 06:21:06,614 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.912e+01 3.298e+01 3.472e+01 3.566e+01 4.099e+01, threshold=6.943e+01, percent-clipped=0.0 2023-12-23 06:21:11,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=994440.0, ans=0.125 2023-12-23 06:21:15,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=994440.0, ans=0.05 2023-12-23 06:21:16,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=994506.6666666666, ans=0.0 2023-12-23 06:21:29,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=994573.3333333334, ans=0.05 2023-12-23 06:21:38,016 INFO [train.py:886] (3/4) Epoch 32, batch 1450, loss[loss=0.01208, audio_tagging_loss=0.01208, over 25000.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4948217.91 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:21:56,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=994706.6666666666, ans=0.125 2023-12-23 06:21:58,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=994773.3333333334, ans=0.125 2023-12-23 06:22:04,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=994773.3333333334, ans=0.1 2023-12-23 06:22:18,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=994906.6666666666, ans=0.1 2023-12-23 06:22:22,068 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.39 vs. limit=10.0 2023-12-23 06:22:27,030 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.43 vs. limit=12.0 2023-12-23 06:22:27,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=994906.6666666666, ans=0.0 2023-12-23 06:22:30,195 INFO [train.py:886] (3/4) Epoch 32, batch 1500, loss[loss=0.01132, audio_tagging_loss=0.01132, over 25000.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4957664.85 frames. 
], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:22:40,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=995040.0, ans=0.05 2023-12-23 06:22:40,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=995040.0, ans=0.0 2023-12-23 06:22:46,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=995040.0, ans=0.0 2023-12-23 06:22:47,088 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.51 vs. limit=15.0 2023-12-23 06:22:50,503 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.068e+01 3.293e+01 3.421e+01 3.550e+01 3.928e+01, threshold=6.843e+01, percent-clipped=0.0 2023-12-23 06:22:57,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=995106.6666666666, ans=0.125 2023-12-23 06:22:58,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=995106.6666666666, ans=0.125 2023-12-23 06:23:13,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=995240.0, ans=0.0 2023-12-23 06:23:21,272 INFO [train.py:886] (3/4) Epoch 32, batch 1550, loss[loss=0.01157, audio_tagging_loss=0.01157, over 24750.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4955810.93 frames. ], batch size: 99, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:23:26,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=995306.6666666666, ans=0.1 2023-12-23 06:23:50,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=995440.0, ans=0.09899494936611666 2023-12-23 06:24:03,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=995573.3333333334, ans=0.0 2023-12-23 06:24:09,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.96 vs. limit=6.0 2023-12-23 06:24:10,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=995573.3333333334, ans=0.0 2023-12-23 06:24:10,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=995573.3333333334, ans=0.125 2023-12-23 06:24:13,332 INFO [train.py:886] (3/4) Epoch 32, batch 1600, loss[loss=0.01596, audio_tagging_loss=0.01596, over 25000.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4952420.28 frames. 
], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:24:30,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=995706.6666666666, ans=0.125 2023-12-23 06:24:34,477 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.084e+01 3.339e+01 3.484e+01 3.655e+01 4.537e+01, threshold=6.968e+01, percent-clipped=0.0 2023-12-23 06:24:41,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=995773.3333333334, ans=0.125 2023-12-23 06:24:49,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=995840.0, ans=0.125 2023-12-23 06:25:04,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=995973.3333333334, ans=0.125 2023-12-23 06:25:04,875 INFO [train.py:886] (3/4) Epoch 32, batch 1650, loss[loss=0.01447, audio_tagging_loss=0.01447, over 25000.00 frames. ], tot_loss[loss=0.01264, audio_tagging_loss=0.01264, over 4947149.82 frames. ], batch size: 100, lr: 3.38e-03, grad_scale: 32.0 2023-12-23 06:25:38,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=996173.3333333334, ans=0.125 2023-12-23 06:25:46,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=996240.0, ans=0.125 2023-12-23 06:25:52,007 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=15.0 2023-12-23 06:25:56,152 INFO [train.py:886] (3/4) Epoch 32, batch 1700, loss[loss=0.01193, audio_tagging_loss=0.01193, over 25000.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4948058.05 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 32.0 2023-12-23 06:25:58,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=996306.6666666666, ans=0.2 2023-12-23 06:26:09,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=996373.3333333334, ans=0.0 2023-12-23 06:26:09,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2023-12-23 06:26:09,408 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2023-12-23 06:26:16,298 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.047e+01 3.316e+01 3.484e+01 3.622e+01 4.382e+01, threshold=6.969e+01, percent-clipped=0.0 2023-12-23 06:26:21,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=996440.0, ans=0.125 2023-12-23 06:26:36,576 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:26:46,893 INFO [train.py:886] (3/4) Epoch 32, batch 1750, loss[loss=0.01201, audio_tagging_loss=0.01201, over 25000.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4949424.23 frames. 
], batch size: 100, lr: 3.37e-03, grad_scale: 32.0 2023-12-23 06:26:51,398 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.78 vs. limit=10.0 2023-12-23 06:26:54,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=996640.0, ans=0.09899494936611666 2023-12-23 06:27:01,404 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.22 vs. limit=10.0 2023-12-23 06:27:12,131 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2023-12-23 06:27:17,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=996840.0, ans=0.0 2023-12-23 06:27:20,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=996840.0, ans=0.125 2023-12-23 06:27:21,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=996840.0, ans=0.2 2023-12-23 06:27:22,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=996840.0, ans=0.125 2023-12-23 06:27:40,116 INFO [train.py:886] (3/4) Epoch 32, batch 1800, loss[loss=0.01225, audio_tagging_loss=0.01225, over 25000.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4953693.32 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 32.0 2023-12-23 06:27:42,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=996973.3333333334, ans=0.0 2023-12-23 06:27:48,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=997040.0, ans=0.2 2023-12-23 06:27:49,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2023-12-23 06:27:50,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=997040.0, ans=0.0 2023-12-23 06:27:59,177 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.973e+01 3.314e+01 3.468e+01 3.623e+01 4.214e+01, threshold=6.936e+01, percent-clipped=0.0 2023-12-23 06:27:59,564 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.59 vs. limit=15.0 2023-12-23 06:28:29,595 INFO [train.py:886] (3/4) Epoch 32, batch 1850, loss[loss=0.01306, audio_tagging_loss=0.01306, over 24750.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4957367.12 frames. ], batch size: 99, lr: 3.37e-03, grad_scale: 32.0 2023-12-23 06:29:22,522 INFO [train.py:886] (3/4) Epoch 32, batch 1900, loss[loss=0.01367, audio_tagging_loss=0.01367, over 24750.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 4949867.55 frames. 
], batch size: 99, lr: 3.37e-03, grad_scale: 32.0 2023-12-23 06:29:40,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=997706.6666666666, ans=0.0 2023-12-23 06:29:42,700 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.010e+01 3.383e+01 3.531e+01 3.681e+01 4.206e+01, threshold=7.062e+01, percent-clipped=0.0 2023-12-23 06:29:53,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=997840.0, ans=0.1 2023-12-23 06:30:13,311 INFO [train.py:886] (3/4) Epoch 32, batch 1950, loss[loss=0.01167, audio_tagging_loss=0.01167, over 24750.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4952398.16 frames. ], batch size: 99, lr: 3.37e-03, grad_scale: 32.0 2023-12-23 06:30:22,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=997973.3333333334, ans=0.125 2023-12-23 06:30:36,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=998106.6666666666, ans=0.0 2023-12-23 06:30:36,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=998106.6666666666, ans=0.125 2023-12-23 06:30:51,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=998173.3333333334, ans=0.125 2023-12-23 06:30:53,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=998173.3333333334, ans=0.125 2023-12-23 06:31:01,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=998240.0, ans=0.125 2023-12-23 06:31:04,956 INFO [train.py:886] (3/4) Epoch 32, batch 2000, loss[loss=0.01021, audio_tagging_loss=0.01021, over 25000.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4950822.06 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:31:08,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=998306.6666666666, ans=0.125 2023-12-23 06:31:09,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=998306.6666666666, ans=0.0 2023-12-23 06:31:17,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=998373.3333333334, ans=0.1 2023-12-23 06:31:23,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=998373.3333333334, ans=0.0 2023-12-23 06:31:26,067 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.968e+01 3.338e+01 3.486e+01 3.657e+01 4.428e+01, threshold=6.972e+01, percent-clipped=0.0 2023-12-23 06:31:47,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=998573.3333333334, ans=0.125 2023-12-23 06:31:56,443 INFO [train.py:886] (3/4) Epoch 32, batch 2050, loss[loss=0.01348, audio_tagging_loss=0.01348, over 24750.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4947271.91 frames. 
], batch size: 99, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:32:05,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=998706.6666666666, ans=0.125 2023-12-23 06:32:13,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=998706.6666666666, ans=0.02 2023-12-23 06:32:17,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=998773.3333333334, ans=0.125 2023-12-23 06:32:43,347 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0 2023-12-23 06:32:46,713 INFO [train.py:886] (3/4) Epoch 32, batch 2100, loss[loss=0.01057, audio_tagging_loss=0.01057, over 25000.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4950668.80 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:32:53,316 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:33:08,554 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.022e+01 3.328e+01 3.487e+01 3.635e+01 4.227e+01, threshold=6.974e+01, percent-clipped=0.0 2023-12-23 06:33:09,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=999106.6666666666, ans=0.0 2023-12-23 06:33:37,216 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:33:39,879 INFO [train.py:886] (3/4) Epoch 32, batch 2150, loss[loss=0.01256, audio_tagging_loss=0.01256, over 24750.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4945295.49 frames. ], batch size: 99, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:33:49,878 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.15 vs. limit=10.0 2023-12-23 06:33:50,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=999373.3333333334, ans=0.0 2023-12-23 06:34:24,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=999573.3333333334, ans=0.125 2023-12-23 06:34:29,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=999573.3333333334, ans=0.125 2023-12-23 06:34:31,148 INFO [train.py:886] (3/4) Epoch 32, batch 2200, loss[loss=0.0109, audio_tagging_loss=0.0109, over 24750.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4946600.76 frames. 
], batch size: 99, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:34:46,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=999706.6666666666, ans=0.2 2023-12-23 06:34:51,672 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.973e+01 3.381e+01 3.492e+01 3.667e+01 4.618e+01, threshold=6.983e+01, percent-clipped=0.0 2023-12-23 06:35:01,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=999840.0, ans=0.0 2023-12-23 06:35:08,488 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0 2023-12-23 06:35:10,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=999840.0, ans=0.0 2023-12-23 06:35:22,728 INFO [train.py:886] (3/4) Epoch 32, batch 2250, loss[loss=0.01277, audio_tagging_loss=0.01277, over 24750.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4945445.33 frames. ], batch size: 99, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:35:39,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1000040.0, ans=0.125 2023-12-23 06:35:46,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1000106.6666666666, ans=0.1 2023-12-23 06:36:15,256 INFO [train.py:886] (3/4) Epoch 32, batch 2300, loss[loss=0.01116, audio_tagging_loss=0.01116, over 24750.00 frames. ], tot_loss[loss=0.01227, audio_tagging_loss=0.01227, over 4950070.63 frames. ], batch size: 99, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:36:18,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1000306.6666666666, ans=0.0 2023-12-23 06:36:18,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1000306.6666666666, ans=0.2 2023-12-23 06:36:35,437 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.075e+01 3.334e+01 3.452e+01 3.612e+01 4.472e+01, threshold=6.904e+01, percent-clipped=0.0 2023-12-23 06:36:41,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1000440.0, ans=0.0 2023-12-23 06:36:48,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1000506.6666666666, ans=0.125 2023-12-23 06:36:57,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1000573.3333333334, ans=0.125 2023-12-23 06:37:06,010 INFO [train.py:886] (3/4) Epoch 32, batch 2350, loss[loss=0.01225, audio_tagging_loss=0.01225, over 24750.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4953080.07 frames. 
], batch size: 99, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:37:12,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1000640.0, ans=0.125 2023-12-23 06:37:15,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1000640.0, ans=0.0 2023-12-23 06:37:21,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1000706.6666666666, ans=0.025 2023-12-23 06:37:34,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1000773.3333333334, ans=0.07 2023-12-23 06:37:36,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1000840.0, ans=0.04949747468305833 2023-12-23 06:37:40,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1000840.0, ans=0.125 2023-12-23 06:37:58,446 INFO [train.py:886] (3/4) Epoch 32, batch 2400, loss[loss=0.01271, audio_tagging_loss=0.01271, over 25000.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4955298.78 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:38:04,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1000973.3333333334, ans=0.0 2023-12-23 06:38:11,283 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.09 vs. limit=15.0 2023-12-23 06:38:16,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1001040.0, ans=0.125 2023-12-23 06:38:19,329 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.045e+01 3.328e+01 3.460e+01 3.635e+01 4.169e+01, threshold=6.920e+01, percent-clipped=0.0 2023-12-23 06:38:25,301 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.45 vs. limit=22.5 2023-12-23 06:38:36,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1001173.3333333334, ans=0.0 2023-12-23 06:38:50,369 INFO [train.py:886] (3/4) Epoch 32, batch 2450, loss[loss=0.01213, audio_tagging_loss=0.01213, over 25000.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4956903.37 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:38:52,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1001306.6666666666, ans=0.125 2023-12-23 06:39:13,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1001440.0, ans=0.0 2023-12-23 06:39:32,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.01 vs. limit=22.5 2023-12-23 06:39:41,479 INFO [train.py:886] (3/4) Epoch 32, batch 2500, loss[loss=0.01331, audio_tagging_loss=0.01331, over 24750.00 frames. ], tot_loss[loss=0.01235, audio_tagging_loss=0.01235, over 4953809.73 frames. 
], batch size: 99, lr: 3.37e-03, grad_scale: 64.0 2023-12-23 06:39:55,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1001706.6666666666, ans=0.0 2023-12-23 06:40:02,496 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.169e+01 3.349e+01 3.493e+01 3.683e+01 6.550e+01, threshold=6.987e+01, percent-clipped=0.0 2023-12-23 06:40:03,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1001773.3333333334, ans=0.2 2023-12-23 06:40:05,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1001773.3333333334, ans=10.0 2023-12-23 06:40:28,199 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.23 vs. limit=15.0 2023-12-23 06:40:30,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1001906.6666666666, ans=0.2 2023-12-23 06:40:32,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1001973.3333333334, ans=0.2 2023-12-23 06:40:33,438 INFO [train.py:886] (3/4) Epoch 32, batch 2550, loss[loss=0.01142, audio_tagging_loss=0.01142, over 24750.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4945425.93 frames. ], batch size: 99, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:40:43,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=15.0 2023-12-23 06:40:59,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1002106.6666666666, ans=0.0 2023-12-23 06:41:04,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1002173.3333333334, ans=0.035 2023-12-23 06:41:25,730 INFO [train.py:886] (3/4) Epoch 32, batch 2600, loss[loss=0.01411, audio_tagging_loss=0.01411, over 25000.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4946955.13 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:41:30,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1002306.6666666666, ans=0.2 2023-12-23 06:41:38,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.96 vs. limit=22.5 2023-12-23 06:41:43,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1002373.3333333334, ans=0.125 2023-12-23 06:41:45,335 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.988e+01 3.350e+01 3.516e+01 3.667e+01 4.402e+01, threshold=7.033e+01, percent-clipped=0.0 2023-12-23 06:42:01,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1002506.6666666666, ans=0.125 2023-12-23 06:42:16,566 INFO [train.py:886] (3/4) Epoch 32, batch 2650, loss[loss=0.0122, audio_tagging_loss=0.0122, over 25000.00 frames. ], tot_loss[loss=0.01232, audio_tagging_loss=0.01232, over 4947574.92 frames. 
], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:42:21,763 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.13 vs. limit=22.5 2023-12-23 06:42:37,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=1002706.6666666666, ans=22.5 2023-12-23 06:42:40,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1002773.3333333334, ans=0.125 2023-12-23 06:43:08,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1002906.6666666666, ans=0.0 2023-12-23 06:43:10,030 INFO [train.py:886] (3/4) Epoch 32, batch 2700, loss[loss=0.01276, audio_tagging_loss=0.01276, over 25000.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4946715.76 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:43:10,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1002973.3333333334, ans=0.0 2023-12-23 06:43:23,523 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:43:27,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1003040.0, ans=0.0 2023-12-23 06:43:27,571 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=15.0 2023-12-23 06:43:29,820 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.987e+01 3.300e+01 3.397e+01 3.576e+01 4.184e+01, threshold=6.794e+01, percent-clipped=0.0 2023-12-23 06:43:30,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1003106.6666666666, ans=0.125 2023-12-23 06:43:34,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1003106.6666666666, ans=0.125 2023-12-23 06:43:43,895 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2023-12-23 06:43:45,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1003173.3333333334, ans=0.125 2023-12-23 06:44:01,188 INFO [train.py:886] (3/4) Epoch 32, batch 2750, loss[loss=0.009374, audio_tagging_loss=0.009374, over 25000.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4953363.35 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:44:02,617 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.04 vs. limit=15.0 2023-12-23 06:44:09,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1003306.6666666666, ans=0.2 2023-12-23 06:44:10,613 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.01 vs. 
limit=15.0 2023-12-23 06:44:13,016 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:44:31,459 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.11 vs. limit=12.0 2023-12-23 06:44:50,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.64 vs. limit=22.5 2023-12-23 06:44:53,693 INFO [train.py:886] (3/4) Epoch 32, batch 2800, loss[loss=0.01289, audio_tagging_loss=0.01289, over 24750.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4955646.20 frames. ], batch size: 99, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:45:14,665 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.950e+01 3.316e+01 3.538e+01 3.680e+01 4.583e+01, threshold=7.076e+01, percent-clipped=0.0 2023-12-23 06:45:15,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1003773.3333333334, ans=0.125 2023-12-23 06:45:20,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1003773.3333333334, ans=0.125 2023-12-23 06:45:38,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1003906.6666666666, ans=0.125 2023-12-23 06:45:46,243 INFO [train.py:886] (3/4) Epoch 32, batch 2850, loss[loss=0.01327, audio_tagging_loss=0.01327, over 24750.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4948016.10 frames. ], batch size: 99, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:45:55,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1004040.0, ans=0.125 2023-12-23 06:45:55,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1004040.0, ans=0.0 2023-12-23 06:46:11,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1004106.6666666666, ans=0.0 2023-12-23 06:46:18,084 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:46:23,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1004173.3333333334, ans=0.125 2023-12-23 06:46:37,740 INFO [train.py:886] (3/4) Epoch 32, batch 2900, loss[loss=0.01198, audio_tagging_loss=0.01198, over 25000.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4952293.15 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:46:59,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1004440.0, ans=0.0 2023-12-23 06:46:59,935 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.913e+01 3.284e+01 3.453e+01 3.590e+01 5.160e+01, threshold=6.905e+01, percent-clipped=0.0 2023-12-23 06:47:05,356 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.61 vs. 
limit=22.5 2023-12-23 06:47:08,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1004506.6666666666, ans=0.0 2023-12-23 06:47:11,063 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.25 vs. limit=15.0 2023-12-23 06:47:12,744 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.85 vs. limit=22.5 2023-12-23 06:47:18,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1004506.6666666666, ans=0.0 2023-12-23 06:47:30,608 INFO [train.py:886] (3/4) Epoch 32, batch 2950, loss[loss=0.009507, audio_tagging_loss=0.009507, over 25000.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4954430.98 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:47:35,641 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:47:38,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1004640.0, ans=0.0 2023-12-23 06:47:40,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1004706.6666666666, ans=0.1 2023-12-23 06:47:45,352 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.87 vs. limit=15.0 2023-12-23 06:47:45,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1004706.6666666666, ans=0.125 2023-12-23 06:47:47,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1004706.6666666666, ans=0.1 2023-12-23 06:47:52,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1004773.3333333334, ans=0.125 2023-12-23 06:48:06,324 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.30 vs. limit=6.0 2023-12-23 06:48:20,538 INFO [train.py:886] (3/4) Epoch 32, batch 3000, loss[loss=0.01371, audio_tagging_loss=0.01371, over 24750.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4956017.53 frames. ], batch size: 99, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:48:20,538 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 06:48:39,269 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.5965, 3.6832, 3.0731, 2.9755], device='cuda:3') 2023-12-23 06:48:41,455 INFO [train.py:917] (3/4) Epoch 32, validation: loss=0.03345, audio_tagging_loss=0.03345, over 3737520.00 frames. 
2023-12-23 06:48:41,456 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 06:48:44,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1004973.3333333334, ans=0.2 2023-12-23 06:48:52,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1005040.0, ans=0.125 2023-12-23 06:48:54,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1005040.0, ans=0.0 2023-12-23 06:49:02,379 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.007e+01 3.356e+01 3.479e+01 3.651e+01 4.265e+01, threshold=6.959e+01, percent-clipped=0.0 2023-12-23 06:49:09,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1005106.6666666666, ans=0.2 2023-12-23 06:49:11,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1005173.3333333334, ans=0.125 2023-12-23 06:49:13,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1005173.3333333334, ans=0.125 2023-12-23 06:49:33,528 INFO [train.py:886] (3/4) Epoch 32, batch 3050, loss[loss=0.01059, audio_tagging_loss=0.01059, over 21708.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 4955899.23 frames. ], batch size: 107, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:49:36,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1005306.6666666666, ans=0.125 2023-12-23 06:49:43,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0 2023-12-23 06:49:44,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1005373.3333333334, ans=0.0 2023-12-23 06:50:00,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1005440.0, ans=0.025 2023-12-23 06:50:05,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1005506.6666666666, ans=0.125 2023-12-23 06:50:13,951 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.06 vs. limit=15.0 2023-12-23 06:50:18,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1005573.3333333334, ans=0.1 2023-12-23 06:50:20,905 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.09 vs. limit=15.0 2023-12-23 06:50:24,663 INFO [train.py:886] (3/4) Epoch 32, batch 3100, loss[loss=0.01243, audio_tagging_loss=0.01243, over 24750.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4958478.44 frames. 
], batch size: 99, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:50:30,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1005640.0, ans=0.125 2023-12-23 06:50:35,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1005706.6666666666, ans=0.125 2023-12-23 06:50:40,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1005706.6666666666, ans=0.125 2023-12-23 06:50:45,084 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.066e+01 3.353e+01 3.474e+01 3.650e+01 4.005e+01, threshold=6.948e+01, percent-clipped=0.0 2023-12-23 06:51:00,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1005840.0, ans=0.125 2023-12-23 06:51:14,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1005906.6666666666, ans=0.1 2023-12-23 06:51:16,259 INFO [train.py:886] (3/4) Epoch 32, batch 3150, loss[loss=0.01427, audio_tagging_loss=0.01427, over 24750.00 frames. ], tot_loss[loss=0.01227, audio_tagging_loss=0.01227, over 4956384.45 frames. ], batch size: 99, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:51:44,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1006106.6666666666, ans=0.125 2023-12-23 06:51:52,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1006173.3333333334, ans=0.1 2023-12-23 06:51:59,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1006240.0, ans=0.125 2023-12-23 06:52:03,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1006240.0, ans=0.0 2023-12-23 06:52:09,067 INFO [train.py:886] (3/4) Epoch 32, batch 3200, loss[loss=0.01168, audio_tagging_loss=0.01168, over 25000.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4955974.20 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:52:25,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1006373.3333333334, ans=0.0 2023-12-23 06:52:28,450 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.025e+01 3.282e+01 3.455e+01 3.590e+01 4.189e+01, threshold=6.910e+01, percent-clipped=0.0 2023-12-23 06:52:33,659 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2023-12-23 06:52:51,553 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.61 vs. limit=15.0 2023-12-23 06:52:52,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1006573.3333333334, ans=0.125 2023-12-23 06:52:52,228 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.95 vs. 
limit=15.0 2023-12-23 06:52:55,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1006573.3333333334, ans=0.125 2023-12-23 06:53:00,206 INFO [train.py:886] (3/4) Epoch 32, batch 3250, loss[loss=0.0128, audio_tagging_loss=0.0128, over 24750.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4952100.52 frames. ], batch size: 99, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:53:03,647 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.51 vs. limit=15.0 2023-12-23 06:53:08,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1006640.0, ans=0.07 2023-12-23 06:53:15,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1006706.6666666666, ans=0.125 2023-12-23 06:53:17,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1006706.6666666666, ans=0.125 2023-12-23 06:53:24,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1006773.3333333334, ans=0.125 2023-12-23 06:53:43,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1006906.6666666666, ans=0.1 2023-12-23 06:53:48,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=1006906.6666666666, ans=10.0 2023-12-23 06:53:52,678 INFO [train.py:886] (3/4) Epoch 32, batch 3300, loss[loss=0.01363, audio_tagging_loss=0.01363, over 25000.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4951946.17 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:53:59,063 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.83 vs. limit=10.0 2023-12-23 06:54:06,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1007040.0, ans=0.2 2023-12-23 06:54:13,383 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.804e+01 3.291e+01 3.461e+01 3.674e+01 4.146e+01, threshold=6.923e+01, percent-clipped=0.0 2023-12-23 06:54:19,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1007106.6666666666, ans=0.2 2023-12-23 06:54:31,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1007173.3333333334, ans=0.125 2023-12-23 06:54:44,509 INFO [train.py:886] (3/4) Epoch 32, batch 3350, loss[loss=0.01141, audio_tagging_loss=0.01141, over 25000.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4957249.34 frames. ], batch size: 100, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:55:13,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1007440.0, ans=0.125 2023-12-23 06:55:36,289 INFO [train.py:886] (3/4) Epoch 32, batch 3400, loss[loss=0.01002, audio_tagging_loss=0.01002, over 24750.00 frames. ], tot_loss[loss=0.01235, audio_tagging_loss=0.01235, over 4953134.89 frames. 
], batch size: 99, lr: 3.36e-03, grad_scale: 64.0 2023-12-23 06:55:46,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1007706.6666666666, ans=0.125 2023-12-23 06:55:54,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1007706.6666666666, ans=0.0 2023-12-23 06:55:57,393 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.946e+01 3.386e+01 3.545e+01 3.714e+01 5.112e+01, threshold=7.090e+01, percent-clipped=0.0 2023-12-23 06:55:59,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.13 vs. limit=12.0 2023-12-23 06:56:01,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1007773.3333333334, ans=0.125 2023-12-23 06:56:28,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.33 vs. limit=10.0 2023-12-23 06:56:28,823 INFO [train.py:886] (3/4) Epoch 32, batch 3450, loss[loss=0.01379, audio_tagging_loss=0.01379, over 24750.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4954606.90 frames. ], batch size: 99, lr: 3.35e-03, grad_scale: 64.0 2023-12-23 06:56:36,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.55 vs. limit=15.0 2023-12-23 06:56:39,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1008040.0, ans=0.125 2023-12-23 06:56:39,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1008040.0, ans=0.125 2023-12-23 06:56:39,537 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:56:41,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1008040.0, ans=0.0 2023-12-23 06:56:54,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1008106.6666666666, ans=0.125 2023-12-23 06:56:54,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1008106.6666666666, ans=0.125 2023-12-23 06:57:07,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1008173.3333333334, ans=0.125 2023-12-23 06:57:20,532 INFO [train.py:886] (3/4) Epoch 32, batch 3500, loss[loss=0.01186, audio_tagging_loss=0.01186, over 24750.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4955045.55 frames. ], batch size: 99, lr: 3.35e-03, grad_scale: 64.0 2023-12-23 06:57:32,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1008373.3333333334, ans=0.125 2023-12-23 06:57:42,417 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.074e+01 3.340e+01 3.505e+01 3.678e+01 3.978e+01, threshold=7.011e+01, percent-clipped=0.0 2023-12-23 06:57:53,172 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.87 vs. 
limit=15.0 2023-12-23 06:57:58,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1008506.6666666666, ans=0.035 2023-12-23 06:58:12,695 INFO [train.py:886] (3/4) Epoch 32, batch 3550, loss[loss=0.01216, audio_tagging_loss=0.01216, over 25000.00 frames. ], tot_loss[loss=0.01232, audio_tagging_loss=0.01232, over 4953733.60 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 06:58:27,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1008706.6666666666, ans=0.125 2023-12-23 06:58:27,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1008706.6666666666, ans=0.125 2023-12-23 06:58:44,511 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 06:58:53,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.69 vs. limit=22.5 2023-12-23 06:59:05,013 INFO [train.py:886] (3/4) Epoch 32, batch 3600, loss[loss=0.01466, audio_tagging_loss=0.01466, over 25000.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4951571.53 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 06:59:22,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1009040.0, ans=0.07 2023-12-23 06:59:25,496 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.038e+01 3.316e+01 3.432e+01 3.603e+01 4.151e+01, threshold=6.864e+01, percent-clipped=0.0 2023-12-23 06:59:25,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1009106.6666666666, ans=0.2 2023-12-23 06:59:27,827 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0 2023-12-23 06:59:31,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1009106.6666666666, ans=0.0 2023-12-23 06:59:42,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1009173.3333333334, ans=0.0 2023-12-23 06:59:55,906 INFO [train.py:886] (3/4) Epoch 32, batch 3650, loss[loss=0.01254, audio_tagging_loss=0.01254, over 25000.00 frames. ], tot_loss[loss=0.01232, audio_tagging_loss=0.01232, over 4953578.86 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:00:09,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1009373.3333333334, ans=0.09899494936611666 2023-12-23 07:00:29,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1009506.6666666666, ans=0.125 2023-12-23 07:00:48,389 INFO [train.py:886] (3/4) Epoch 32, batch 3700, loss[loss=0.01261, audio_tagging_loss=0.01261, over 25000.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4961924.86 frames. 
], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:00:49,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1009640.0, ans=0.125 2023-12-23 07:01:10,308 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.017e+01 3.377e+01 3.513e+01 3.629e+01 4.104e+01, threshold=7.025e+01, percent-clipped=0.0 2023-12-23 07:01:17,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1009773.3333333334, ans=0.0 2023-12-23 07:01:23,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.00 vs. limit=10.0 2023-12-23 07:01:35,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1009906.6666666666, ans=0.0 2023-12-23 07:01:39,867 INFO [train.py:886] (3/4) Epoch 32, batch 3750, loss[loss=0.01358, audio_tagging_loss=0.01358, over 24750.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4958745.69 frames. ], batch size: 99, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:01:52,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1010040.0, ans=0.0 2023-12-23 07:01:54,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1010040.0, ans=0.0 2023-12-23 07:01:55,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1010040.0, ans=0.125 2023-12-23 07:02:06,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1010106.6666666666, ans=0.95 2023-12-23 07:02:06,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1010106.6666666666, ans=0.125 2023-12-23 07:02:14,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1010173.3333333334, ans=0.0 2023-12-23 07:02:26,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1010240.0, ans=0.125 2023-12-23 07:02:31,700 INFO [train.py:886] (3/4) Epoch 32, batch 3800, loss[loss=0.01434, audio_tagging_loss=0.01434, over 24750.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4956910.63 frames. ], batch size: 99, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:02:54,398 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.001e+01 3.373e+01 3.530e+01 3.741e+01 4.683e+01, threshold=7.061e+01, percent-clipped=0.0 2023-12-23 07:02:54,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1010440.0, ans=0.125 2023-12-23 07:03:02,599 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0 2023-12-23 07:03:23,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1010573.3333333334, ans=0.125 2023-12-23 07:03:24,690 INFO [train.py:886] (3/4) Epoch 32, batch 3850, loss[loss=0.01101, audio_tagging_loss=0.01101, over 24750.00 frames. 
], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4949760.64 frames. ], batch size: 99, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:03:27,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1010640.0, ans=0.0 2023-12-23 07:03:30,192 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0 2023-12-23 07:03:30,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1010640.0, ans=0.0 2023-12-23 07:03:43,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1010773.3333333334, ans=0.0 2023-12-23 07:03:49,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1010773.3333333334, ans=0.2 2023-12-23 07:04:06,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.45 vs. limit=22.5 2023-12-23 07:04:09,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1010906.6666666666, ans=0.125 2023-12-23 07:04:13,016 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.05 vs. limit=15.0 2023-12-23 07:04:15,169 INFO [train.py:886] (3/4) Epoch 32, batch 3900, loss[loss=0.01146, audio_tagging_loss=0.01146, over 24750.00 frames. ], tot_loss[loss=0.01235, audio_tagging_loss=0.01235, over 4953649.61 frames. ], batch size: 99, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:04:35,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1011106.6666666666, ans=0.0 2023-12-23 07:04:36,349 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.088e+01 3.332e+01 3.481e+01 3.670e+01 4.273e+01, threshold=6.961e+01, percent-clipped=0.0 2023-12-23 07:04:38,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1011106.6666666666, ans=0.0 2023-12-23 07:04:54,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1011173.3333333334, ans=0.1 2023-12-23 07:04:59,764 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2023-12-23 07:05:06,139 INFO [train.py:886] (3/4) Epoch 32, batch 3950, loss[loss=0.01238, audio_tagging_loss=0.01238, over 25000.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4955408.95 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:05:12,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1011306.6666666666, ans=0.2 2023-12-23 07:05:40,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1011506.6666666666, ans=0.0 2023-12-23 07:05:51,390 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.46 vs. 
limit=6.0 2023-12-23 07:05:58,512 INFO [train.py:886] (3/4) Epoch 32, batch 4000, loss[loss=0.01102, audio_tagging_loss=0.01102, over 25000.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4957320.91 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:06:03,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1011640.0, ans=0.0 2023-12-23 07:06:08,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1011706.6666666666, ans=0.0 2023-12-23 07:06:13,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1011706.6666666666, ans=0.2 2023-12-23 07:06:15,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1011706.6666666666, ans=0.0 2023-12-23 07:06:19,227 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.094e+01 3.350e+01 3.494e+01 3.630e+01 4.145e+01, threshold=6.988e+01, percent-clipped=0.0 2023-12-23 07:06:32,033 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 07:06:34,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1011840.0, ans=0.0 2023-12-23 07:06:49,333 INFO [train.py:886] (3/4) Epoch 32, batch 4050, loss[loss=0.0122, audio_tagging_loss=0.0122, over 25000.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4957408.46 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:06:51,517 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2023-12-23 07:07:03,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.02 vs. limit=6.0 2023-12-23 07:07:41,198 INFO [train.py:886] (3/4) Epoch 32, batch 4100, loss[loss=0.01167, audio_tagging_loss=0.01167, over 24750.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4951336.17 frames. ], batch size: 99, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:08:00,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1012440.0, ans=0.04949747468305833 2023-12-23 07:08:02,289 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.038e+01 3.388e+01 3.515e+01 3.658e+01 4.060e+01, threshold=7.030e+01, percent-clipped=0.0 2023-12-23 07:08:14,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1012506.6666666666, ans=0.1 2023-12-23 07:08:16,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1012506.6666666666, ans=0.125 2023-12-23 07:08:23,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1012573.3333333334, ans=0.125 2023-12-23 07:08:32,029 INFO [train.py:886] (3/4) Epoch 32, batch 4150, loss[loss=0.01078, audio_tagging_loss=0.01078, over 25000.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4953958.84 frames. 
], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:08:46,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1012706.6666666666, ans=0.1 2023-12-23 07:08:57,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1012773.3333333334, ans=0.125 2023-12-23 07:09:24,404 INFO [train.py:886] (3/4) Epoch 32, batch 4200, loss[loss=0.01242, audio_tagging_loss=0.01242, over 25000.00 frames. ], tot_loss[loss=0.0124, audio_tagging_loss=0.0124, over 4949987.51 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:09:25,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1012973.3333333334, ans=0.125 2023-12-23 07:09:31,719 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.18 vs. limit=15.0 2023-12-23 07:09:45,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1013106.6666666666, ans=0.125 2023-12-23 07:09:46,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.77 vs. limit=22.5 2023-12-23 07:09:47,517 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.997e+01 3.331e+01 3.526e+01 3.677e+01 4.548e+01, threshold=7.052e+01, percent-clipped=0.0 2023-12-23 07:09:47,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1013106.6666666666, ans=0.125 2023-12-23 07:09:47,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1013106.6666666666, ans=0.0 2023-12-23 07:10:17,390 INFO [train.py:886] (3/4) Epoch 32, batch 4250, loss[loss=0.01023, audio_tagging_loss=0.01023, over 25000.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4952724.62 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:10:18,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1013306.6666666666, ans=0.1 2023-12-23 07:10:27,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1013306.6666666666, ans=0.125 2023-12-23 07:10:28,232 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.31 vs. limit=12.0 2023-12-23 07:10:34,842 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.28 vs. 
limit=22.5 2023-12-23 07:10:36,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1013373.3333333334, ans=0.125 2023-12-23 07:10:55,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1013506.6666666666, ans=0.125 2023-12-23 07:10:55,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1013506.6666666666, ans=0.125 2023-12-23 07:11:06,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.90 vs. limit=15.0 2023-12-23 07:11:10,480 INFO [train.py:886] (3/4) Epoch 32, batch 4300, loss[loss=0.01391, audio_tagging_loss=0.01391, over 25000.00 frames. ], tot_loss[loss=0.01227, audio_tagging_loss=0.01227, over 4955869.75 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:11:33,504 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.071e+01 3.371e+01 3.452e+01 3.602e+01 4.385e+01, threshold=6.904e+01, percent-clipped=0.0 2023-12-23 07:11:33,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1013773.3333333334, ans=0.04949747468305833 2023-12-23 07:11:47,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1013840.0, ans=0.2 2023-12-23 07:11:49,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1013840.0, ans=0.0 2023-12-23 07:11:50,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1013840.0, ans=0.1 2023-12-23 07:11:53,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1013906.6666666666, ans=0.125 2023-12-23 07:12:03,710 INFO [train.py:886] (3/4) Epoch 32, batch 4350, loss[loss=0.01123, audio_tagging_loss=0.01123, over 25000.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4957293.74 frames. ], batch size: 100, lr: 3.35e-03, grad_scale: 32.0 2023-12-23 07:12:09,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1013973.3333333334, ans=0.2 2023-12-23 07:12:13,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1014040.0, ans=0.125 2023-12-23 07:12:17,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1014040.0, ans=0.125 2023-12-23 07:12:18,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1014040.0, ans=0.0 2023-12-23 07:12:25,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1014106.6666666666, ans=0.125 2023-12-23 07:12:28,854 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.70 vs. 
limit=6.0 2023-12-23 07:12:44,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1014240.0, ans=0.2 2023-12-23 07:12:44,630 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.43 vs. limit=15.0 2023-12-23 07:12:55,253 INFO [train.py:886] (3/4) Epoch 32, batch 4400, loss[loss=0.01024, audio_tagging_loss=0.01024, over 24020.00 frames. ], tot_loss[loss=0.0124, audio_tagging_loss=0.0124, over 4953034.48 frames. ], batch size: 100, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:13:07,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1014373.3333333334, ans=0.95 2023-12-23 07:13:11,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1014373.3333333334, ans=0.1 2023-12-23 07:13:14,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1014440.0, ans=0.05 2023-12-23 07:13:16,349 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.989e+01 3.370e+01 3.596e+01 3.730e+01 4.489e+01, threshold=7.192e+01, percent-clipped=0.0 2023-12-23 07:13:19,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1014440.0, ans=0.125 2023-12-23 07:13:23,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=1014440.0, ans=0.1 2023-12-23 07:13:33,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1014506.6666666666, ans=0.125 2023-12-23 07:13:41,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1014573.3333333334, ans=0.0 2023-12-23 07:13:46,493 INFO [train.py:886] (3/4) Epoch 32, batch 4450, loss[loss=0.01193, audio_tagging_loss=0.01193, over 25000.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4946298.15 frames. ], batch size: 100, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:13:47,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1014640.0, ans=0.125 2023-12-23 07:13:49,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1014640.0, ans=0.0 2023-12-23 07:14:06,271 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.42 vs. limit=15.0 2023-12-23 07:14:17,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.73 vs. limit=15.0 2023-12-23 07:14:22,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1014840.0, ans=0.125 2023-12-23 07:14:31,261 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2023-12-23 07:14:38,858 INFO [train.py:886] (3/4) Epoch 32, batch 4500, loss[loss=0.01261, audio_tagging_loss=0.01261, over 25000.00 frames. 
], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4945955.15 frames. ], batch size: 100, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:14:43,750 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0 2023-12-23 07:14:48,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1015040.0, ans=0.125 2023-12-23 07:14:52,344 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.89 vs. limit=6.0 2023-12-23 07:14:56,230 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.47 vs. limit=6.0 2023-12-23 07:15:00,270 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.070e+01 3.361e+01 3.496e+01 3.729e+01 4.409e+01, threshold=6.991e+01, percent-clipped=0.0 2023-12-23 07:15:30,780 INFO [train.py:886] (3/4) Epoch 32, batch 4550, loss[loss=0.01154, audio_tagging_loss=0.01154, over 25000.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4948137.30 frames. ], batch size: 100, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:15:35,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1015306.6666666666, ans=0.125 2023-12-23 07:15:42,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1015373.3333333334, ans=0.0 2023-12-23 07:15:59,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1015440.0, ans=0.2 2023-12-23 07:16:01,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1015506.6666666666, ans=0.125 2023-12-23 07:16:23,689 INFO [train.py:886] (3/4) Epoch 32, batch 4600, loss[loss=0.01181, audio_tagging_loss=0.01181, over 25000.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4957015.45 frames. 
], batch size: 100, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:16:24,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1015640.0, ans=0.0 2023-12-23 07:16:31,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1015640.0, ans=0.0 2023-12-23 07:16:33,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1015706.6666666666, ans=0.0 2023-12-23 07:16:45,515 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.990e+01 3.339e+01 3.480e+01 3.659e+01 4.110e+01, threshold=6.960e+01, percent-clipped=0.0 2023-12-23 07:16:49,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1015773.3333333334, ans=0.1 2023-12-23 07:17:01,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1015840.0, ans=0.0 2023-12-23 07:17:11,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1015906.6666666666, ans=0.0 2023-12-23 07:17:15,923 INFO [train.py:886] (3/4) Epoch 32, batch 4650, loss[loss=0.01243, audio_tagging_loss=0.01243, over 25000.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4958906.24 frames. ], batch size: 100, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:17:22,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1015973.3333333334, ans=0.0 2023-12-23 07:17:27,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.86 vs. limit=12.0 2023-12-23 07:18:02,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1016240.0, ans=0.125 2023-12-23 07:18:06,525 INFO [train.py:886] (3/4) Epoch 32, batch 4700, loss[loss=0.01365, audio_tagging_loss=0.01365, over 24750.00 frames. ], tot_loss[loss=0.01243, audio_tagging_loss=0.01243, over 4950135.03 frames. ], batch size: 99, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:18:23,902 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.34 vs. limit=22.5 2023-12-23 07:18:26,131 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.128e+01 3.383e+01 3.517e+01 3.637e+01 4.392e+01, threshold=7.033e+01, percent-clipped=0.0 2023-12-23 07:18:30,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1016440.0, ans=0.025 2023-12-23 07:18:36,423 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.15 vs. limit=22.5 2023-12-23 07:18:44,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1016573.3333333334, ans=0.0 2023-12-23 07:18:46,629 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.71 vs. limit=15.0 2023-12-23 07:18:53,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.65 vs. 
limit=12.0 2023-12-23 07:18:53,559 INFO [train.py:886] (3/4) Epoch 32, batch 4750, loss[loss=0.01324, audio_tagging_loss=0.01324, over 24750.00 frames. ], tot_loss[loss=0.01255, audio_tagging_loss=0.01255, over 4945696.27 frames. ], batch size: 99, lr: 3.34e-03, grad_scale: 32.0 2023-12-23 07:19:25,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1016746.6666666666, ans=0.1 2023-12-23 07:19:29,347 INFO [train.py:886] (3/4) Epoch 33, batch 0, loss[loss=0.027, audio_tagging_loss=0.027, over 25000.00 frames. ], tot_loss[loss=0.027, audio_tagging_loss=0.027, over 25000.00 frames. ], batch size: 100, lr: 3.29e-03, grad_scale: 32.0 2023-12-23 07:19:29,347 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 07:19:50,851 INFO [train.py:917] (3/4) Epoch 33, validation: loss=0.03278, audio_tagging_loss=0.03278, over 3737520.00 frames. 2023-12-23 07:19:50,852 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 07:19:53,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1016746.6666666666, ans=0.125 2023-12-23 07:20:00,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.48 vs. limit=15.0 2023-12-23 07:20:12,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1016880.0, ans=0.1 2023-12-23 07:20:22,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2023-12-23 07:20:25,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=12.0 2023-12-23 07:20:29,401 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.92 vs. limit=12.0 2023-12-23 07:20:31,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1017013.3333333334, ans=15.0 2023-12-23 07:20:31,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.39 vs. limit=22.5 2023-12-23 07:20:39,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1017013.3333333334, ans=0.0 2023-12-23 07:20:42,085 INFO [train.py:886] (3/4) Epoch 33, batch 50, loss[loss=0.01563, audio_tagging_loss=0.01563, over 25000.00 frames. ], tot_loss[loss=0.01914, audio_tagging_loss=0.01914, over 1121029.27 frames. 
], batch size: 100, lr: 3.29e-03, grad_scale: 32.0 2023-12-23 07:20:42,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1017080.0, ans=0.1 2023-12-23 07:20:46,766 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.141e+01 3.573e+01 4.216e+01 4.740e+01 9.407e+01, threshold=8.432e+01, percent-clipped=7.0 2023-12-23 07:20:47,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1017080.0, ans=0.0 2023-12-23 07:20:59,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1017146.6666666666, ans=0.2 2023-12-23 07:21:00,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1017146.6666666666, ans=0.125 2023-12-23 07:21:03,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1017213.3333333334, ans=0.2 2023-12-23 07:21:34,684 INFO [train.py:886] (3/4) Epoch 33, batch 100, loss[loss=0.01542, audio_tagging_loss=0.01542, over 25000.00 frames. ], tot_loss[loss=0.01677, audio_tagging_loss=0.01677, over 1976560.35 frames. ], batch size: 100, lr: 3.29e-03, grad_scale: 32.0 2023-12-23 07:21:40,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1017413.3333333334, ans=0.125 2023-12-23 07:21:41,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1017413.3333333334, ans=0.0 2023-12-23 07:21:41,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1017413.3333333334, ans=0.125 2023-12-23 07:21:46,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1017480.0, ans=0.0 2023-12-23 07:21:47,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1017480.0, ans=0.1 2023-12-23 07:21:52,194 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=22.5 2023-12-23 07:22:01,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1017546.6666666666, ans=0.125 2023-12-23 07:22:24,854 INFO [train.py:886] (3/4) Epoch 33, batch 150, loss[loss=0.01534, audio_tagging_loss=0.01534, over 24904.00 frames. ], tot_loss[loss=0.01537, audio_tagging_loss=0.01537, over 2635038.77 frames. 
], batch size: 100, lr: 3.29e-03, grad_scale: 32.0 2023-12-23 07:22:29,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1017746.6666666666, ans=0.2 2023-12-23 07:22:30,333 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.203e+01 3.570e+01 3.774e+01 4.009e+01 4.712e+01, threshold=7.548e+01, percent-clipped=0.0 2023-12-23 07:22:38,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1017813.3333333334, ans=0.1 2023-12-23 07:23:00,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1017946.6666666666, ans=15.0 2023-12-23 07:23:02,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1017946.6666666666, ans=0.5 2023-12-23 07:23:02,637 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 07:23:06,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1018013.3333333334, ans=0.125 2023-12-23 07:23:11,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1018013.3333333334, ans=0.07 2023-12-23 07:23:16,420 INFO [train.py:886] (3/4) Epoch 33, batch 200, loss[loss=0.01153, audio_tagging_loss=0.01153, over 25000.00 frames. ], tot_loss[loss=0.01448, audio_tagging_loss=0.01448, over 3152030.59 frames. ], batch size: 100, lr: 3.29e-03, grad_scale: 32.0 2023-12-23 07:23:19,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1018080.0, ans=0.125 2023-12-23 07:23:21,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1018080.0, ans=0.125 2023-12-23 07:23:23,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1018080.0, ans=0.1 2023-12-23 07:23:26,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1018146.6666666666, ans=0.125 2023-12-23 07:23:46,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1018280.0, ans=0.125 2023-12-23 07:23:49,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1018280.0, ans=0.125 2023-12-23 07:24:00,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1018346.6666666666, ans=0.125 2023-12-23 07:24:01,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1018346.6666666666, ans=0.125 2023-12-23 07:24:01,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.24 vs. limit=15.0 2023-12-23 07:24:07,456 INFO [train.py:886] (3/4) Epoch 33, batch 250, loss[loss=0.01538, audio_tagging_loss=0.01538, over 25000.00 frames. ], tot_loss[loss=0.01403, audio_tagging_loss=0.01403, over 3556446.18 frames. 
], batch size: 100, lr: 3.29e-03, grad_scale: 32.0 2023-12-23 07:24:12,235 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.107e+01 3.419e+01 3.522e+01 3.696e+01 4.416e+01, threshold=7.043e+01, percent-clipped=0.0 2023-12-23 07:24:19,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1018480.0, ans=0.2 2023-12-23 07:24:23,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1018480.0, ans=0.125 2023-12-23 07:24:25,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1018480.0, ans=0.125 2023-12-23 07:24:34,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1018546.6666666666, ans=0.0 2023-12-23 07:24:47,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1018613.3333333334, ans=0.05 2023-12-23 07:24:55,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1018680.0, ans=0.0 2023-12-23 07:24:58,546 INFO [train.py:886] (3/4) Epoch 33, batch 300, loss[loss=0.009482, audio_tagging_loss=0.009482, over 24039.00 frames. ], tot_loss[loss=0.01366, audio_tagging_loss=0.01366, over 3868094.70 frames. ], batch size: 100, lr: 3.29e-03, grad_scale: 32.0 2023-12-23 07:25:05,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=1018746.6666666666, ans=0.2 2023-12-23 07:25:16,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1018813.3333333334, ans=0.125 2023-12-23 07:25:16,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1018813.3333333334, ans=0.125 2023-12-23 07:25:22,372 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.56 vs. limit=5.0 2023-12-23 07:25:23,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1018880.0, ans=15.0 2023-12-23 07:25:23,957 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=15.0 2023-12-23 07:25:51,259 INFO [train.py:886] (3/4) Epoch 33, batch 350, loss[loss=0.01117, audio_tagging_loss=0.01117, over 24750.00 frames. ], tot_loss[loss=0.01341, audio_tagging_loss=0.01341, over 4097197.52 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 32.0 2023-12-23 07:25:56,054 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.021e+01 3.335e+01 3.546e+01 3.715e+01 4.310e+01, threshold=7.092e+01, percent-clipped=0.0 2023-12-23 07:26:00,272 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.21 vs. limit=22.5 2023-12-23 07:26:04,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.48 vs. 
limit=15.0 2023-12-23 07:26:14,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=1019213.3333333334, ans=0.2 2023-12-23 07:26:30,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1019280.0, ans=0.125 2023-12-23 07:26:30,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.78 vs. limit=15.0 2023-12-23 07:26:42,790 INFO [train.py:886] (3/4) Epoch 33, batch 400, loss[loss=0.01074, audio_tagging_loss=0.01074, over 25000.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 4284054.70 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 32.0 2023-12-23 07:26:43,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1019413.3333333334, ans=0.125 2023-12-23 07:26:51,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1019413.3333333334, ans=0.0 2023-12-23 07:26:59,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1019480.0, ans=0.125 2023-12-23 07:26:59,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=1019480.0, ans=0.1 2023-12-23 07:27:11,727 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.47 vs. limit=15.0 2023-12-23 07:27:15,541 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2023-12-23 07:27:17,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1019613.3333333334, ans=0.0 2023-12-23 07:27:34,264 INFO [train.py:886] (3/4) Epoch 33, batch 450, loss[loss=0.0117, audio_tagging_loss=0.0117, over 24750.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 4435908.35 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 32.0 2023-12-23 07:27:38,960 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.932e+01 3.272e+01 3.469e+01 3.632e+01 4.131e+01, threshold=6.938e+01, percent-clipped=0.0 2023-12-23 07:27:39,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1019746.6666666666, ans=0.125 2023-12-23 07:28:03,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1019880.0, ans=0.0 2023-12-23 07:28:11,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1019946.6666666666, ans=0.125 2023-12-23 07:28:21,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1020013.3333333334, ans=0.125 2023-12-23 07:28:25,889 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.15 vs. limit=15.0 2023-12-23 07:28:27,366 INFO [train.py:886] (3/4) Epoch 33, batch 500, loss[loss=0.01508, audio_tagging_loss=0.01508, over 25000.00 frames. ], tot_loss[loss=0.01273, audio_tagging_loss=0.01273, over 4556736.64 frames. 
], batch size: 100, lr: 3.28e-03, grad_scale: 32.0 2023-12-23 07:28:54,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1020213.3333333334, ans=0.0 2023-12-23 07:29:18,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0 2023-12-23 07:29:18,729 INFO [train.py:886] (3/4) Epoch 33, batch 550, loss[loss=0.01293, audio_tagging_loss=0.01293, over 24750.00 frames. ], tot_loss[loss=0.0126, audio_tagging_loss=0.0126, over 4648899.32 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 32.0 2023-12-23 07:29:23,355 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.066e+01 3.338e+01 3.495e+01 3.646e+01 4.151e+01, threshold=6.991e+01, percent-clipped=0.0 2023-12-23 07:29:30,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1020480.0, ans=0.0 2023-12-23 07:29:45,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1020546.6666666666, ans=0.125 2023-12-23 07:29:46,071 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.31 vs. limit=12.0 2023-12-23 07:29:49,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1020613.3333333334, ans=0.0 2023-12-23 07:29:54,635 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=22.5 2023-12-23 07:30:11,207 INFO [train.py:886] (3/4) Epoch 33, batch 600, loss[loss=0.01282, audio_tagging_loss=0.01282, over 25000.00 frames. ], tot_loss[loss=0.01256, audio_tagging_loss=0.01256, over 4713186.87 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 32.0 2023-12-23 07:30:32,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.08 vs. limit=10.0 2023-12-23 07:30:52,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1021013.3333333334, ans=0.1 2023-12-23 07:31:01,905 INFO [train.py:886] (3/4) Epoch 33, batch 650, loss[loss=0.01342, audio_tagging_loss=0.01342, over 24935.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4753226.84 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 32.0 2023-12-23 07:31:07,183 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.95 vs. 
limit=22.5 2023-12-23 07:31:07,457 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.031e+01 3.397e+01 3.533e+01 3.690e+01 3.984e+01, threshold=7.067e+01, percent-clipped=0.0 2023-12-23 07:31:16,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1021146.6666666666, ans=0.1 2023-12-23 07:31:41,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1021280.0, ans=0.0 2023-12-23 07:31:52,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1021346.6666666666, ans=0.0 2023-12-23 07:31:52,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1021346.6666666666, ans=0.1 2023-12-23 07:31:54,008 INFO [train.py:886] (3/4) Epoch 33, batch 700, loss[loss=0.01373, audio_tagging_loss=0.01373, over 24750.00 frames. ], tot_loss[loss=0.01254, audio_tagging_loss=0.01254, over 4790870.95 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 32.0 2023-12-23 07:31:58,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0 2023-12-23 07:32:01,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1021413.3333333334, ans=0.125 2023-12-23 07:32:24,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1021613.3333333334, ans=0.0 2023-12-23 07:32:46,955 INFO [train.py:886] (3/4) Epoch 33, batch 750, loss[loss=0.01109, audio_tagging_loss=0.01109, over 25000.00 frames. ], tot_loss[loss=0.01249, audio_tagging_loss=0.01249, over 4832648.71 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 64.0 2023-12-23 07:32:51,660 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.077e+01 3.365e+01 3.500e+01 3.684e+01 4.096e+01, threshold=7.001e+01, percent-clipped=0.0 2023-12-23 07:32:53,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1021746.6666666666, ans=0.125 2023-12-23 07:32:53,850 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 07:33:25,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1021946.6666666666, ans=0.1 2023-12-23 07:33:37,151 INFO [train.py:886] (3/4) Epoch 33, batch 800, loss[loss=0.01113, audio_tagging_loss=0.01113, over 25000.00 frames. ], tot_loss[loss=0.01235, audio_tagging_loss=0.01235, over 4865204.97 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 64.0 2023-12-23 07:34:01,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.51 vs. 
limit=6.0 2023-12-23 07:34:05,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1022213.3333333334, ans=0.2 2023-12-23 07:34:16,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1022280.0, ans=0.125 2023-12-23 07:34:30,563 INFO [train.py:886] (3/4) Epoch 33, batch 850, loss[loss=0.01371, audio_tagging_loss=0.01371, over 25000.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4892175.86 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 64.0 2023-12-23 07:34:35,227 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.933e+01 3.306e+01 3.423e+01 3.606e+01 5.967e+01, threshold=6.845e+01, percent-clipped=0.0 2023-12-23 07:34:43,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0 2023-12-23 07:34:49,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1022546.6666666666, ans=0.125 2023-12-23 07:34:58,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1022546.6666666666, ans=0.0 2023-12-23 07:35:03,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1022613.3333333334, ans=0.0 2023-12-23 07:35:14,243 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.74 vs. limit=15.0 2023-12-23 07:35:16,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1022680.0, ans=0.125 2023-12-23 07:35:21,375 INFO [train.py:886] (3/4) Epoch 33, batch 900, loss[loss=0.01182, audio_tagging_loss=0.01182, over 25000.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4909565.69 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 64.0 2023-12-23 07:35:24,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1022746.6666666666, ans=0.07 2023-12-23 07:35:38,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1022813.3333333334, ans=0.0 2023-12-23 07:35:41,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1022880.0, ans=0.125 2023-12-23 07:35:50,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1022880.0, ans=0.1 2023-12-23 07:36:12,621 INFO [train.py:886] (3/4) Epoch 33, batch 950, loss[loss=0.01195, audio_tagging_loss=0.01195, over 24750.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4912452.27 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 64.0 2023-12-23 07:36:15,102 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.20 vs. 
limit=22.5 2023-12-23 07:36:17,321 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.098e+01 3.441e+01 3.582e+01 3.744e+01 4.324e+01, threshold=7.165e+01, percent-clipped=0.0 2023-12-23 07:36:47,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=1023280.0, ans=0.2 2023-12-23 07:36:50,582 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.74 vs. limit=15.0 2023-12-23 07:37:04,743 INFO [train.py:886] (3/4) Epoch 33, batch 1000, loss[loss=0.01272, audio_tagging_loss=0.01272, over 24750.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4915879.82 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 64.0 2023-12-23 07:37:04,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1023413.3333333334, ans=0.09899494936611666 2023-12-23 07:37:14,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1023480.0, ans=0.125 2023-12-23 07:37:17,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1023480.0, ans=0.0 2023-12-23 07:37:43,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1023613.3333333334, ans=10.0 2023-12-23 07:37:55,588 INFO [train.py:886] (3/4) Epoch 33, batch 1050, loss[loss=0.01322, audio_tagging_loss=0.01322, over 24750.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4923211.69 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 64.0 2023-12-23 07:38:00,264 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.049e+01 3.330e+01 3.500e+01 3.696e+01 4.249e+01, threshold=7.000e+01, percent-clipped=0.0 2023-12-23 07:38:44,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1024013.3333333334, ans=0.125 2023-12-23 07:38:47,378 INFO [train.py:886] (3/4) Epoch 33, batch 1100, loss[loss=0.01188, audio_tagging_loss=0.01188, over 25000.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4926398.34 frames. ], batch size: 100, lr: 3.28e-03, grad_scale: 64.0 2023-12-23 07:38:56,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1024146.6666666666, ans=0.125 2023-12-23 07:38:58,322 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2023-12-23 07:39:12,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1024213.3333333334, ans=0.125 2023-12-23 07:39:14,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1024213.3333333334, ans=0.1 2023-12-23 07:39:32,420 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.09 vs. limit=12.0 2023-12-23 07:39:37,331 INFO [train.py:886] (3/4) Epoch 33, batch 1150, loss[loss=0.01015, audio_tagging_loss=0.01015, over 24750.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 4933862.23 frames. 
], batch size: 99, lr: 3.28e-03, grad_scale: 64.0 2023-12-23 07:39:42,735 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.007e+01 3.357e+01 3.502e+01 3.673e+01 4.162e+01, threshold=7.004e+01, percent-clipped=0.0 2023-12-23 07:39:45,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1024413.3333333334, ans=0.125 2023-12-23 07:39:48,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1024480.0, ans=0.125 2023-12-23 07:40:18,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1024680.0, ans=0.125 2023-12-23 07:40:26,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1024680.0, ans=0.125 2023-12-23 07:40:28,348 INFO [train.py:886] (3/4) Epoch 33, batch 1200, loss[loss=0.01149, audio_tagging_loss=0.01149, over 24750.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4942169.92 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 64.0 2023-12-23 07:40:36,270 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.54 vs. limit=15.0 2023-12-23 07:41:03,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1024946.6666666666, ans=0.125 2023-12-23 07:41:05,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1024946.6666666666, ans=0.2 2023-12-23 07:41:11,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1025013.3333333334, ans=0.0 2023-12-23 07:41:20,672 INFO [train.py:886] (3/4) Epoch 33, batch 1250, loss[loss=0.01202, audio_tagging_loss=0.01202, over 24750.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4942019.93 frames. ], batch size: 99, lr: 3.28e-03, grad_scale: 64.0 2023-12-23 07:41:25,268 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.073e+01 3.391e+01 3.513e+01 3.731e+01 4.516e+01, threshold=7.026e+01, percent-clipped=0.0 2023-12-23 07:41:29,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1025080.0, ans=0.125 2023-12-23 07:41:37,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1025146.6666666666, ans=0.09899494936611666 2023-12-23 07:42:04,564 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.30 vs. limit=15.0 2023-12-23 07:42:06,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1025346.6666666666, ans=0.2 2023-12-23 07:42:09,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1025346.6666666666, ans=0.1 2023-12-23 07:42:12,440 INFO [train.py:886] (3/4) Epoch 33, batch 1300, loss[loss=0.01446, audio_tagging_loss=0.01446, over 25000.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4940562.86 frames. 
], batch size: 100, lr: 3.27e-03, grad_scale: 64.0 2023-12-23 07:42:18,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1025413.3333333334, ans=0.125 2023-12-23 07:42:23,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1025480.0, ans=0.2 2023-12-23 07:42:41,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1025546.6666666666, ans=0.125 2023-12-23 07:42:44,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1025613.3333333334, ans=0.1 2023-12-23 07:42:47,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1025613.3333333334, ans=0.0 2023-12-23 07:42:53,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1025680.0, ans=0.125 2023-12-23 07:42:54,586 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=15.0 2023-12-23 07:43:00,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1025680.0, ans=0.125 2023-12-23 07:43:04,401 INFO [train.py:886] (3/4) Epoch 33, batch 1350, loss[loss=0.01191, audio_tagging_loss=0.01191, over 25000.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4943599.68 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0 2023-12-23 07:43:09,133 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.898e+01 3.394e+01 3.555e+01 3.712e+01 4.283e+01, threshold=7.109e+01, percent-clipped=0.0 2023-12-23 07:43:22,002 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 07:43:27,048 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.76 vs. limit=10.0 2023-12-23 07:43:38,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1025946.6666666666, ans=0.1 2023-12-23 07:43:53,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1026013.3333333334, ans=0.125 2023-12-23 07:43:54,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1026013.3333333334, ans=0.0 2023-12-23 07:43:57,039 INFO [train.py:886] (3/4) Epoch 33, batch 1400, loss[loss=0.01263, audio_tagging_loss=0.01263, over 25000.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4943722.97 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0 2023-12-23 07:43:57,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1026080.0, ans=10.0 2023-12-23 07:44:02,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1026080.0, ans=0.05 2023-12-23 07:44:09,938 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.21 vs. 
limit=15.0 2023-12-23 07:44:14,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1026146.6666666666, ans=0.125 2023-12-23 07:44:21,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1026213.3333333334, ans=0.5 2023-12-23 07:44:43,506 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.03 vs. limit=22.5 2023-12-23 07:44:46,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1026346.6666666666, ans=0.2 2023-12-23 07:44:47,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1026413.3333333334, ans=0.125 2023-12-23 07:44:48,411 INFO [train.py:886] (3/4) Epoch 33, batch 1450, loss[loss=0.01111, audio_tagging_loss=0.01111, over 25000.00 frames. ], tot_loss[loss=0.0124, audio_tagging_loss=0.0124, over 4946086.24 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0 2023-12-23 07:44:53,831 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.086e+01 3.346e+01 3.476e+01 3.616e+01 4.118e+01, threshold=6.952e+01, percent-clipped=0.0 2023-12-23 07:45:18,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1026613.3333333334, ans=0.0 2023-12-23 07:45:24,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1026613.3333333334, ans=0.2 2023-12-23 07:45:40,603 INFO [train.py:886] (3/4) Epoch 33, batch 1500, loss[loss=0.01067, audio_tagging_loss=0.01067, over 25000.00 frames. ], tot_loss[loss=0.0124, audio_tagging_loss=0.0124, over 4948120.79 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0 2023-12-23 07:45:55,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1026813.3333333334, ans=0.2 2023-12-23 07:46:01,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1026880.0, ans=0.125 2023-12-23 07:46:04,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1026880.0, ans=0.0 2023-12-23 07:46:09,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1026880.0, ans=0.1 2023-12-23 07:46:13,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1026946.6666666666, ans=0.125 2023-12-23 07:46:13,202 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.19 vs. 
limit=15.0 2023-12-23 07:46:23,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1027013.3333333334, ans=0.125 2023-12-23 07:46:25,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1027013.3333333334, ans=0.0 2023-12-23 07:46:25,051 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 07:46:25,202 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.13 vs. limit=15.0 2023-12-23 07:46:32,820 INFO [train.py:886] (3/4) Epoch 33, batch 1550, loss[loss=0.0103, audio_tagging_loss=0.0103, over 22166.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4943416.25 frames. ], batch size: 107, lr: 3.27e-03, grad_scale: 64.0 2023-12-23 07:46:36,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1027080.0, ans=0.125 2023-12-23 07:46:38,189 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.078e+01 3.419e+01 3.555e+01 3.697e+01 4.231e+01, threshold=7.109e+01, percent-clipped=0.0 2023-12-23 07:46:42,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1027146.6666666666, ans=0.0 2023-12-23 07:46:58,901 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 07:47:06,126 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.03 vs. limit=15.0 2023-12-23 07:47:06,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1027280.0, ans=0.125 2023-12-23 07:47:16,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1027346.6666666666, ans=0.125 2023-12-23 07:47:23,691 INFO [train.py:886] (3/4) Epoch 33, batch 1600, loss[loss=0.01301, audio_tagging_loss=0.01301, over 24074.00 frames. ], tot_loss[loss=0.01248, audio_tagging_loss=0.01248, over 4942509.49 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0 2023-12-23 07:47:37,563 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.80 vs. limit=15.0 2023-12-23 07:47:52,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1027546.6666666666, ans=0.0 2023-12-23 07:47:57,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1027613.3333333334, ans=0.125 2023-12-23 07:48:08,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1027680.0, ans=0.2 2023-12-23 07:48:16,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1027746.6666666666, ans=0.2 2023-12-23 07:48:16,980 INFO [train.py:886] (3/4) Epoch 33, batch 1650, loss[loss=0.009197, audio_tagging_loss=0.009197, over 24750.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4940389.07 frames. 
], batch size: 99, lr: 3.27e-03, grad_scale: 64.0 2023-12-23 07:48:21,580 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.029e+01 3.394e+01 3.522e+01 3.682e+01 5.123e+01, threshold=7.045e+01, percent-clipped=0.0 2023-12-23 07:48:50,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1027946.6666666666, ans=0.1 2023-12-23 07:48:52,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1027946.6666666666, ans=0.125 2023-12-23 07:48:56,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1028013.3333333334, ans=0.1 2023-12-23 07:49:04,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1028013.3333333334, ans=0.5 2023-12-23 07:49:08,123 INFO [train.py:886] (3/4) Epoch 33, batch 1700, loss[loss=0.01105, audio_tagging_loss=0.01105, over 25000.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4946467.73 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0 2023-12-23 07:49:10,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1028080.0, ans=0.0 2023-12-23 07:49:18,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1028146.6666666666, ans=0.2 2023-12-23 07:49:18,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1028146.6666666666, ans=0.1 2023-12-23 07:49:30,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1028213.3333333334, ans=0.2 2023-12-23 07:49:31,876 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2023-12-23 07:49:56,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1028346.6666666666, ans=0.07 2023-12-23 07:49:59,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1028413.3333333334, ans=0.0 2023-12-23 07:49:59,790 INFO [train.py:886] (3/4) Epoch 33, batch 1750, loss[loss=0.01461, audio_tagging_loss=0.01461, over 25000.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4954870.59 frames. 
], batch size: 100, lr: 3.27e-03, grad_scale: 64.0 2023-12-23 07:49:59,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1028413.3333333334, ans=0.0 2023-12-23 07:50:04,542 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.030e+01 3.329e+01 3.475e+01 3.627e+01 4.397e+01, threshold=6.950e+01, percent-clipped=0.0 2023-12-23 07:50:14,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1028480.0, ans=0.2 2023-12-23 07:50:16,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1028480.0, ans=0.09899494936611666 2023-12-23 07:50:32,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1028613.3333333334, ans=0.2 2023-12-23 07:50:42,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1028680.0, ans=0.125 2023-12-23 07:50:44,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1028680.0, ans=0.125 2023-12-23 07:50:52,827 INFO [train.py:886] (3/4) Epoch 33, batch 1800, loss[loss=0.009433, audio_tagging_loss=0.009433, over 22056.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4951338.40 frames. ], batch size: 107, lr: 3.27e-03, grad_scale: 64.0 2023-12-23 07:50:53,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1028746.6666666666, ans=0.0 2023-12-23 07:50:53,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1028746.6666666666, ans=0.125 2023-12-23 07:50:56,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1028746.6666666666, ans=0.125 2023-12-23 07:50:56,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1028746.6666666666, ans=0.1 2023-12-23 07:51:13,987 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.80 vs. limit=15.0 2023-12-23 07:51:39,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1029013.3333333334, ans=0.2 2023-12-23 07:51:42,084 INFO [train.py:886] (3/4) Epoch 33, batch 1850, loss[loss=0.01196, audio_tagging_loss=0.01196, over 24750.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4948243.20 frames. 
], batch size: 99, lr: 3.27e-03, grad_scale: 64.0 2023-12-23 07:51:44,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1029080.0, ans=0.09899494936611666 2023-12-23 07:51:45,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1029080.0, ans=0.125 2023-12-23 07:51:46,821 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.019e+01 3.396e+01 3.524e+01 3.649e+01 4.087e+01, threshold=7.047e+01, percent-clipped=0.0 2023-12-23 07:52:03,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1029213.3333333334, ans=0.2 2023-12-23 07:52:10,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1029213.3333333334, ans=0.1 2023-12-23 07:52:15,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1029280.0, ans=0.125 2023-12-23 07:52:35,173 INFO [train.py:886] (3/4) Epoch 33, batch 1900, loss[loss=0.0103, audio_tagging_loss=0.0103, over 23971.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4943235.49 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0 2023-12-23 07:52:48,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1029480.0, ans=0.07 2023-12-23 07:52:55,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.84 vs. limit=22.5 2023-12-23 07:53:07,036 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.84 vs. limit=15.0 2023-12-23 07:53:26,808 INFO [train.py:886] (3/4) Epoch 33, batch 1950, loss[loss=0.01193, audio_tagging_loss=0.01193, over 25000.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4937405.86 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0 2023-12-23 07:53:29,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.42 vs. limit=15.0 2023-12-23 07:53:32,176 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.173e+01 3.392e+01 3.533e+01 3.748e+01 4.233e+01, threshold=7.067e+01, percent-clipped=0.0 2023-12-23 07:53:32,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1029746.6666666666, ans=0.125 2023-12-23 07:53:35,180 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.20 vs. limit=10.0 2023-12-23 07:53:49,441 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.45 vs. limit=15.0 2023-12-23 07:53:55,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1029880.0, ans=0.125 2023-12-23 07:54:03,152 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.22 vs. 
limit=15.0 2023-12-23 07:54:03,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1029946.6666666666, ans=0.2 2023-12-23 07:54:13,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1030013.3333333334, ans=0.125 2023-12-23 07:54:18,565 INFO [train.py:886] (3/4) Epoch 33, batch 2000, loss[loss=0.01288, audio_tagging_loss=0.01288, over 25000.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4943029.21 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0 2023-12-23 07:54:25,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1030080.0, ans=0.1 2023-12-23 07:54:31,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1030146.6666666666, ans=0.0 2023-12-23 07:54:37,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1030146.6666666666, ans=10.0 2023-12-23 07:54:55,260 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0 2023-12-23 07:55:10,771 INFO [train.py:886] (3/4) Epoch 33, batch 2050, loss[loss=0.01183, audio_tagging_loss=0.01183, over 25000.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4949133.78 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0 2023-12-23 07:55:12,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1030413.3333333334, ans=0.125 2023-12-23 07:55:13,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1030413.3333333334, ans=0.1 2023-12-23 07:55:16,250 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.917e+01 3.317e+01 3.445e+01 3.668e+01 4.108e+01, threshold=6.890e+01, percent-clipped=0.0 2023-12-23 07:55:19,355 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 07:55:26,938 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 07:55:43,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1030613.3333333334, ans=0.95 2023-12-23 07:55:43,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1030613.3333333334, ans=0.0 2023-12-23 07:55:44,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1030613.3333333334, ans=0.125 2023-12-23 07:55:59,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1030680.0, ans=0.125 2023-12-23 07:56:01,868 INFO [train.py:886] (3/4) Epoch 33, batch 2100, loss[loss=0.01206, audio_tagging_loss=0.01206, over 24750.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4946510.12 frames. 
], batch size: 99, lr: 3.27e-03, grad_scale: 64.0 2023-12-23 07:56:09,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1030746.6666666666, ans=0.125 2023-12-23 07:56:16,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1030813.3333333334, ans=0.0 2023-12-23 07:56:23,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=12.0 2023-12-23 07:56:31,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1030880.0, ans=0.125 2023-12-23 07:56:37,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1030946.6666666666, ans=0.0 2023-12-23 07:56:54,959 INFO [train.py:886] (3/4) Epoch 33, batch 2150, loss[loss=0.01129, audio_tagging_loss=0.01129, over 24750.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4943339.97 frames. ], batch size: 99, lr: 3.27e-03, grad_scale: 64.0 2023-12-23 07:56:58,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1031080.0, ans=0.125 2023-12-23 07:56:58,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1031080.0, ans=0.125 2023-12-23 07:56:59,623 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.872e+01 3.355e+01 3.526e+01 3.683e+01 4.612e+01, threshold=7.052e+01, percent-clipped=0.0 2023-12-23 07:57:00,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1031080.0, ans=0.015 2023-12-23 07:57:16,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1031213.3333333334, ans=0.125 2023-12-23 07:57:22,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1031213.3333333334, ans=0.0 2023-12-23 07:57:27,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1031280.0, ans=0.0 2023-12-23 07:57:35,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1031346.6666666666, ans=0.125 2023-12-23 07:57:38,831 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0 2023-12-23 07:57:46,395 INFO [train.py:886] (3/4) Epoch 33, batch 2200, loss[loss=0.01119, audio_tagging_loss=0.01119, over 24063.00 frames. ], tot_loss[loss=0.01227, audio_tagging_loss=0.01227, over 4937310.61 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 64.0 2023-12-23 07:58:26,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1031613.3333333334, ans=0.09899494936611666 2023-12-23 07:58:37,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1031746.6666666666, ans=0.125 2023-12-23 07:58:38,165 INFO [train.py:886] (3/4) Epoch 33, batch 2250, loss[loss=0.01181, audio_tagging_loss=0.01181, over 25000.00 frames. 
], tot_loss[loss=0.0124, audio_tagging_loss=0.0124, over 4935305.63 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 07:58:42,940 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.057e+01 3.375e+01 3.511e+01 3.690e+01 4.571e+01, threshold=7.022e+01, percent-clipped=0.0 2023-12-23 07:58:59,029 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.32 vs. limit=10.0 2023-12-23 07:59:04,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1031880.0, ans=0.05 2023-12-23 07:59:08,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1031880.0, ans=0.125 2023-12-23 07:59:18,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1031946.6666666666, ans=0.0 2023-12-23 07:59:20,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1032013.3333333334, ans=0.0 2023-12-23 07:59:30,668 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.66 vs. limit=10.0 2023-12-23 07:59:30,933 INFO [train.py:886] (3/4) Epoch 33, batch 2300, loss[loss=0.01041, audio_tagging_loss=0.01041, over 24750.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4935676.70 frames. ], batch size: 99, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 07:59:36,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1032080.0, ans=0.125 2023-12-23 07:59:40,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1032146.6666666666, ans=0.0 2023-12-23 07:59:59,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1032213.3333333334, ans=0.1 2023-12-23 08:00:04,887 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.44 vs. limit=6.0 2023-12-23 08:00:08,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1032280.0, ans=0.0 2023-12-23 08:00:23,004 INFO [train.py:886] (3/4) Epoch 33, batch 2350, loss[loss=0.01252, audio_tagging_loss=0.01252, over 25000.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4938959.38 frames. 
], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:00:28,461 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.041e+01 3.321e+01 3.472e+01 3.677e+01 4.298e+01, threshold=6.945e+01, percent-clipped=0.0 2023-12-23 08:00:46,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1032546.6666666666, ans=0.125 2023-12-23 08:00:46,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1032546.6666666666, ans=0.0 2023-12-23 08:00:47,262 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 08:01:09,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1032680.0, ans=0.2 2023-12-23 08:01:14,860 INFO [train.py:886] (3/4) Epoch 33, batch 2400, loss[loss=0.011, audio_tagging_loss=0.011, over 25000.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4938478.58 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:01:25,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1032813.3333333334, ans=0.125 2023-12-23 08:01:44,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.95 vs. limit=12.0 2023-12-23 08:01:48,553 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.03 vs. limit=10.0 2023-12-23 08:01:57,964 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. limit=6.0 2023-12-23 08:02:07,472 INFO [train.py:886] (3/4) Epoch 33, batch 2450, loss[loss=0.01018, audio_tagging_loss=0.01018, over 25000.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4949651.39 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:02:12,816 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.086e+01 3.321e+01 3.452e+01 3.648e+01 4.269e+01, threshold=6.903e+01, percent-clipped=0.0 2023-12-23 08:02:20,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1033146.6666666666, ans=0.0 2023-12-23 08:02:22,728 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0 2023-12-23 08:02:29,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1033213.3333333334, ans=0.0 2023-12-23 08:02:51,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1033346.6666666666, ans=0.04949747468305833 2023-12-23 08:02:58,915 INFO [train.py:886] (3/4) Epoch 33, batch 2500, loss[loss=0.01307, audio_tagging_loss=0.01307, over 24750.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4949502.07 frames. 
], batch size: 99, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:03:00,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1033413.3333333334, ans=0.125 2023-12-23 08:03:14,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1033480.0, ans=10.0 2023-12-23 08:03:21,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1033546.6666666666, ans=0.2 2023-12-23 08:03:31,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1033613.3333333334, ans=0.125 2023-12-23 08:03:46,623 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=15.0 2023-12-23 08:03:51,007 INFO [train.py:886] (3/4) Epoch 33, batch 2550, loss[loss=0.01317, audio_tagging_loss=0.01317, over 25000.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4945898.24 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:03:53,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1033746.6666666666, ans=0.125 2023-12-23 08:03:55,657 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.147e+01 3.396e+01 3.556e+01 3.745e+01 4.206e+01, threshold=7.112e+01, percent-clipped=0.0 2023-12-23 08:04:09,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1033813.3333333334, ans=0.1 2023-12-23 08:04:11,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1033880.0, ans=0.0 2023-12-23 08:04:15,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1033880.0, ans=0.05 2023-12-23 08:04:20,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1033880.0, ans=0.125 2023-12-23 08:04:43,669 INFO [train.py:886] (3/4) Epoch 33, batch 2600, loss[loss=0.01268, audio_tagging_loss=0.01268, over 25000.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4949499.18 frames. 
], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:04:57,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1034146.6666666666, ans=0.2 2023-12-23 08:05:06,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1034213.3333333334, ans=0.125 2023-12-23 08:05:08,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1034213.3333333334, ans=0.1 2023-12-23 08:05:17,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1034280.0, ans=0.5 2023-12-23 08:05:30,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1034346.6666666666, ans=0.0 2023-12-23 08:05:33,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1034346.6666666666, ans=0.0 2023-12-23 08:05:34,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1034413.3333333334, ans=0.1 2023-12-23 08:05:34,820 INFO [train.py:886] (3/4) Epoch 33, batch 2650, loss[loss=0.01434, audio_tagging_loss=0.01434, over 24750.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4951494.09 frames. ], batch size: 99, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:05:36,406 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. limit=6.0 2023-12-23 08:05:39,481 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.928e+01 3.366e+01 3.521e+01 3.683e+01 4.023e+01, threshold=7.042e+01, percent-clipped=0.0 2023-12-23 08:06:03,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.29 vs. limit=6.0 2023-12-23 08:06:08,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1034613.3333333334, ans=0.125 2023-12-23 08:06:13,138 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 08:06:14,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1034613.3333333334, ans=0.125 2023-12-23 08:06:19,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1034680.0, ans=0.125 2023-12-23 08:06:24,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1034680.0, ans=0.0 2023-12-23 08:06:24,642 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0 2023-12-23 08:06:27,632 INFO [train.py:886] (3/4) Epoch 33, batch 2700, loss[loss=0.01063, audio_tagging_loss=0.01063, over 24088.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4954064.63 frames. 
], batch size: 100, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:06:44,240 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2023-12-23 08:06:56,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1034946.6666666666, ans=0.0 2023-12-23 08:07:07,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1035013.3333333334, ans=0.125 2023-12-23 08:07:17,208 INFO [train.py:886] (3/4) Epoch 33, batch 2750, loss[loss=0.01265, audio_tagging_loss=0.01265, over 25000.00 frames. ], tot_loss[loss=0.01214, audio_tagging_loss=0.01214, over 4954716.82 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 128.0 2023-12-23 08:07:19,259 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 08:07:23,274 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.910e+01 3.350e+01 3.483e+01 3.677e+01 4.348e+01, threshold=6.966e+01, percent-clipped=0.0 2023-12-23 08:07:27,713 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2023-12-23 08:07:33,867 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.50 vs. limit=12.0 2023-12-23 08:07:51,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1035280.0, ans=0.0 2023-12-23 08:08:09,527 INFO [train.py:886] (3/4) Epoch 33, batch 2800, loss[loss=0.0103, audio_tagging_loss=0.0103, over 24750.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4955906.37 frames. ], batch size: 99, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:08:10,811 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.44 vs. limit=15.0 2023-12-23 08:08:14,752 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.85 vs. limit=22.5 2023-12-23 08:08:33,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1035546.6666666666, ans=0.125 2023-12-23 08:09:01,179 INFO [train.py:886] (3/4) Epoch 33, batch 2850, loss[loss=0.01222, audio_tagging_loss=0.01222, over 24750.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 4951986.66 frames. ], batch size: 99, lr: 3.26e-03, grad_scale: 64.0 2023-12-23 08:09:07,507 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.083e+01 3.407e+01 3.538e+01 3.728e+01 5.942e+01, threshold=7.077e+01, percent-clipped=0.0 2023-12-23 08:09:10,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1035813.3333333334, ans=0.04949747468305833 2023-12-23 08:09:18,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1035813.3333333334, ans=0.0 2023-12-23 08:09:52,027 INFO [train.py:886] (3/4) Epoch 33, batch 2900, loss[loss=0.01193, audio_tagging_loss=0.01193, over 25000.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4951366.95 frames. 
], batch size: 100, lr: 3.26e-03, grad_scale: 64.0
2023-12-23 08:10:11,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1036146.6666666666, ans=0.0
2023-12-23 08:10:30,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1036280.0, ans=0.0
2023-12-23 08:10:45,080 INFO [train.py:886] (3/4) Epoch 33, batch 2950, loss[loss=0.01104, audio_tagging_loss=0.01104, over 25000.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4951229.17 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0
2023-12-23 08:10:50,738 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.973e+01 3.310e+01 3.454e+01 3.690e+01 4.834e+01, threshold=6.907e+01, percent-clipped=0.0
2023-12-23 08:10:53,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1036480.0, ans=0.125
2023-12-23 08:11:00,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1036480.0, ans=0.125
2023-12-23 08:11:06,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.64 vs. limit=15.0
2023-12-23 08:11:11,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1036546.6666666666, ans=0.0
2023-12-23 08:11:27,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1036680.0, ans=0.2
2023-12-23 08:11:31,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1036680.0, ans=0.125
2023-12-23 08:11:35,883 INFO [train.py:886] (3/4) Epoch 33, batch 3000, loss[loss=0.0119, audio_tagging_loss=0.0119, over 25000.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4952412.05 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0
2023-12-23 08:11:35,884 INFO [train.py:909] (3/4) Computing validation loss
2023-12-23 08:11:56,754 INFO [train.py:917] (3/4) Epoch 33, validation: loss=0.03378, audio_tagging_loss=0.03378, over 3737520.00 frames.
2023-12-23 08:11:56,754 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-23 08:11:59,170 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.65 vs. limit=15.0
2023-12-23 08:12:08,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1036813.3333333334, ans=0.1
2023-12-23 08:12:11,790 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0
2023-12-23 08:12:33,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1036946.6666666666, ans=0.1
2023-12-23 08:12:41,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=22.5
2023-12-23 08:12:42,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1037013.3333333334, ans=0.125
2023-12-23 08:12:49,274 INFO [train.py:886] (3/4) Epoch 33, batch 3050, loss[loss=0.01103, audio_tagging_loss=0.01103, over 25000.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4950065.53 frames. ], batch size: 100, lr: 3.26e-03, grad_scale: 64.0
2023-12-23 08:12:50,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1037080.0, ans=0.125
2023-12-23 08:12:54,860 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.120e+01 3.381e+01 3.541e+01 3.697e+01 4.146e+01, threshold=7.081e+01, percent-clipped=0.0
2023-12-23 08:13:04,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1037146.6666666666, ans=0.1
2023-12-23 08:13:07,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1037146.6666666666, ans=0.125
2023-12-23 08:13:13,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1037213.3333333334, ans=0.0
2023-12-23 08:13:17,585 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.55 vs. limit=15.0
2023-12-23 08:13:20,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1037280.0, ans=0.125
2023-12-23 08:13:24,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1037280.0, ans=0.2
2023-12-23 08:13:35,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1037346.6666666666, ans=0.125
2023-12-23 08:13:40,921 INFO [train.py:886] (3/4) Epoch 33, batch 3100, loss[loss=0.01449, audio_tagging_loss=0.01449, over 24750.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4956662.99 frames. ], batch size: 99, lr: 3.26e-03, grad_scale: 64.0
2023-12-23 08:13:46,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1037413.3333333334, ans=0.125
2023-12-23 08:13:52,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1037480.0, ans=0.0
2023-12-23 08:14:01,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1037546.6666666666, ans=0.2
2023-12-23 08:14:05,361 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.026e-03
2023-12-23 08:14:07,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=1037546.6666666666, ans=10.0
2023-12-23 08:14:21,926 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.61 vs. limit=15.0
2023-12-23 08:14:22,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1037680.0, ans=0.125
2023-12-23 08:14:31,938 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.29 vs. limit=22.5
2023-12-23 08:14:32,369 INFO [train.py:886] (3/4) Epoch 33, batch 3150, loss[loss=0.01266, audio_tagging_loss=0.01266, over 24750.00 frames. ], tot_loss[loss=0.01232, audio_tagging_loss=0.01232, over 4953266.10 frames. ], batch size: 99, lr: 3.26e-03, grad_scale: 64.0
2023-12-23 08:14:38,067 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.211e+01 3.392e+01 3.546e+01 3.675e+01 4.446e+01, threshold=7.092e+01, percent-clipped=0.0
2023-12-23 08:14:57,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1037880.0, ans=0.125
2023-12-23 08:15:08,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1037946.6666666666, ans=0.04949747468305833
2023-12-23 08:15:21,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1038013.3333333334, ans=0.0
2023-12-23 08:15:24,439 INFO [train.py:886] (3/4) Epoch 33, batch 3200, loss[loss=0.01077, audio_tagging_loss=0.01077, over 24750.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4946556.54 frames. ], batch size: 99, lr: 3.25e-03, grad_scale: 64.0
2023-12-23 08:15:30,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1038080.0, ans=0.125
2023-12-23 08:15:38,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1038146.6666666666, ans=0.0
2023-12-23 08:15:49,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1038213.3333333334, ans=0.125
2023-12-23 08:15:52,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1038213.3333333334, ans=0.0
2023-12-23 08:15:55,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1038280.0, ans=0.125
2023-12-23 08:16:10,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1038346.6666666666, ans=0.04949747468305833
2023-12-23 08:16:15,385 INFO [train.py:886] (3/4) Epoch 33, batch 3250, loss[loss=0.01197, audio_tagging_loss=0.01197, over 25000.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4944141.83 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0
2023-12-23 08:16:21,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1038413.3333333334, ans=0.09899494936611666
2023-12-23 08:16:21,763 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=15.0
2023-12-23 08:16:22,359 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.103e+01 3.403e+01 3.565e+01 3.733e+01 4.507e+01, threshold=7.131e+01, percent-clipped=0.0
2023-12-23 08:16:38,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=1038546.6666666666, ans=0.1
2023-12-23 08:16:45,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1038613.3333333334, ans=0.025
2023-12-23 08:16:59,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1038680.0, ans=0.2
2023-12-23 08:17:06,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.26 vs. limit=22.5
2023-12-23 08:17:08,596 INFO [train.py:886] (3/4) Epoch 33, batch 3300, loss[loss=0.01117, audio_tagging_loss=0.01117, over 25000.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4942763.17 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0
2023-12-23 08:17:19,597 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.91 vs. limit=10.0
2023-12-23 08:17:33,153 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.00 vs. limit=12.0
2023-12-23 08:17:41,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1038946.6666666666, ans=0.05
2023-12-23 08:17:51,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1039013.3333333334, ans=0.025
2023-12-23 08:17:58,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1039013.3333333334, ans=0.125
2023-12-23 08:17:59,939 INFO [train.py:886] (3/4) Epoch 33, batch 3350, loss[loss=0.01133, audio_tagging_loss=0.01133, over 25000.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4949855.43 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0
2023-12-23 08:18:06,315 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.096e+01 3.385e+01 3.532e+01 3.687e+01 4.158e+01, threshold=7.063e+01, percent-clipped=0.0
2023-12-23 08:18:08,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1039080.0, ans=0.1
2023-12-23 08:18:18,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1039146.6666666666, ans=0.0
2023-12-23 08:18:25,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1039213.3333333334, ans=0.2
2023-12-23 08:18:26,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1039213.3333333334, ans=0.125
2023-12-23 08:18:36,980 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 08:18:37,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1039280.0, ans=0.2
2023-12-23 08:18:42,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1039346.6666666666, ans=0.125
2023-12-23 08:18:48,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1039346.6666666666, ans=0.125
2023-12-23 08:18:50,463 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.57 vs. limit=15.0
2023-12-23 08:18:50,762 INFO [train.py:886] (3/4) Epoch 33, batch 3400, loss[loss=0.008326, audio_tagging_loss=0.008326, over 25000.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4953771.40 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0
2023-12-23 08:18:58,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1039413.3333333334, ans=0.1
2023-12-23 08:19:01,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1039480.0, ans=0.1
2023-12-23 08:19:08,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1039480.0, ans=0.1
2023-12-23 08:19:15,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1039546.6666666666, ans=0.0
2023-12-23 08:19:21,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=1039613.3333333334, ans=6.0
2023-12-23 08:19:32,890 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.53 vs. limit=15.0
2023-12-23 08:19:35,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1039680.0, ans=15.0
2023-12-23 08:19:42,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1039746.6666666666, ans=0.0
2023-12-23 08:19:42,851 INFO [train.py:886] (3/4) Epoch 33, batch 3450, loss[loss=0.01311, audio_tagging_loss=0.01311, over 24750.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4945727.57 frames. ], batch size: 99, lr: 3.25e-03, grad_scale: 64.0
2023-12-23 08:19:48,485 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.944e+01 3.448e+01 3.587e+01 3.703e+01 4.197e+01, threshold=7.175e+01, percent-clipped=0.0
2023-12-23 08:20:08,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1039880.0, ans=0.125
2023-12-23 08:20:25,102 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.70 vs. limit=15.0
2023-12-23 08:20:36,032 INFO [train.py:886] (3/4) Epoch 33, batch 3500, loss[loss=0.01632, audio_tagging_loss=0.01632, over 24750.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4943983.77 frames. ], batch size: 99, lr: 3.25e-03, grad_scale: 64.0
2023-12-23 08:20:36,538 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0
2023-12-23 08:20:41,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1040080.0, ans=0.2
2023-12-23 08:20:41,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1040080.0, ans=0.0
2023-12-23 08:20:54,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1040213.3333333334, ans=0.125
2023-12-23 08:20:58,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1040213.3333333334, ans=0.125
2023-12-23 08:21:00,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1040213.3333333334, ans=0.125
2023-12-23 08:21:18,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1040346.6666666666, ans=0.125
2023-12-23 08:21:26,440 INFO [train.py:886] (3/4) Epoch 33, batch 3550, loss[loss=0.01021, audio_tagging_loss=0.01021, over 25000.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4945125.14 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0
2023-12-23 08:21:32,830 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.940e+01 3.348e+01 3.483e+01 3.687e+01 4.217e+01, threshold=6.967e+01, percent-clipped=0.0
2023-12-23 08:22:02,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1040613.3333333334, ans=0.0
2023-12-23 08:22:07,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1040680.0, ans=0.1
2023-12-23 08:22:07,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1040680.0, ans=0.0
2023-12-23 08:22:13,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1040680.0, ans=0.125
2023-12-23 08:22:14,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1040680.0, ans=0.125
2023-12-23 08:22:18,350 INFO [train.py:886] (3/4) Epoch 33, batch 3600, loss[loss=0.01142, audio_tagging_loss=0.01142, over 25000.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 4945635.91 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0
2023-12-23 08:22:20,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1040746.6666666666, ans=0.1
2023-12-23 08:22:44,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1040880.0, ans=0.1
2023-12-23 08:22:51,974 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.27 vs. limit=12.0
2023-12-23 08:22:53,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1040946.6666666666, ans=0.125
2023-12-23 08:23:01,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1041013.3333333334, ans=0.125
2023-12-23 08:23:02,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1041013.3333333334, ans=0.1
2023-12-23 08:23:09,588 INFO [train.py:886] (3/4) Epoch 33, batch 3650, loss[loss=0.01214, audio_tagging_loss=0.01214, over 25000.00 frames. ], tot_loss[loss=0.01214, audio_tagging_loss=0.01214, over 4950405.35 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0
2023-12-23 08:23:12,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1041080.0, ans=0.07
2023-12-23 08:23:15,867 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.932e+01 3.316e+01 3.480e+01 3.651e+01 4.543e+01, threshold=6.960e+01, percent-clipped=0.0
2023-12-23 08:23:21,235 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.10 vs. limit=22.5
2023-12-23 08:23:21,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1041146.6666666666, ans=0.0
2023-12-23 08:23:44,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1041280.0, ans=0.125
2023-12-23 08:24:01,230 INFO [train.py:886] (3/4) Epoch 33, batch 3700, loss[loss=0.01054, audio_tagging_loss=0.01054, over 25000.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4953648.27 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0
2023-12-23 08:24:01,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1041413.3333333334, ans=0.125
2023-12-23 08:24:09,342 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0
2023-12-23 08:24:12,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1041480.0, ans=0.125
2023-12-23 08:24:15,801 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.71 vs. limit=15.0
2023-12-23 08:24:16,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1041480.0, ans=0.125
2023-12-23 08:24:22,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1041546.6666666666, ans=0.2
2023-12-23 08:24:36,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1041613.3333333334, ans=0.125
2023-12-23 08:24:39,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1041613.3333333334, ans=0.2
2023-12-23 08:24:49,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1041680.0, ans=0.125
2023-12-23 08:24:52,535 INFO [train.py:886] (3/4) Epoch 33, batch 3750, loss[loss=0.01374, audio_tagging_loss=0.01374, over 24750.00 frames. ], tot_loss[loss=0.01227, audio_tagging_loss=0.01227, over 4954365.23 frames. ], batch size: 99, lr: 3.25e-03, grad_scale: 64.0
2023-12-23 08:24:59,663 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.113e+01 3.433e+01 3.584e+01 3.717e+01 4.082e+01, threshold=7.168e+01, percent-clipped=0.0
2023-12-23 08:25:10,411 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 08:25:13,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1041880.0, ans=0.2
2023-12-23 08:25:15,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1041880.0, ans=0.125
2023-12-23 08:25:16,636 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.48 vs. limit=15.0
2023-12-23 08:25:21,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1041880.0, ans=0.2
2023-12-23 08:25:34,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.11 vs. limit=22.5
2023-12-23 08:25:42,051 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.77 vs. limit=15.0
2023-12-23 08:25:44,407 INFO [train.py:886] (3/4) Epoch 33, batch 3800, loss[loss=0.01118, audio_tagging_loss=0.01118, over 24750.00 frames. ], tot_loss[loss=0.01228, audio_tagging_loss=0.01228, over 4951476.88 frames. ], batch size: 99, lr: 3.25e-03, grad_scale: 64.0
2023-12-23 08:25:50,615 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.68 vs. limit=15.0
2023-12-23 08:25:57,196 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 08:26:17,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1042280.0, ans=0.125
2023-12-23 08:26:36,529 INFO [train.py:886] (3/4) Epoch 33, batch 3850, loss[loss=0.01241, audio_tagging_loss=0.01241, over 24750.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4954664.51 frames. ], batch size: 99, lr: 3.25e-03, grad_scale: 64.0
2023-12-23 08:26:38,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1042413.3333333334, ans=0.2
2023-12-23 08:26:42,210 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.174e+01 3.454e+01 3.600e+01 3.785e+01 4.455e+01, threshold=7.200e+01, percent-clipped=0.0
2023-12-23 08:26:50,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1042480.0, ans=0.125
2023-12-23 08:27:26,755 INFO [train.py:886] (3/4) Epoch 33, batch 3900, loss[loss=0.0126, audio_tagging_loss=0.0126, over 24931.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 4958705.46 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0
2023-12-23 08:27:28,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1042746.6666666666, ans=0.0
2023-12-23 08:27:30,996 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.96 vs. limit=15.0
2023-12-23 08:27:31,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1042746.6666666666, ans=0.0
2023-12-23 08:27:41,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1042813.3333333334, ans=0.125
2023-12-23 08:27:51,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1042880.0, ans=0.125
2023-12-23 08:27:51,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1042880.0, ans=0.0
2023-12-23 08:28:02,410 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.54 vs. limit=22.5
2023-12-23 08:28:07,103 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.76 vs. limit=22.5
2023-12-23 08:28:11,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1043013.3333333334, ans=0.125
2023-12-23 08:28:14,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1043013.3333333334, ans=0.025
2023-12-23 08:28:18,534 INFO [train.py:886] (3/4) Epoch 33, batch 3950, loss[loss=0.009553, audio_tagging_loss=0.009553, over 25000.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4962869.56 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0
2023-12-23 08:28:20,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1043080.0, ans=0.125
2023-12-23 08:28:23,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1043080.0, ans=0.125
2023-12-23 08:28:24,293 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.034e+01 3.369e+01 3.508e+01 3.685e+01 5.218e+01, threshold=7.015e+01, percent-clipped=0.0
2023-12-23 08:29:03,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1043346.6666666666, ans=0.125
2023-12-23 08:29:09,808 INFO [train.py:886] (3/4) Epoch 33, batch 4000, loss[loss=0.01019, audio_tagging_loss=0.01019, over 25000.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4962257.87 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0
2023-12-23 08:29:19,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1043480.0, ans=0.125
2023-12-23 08:29:27,235 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.64 vs. limit=15.0
2023-12-23 08:29:29,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1043546.6666666666, ans=0.125
2023-12-23 08:29:32,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1043546.6666666666, ans=0.1
2023-12-23 08:29:56,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1043680.0, ans=0.125
2023-12-23 08:29:57,495 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=12.0
2023-12-23 08:30:00,658 INFO [train.py:886] (3/4) Epoch 33, batch 4050, loss[loss=0.01277, audio_tagging_loss=0.01277, over 25000.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4960495.85 frames. ], batch size: 100, lr: 3.25e-03, grad_scale: 64.0
2023-12-23 08:30:06,404 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.985e+01 3.429e+01 3.579e+01 3.740e+01 4.198e+01, threshold=7.158e+01, percent-clipped=0.0
2023-12-23 08:30:14,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1043813.3333333334, ans=0.0
2023-12-23 08:30:19,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.57 vs. limit=6.0
2023-12-23 08:30:38,510 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=15.0
2023-12-23 08:30:52,101 INFO [train.py:886] (3/4) Epoch 33, batch 4100, loss[loss=0.01125, audio_tagging_loss=0.01125, over 24750.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4955474.95 frames. ], batch size: 99, lr: 3.25e-03, grad_scale: 64.0
2023-12-23 08:30:56,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1044080.0, ans=0.0
2023-12-23 08:31:09,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1044146.6666666666, ans=0.125
2023-12-23 08:31:42,122 INFO [train.py:886] (3/4) Epoch 33, batch 4150, loss[loss=0.01116, audio_tagging_loss=0.01116, over 24750.00 frames. ], tot_loss[loss=0.01235, audio_tagging_loss=0.01235, over 4954128.45 frames. ], batch size: 99, lr: 3.24e-03, grad_scale: 64.0
2023-12-23 08:31:48,511 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.092e+01 3.375e+01 3.544e+01 3.687e+01 4.379e+01, threshold=7.088e+01, percent-clipped=0.0
2023-12-23 08:31:56,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1044480.0, ans=0.125
2023-12-23 08:32:00,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1044480.0, ans=0.125
2023-12-23 08:32:12,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1044613.3333333334, ans=0.125
2023-12-23 08:32:14,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1044613.3333333334, ans=0.2
2023-12-23 08:32:25,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1044680.0, ans=0.125
2023-12-23 08:32:33,496 INFO [train.py:886] (3/4) Epoch 33, batch 4200, loss[loss=0.01281, audio_tagging_loss=0.01281, over 25000.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4949437.34 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 64.0
2023-12-23 08:32:38,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1044746.6666666666, ans=0.1
2023-12-23 08:33:07,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1044946.6666666666, ans=0.125
2023-12-23 08:33:08,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1044946.6666666666, ans=0.125
2023-12-23 08:33:10,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1044946.6666666666, ans=0.1
2023-12-23 08:33:13,998 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0
2023-12-23 08:33:21,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1045013.3333333334, ans=0.125
2023-12-23 08:33:25,244 INFO [train.py:886] (3/4) Epoch 33, batch 4250, loss[loss=0.0112, audio_tagging_loss=0.0112, over 25000.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4953664.70 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 64.0
2023-12-23 08:33:31,640 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.010e+01 3.356e+01 3.487e+01 3.659e+01 4.243e+01, threshold=6.975e+01, percent-clipped=0.0
2023-12-23 08:33:40,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1045146.6666666666, ans=0.0
2023-12-23 08:33:48,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1045213.3333333334, ans=0.125
2023-12-23 08:33:57,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1045280.0, ans=0.125
2023-12-23 08:34:16,215 INFO [train.py:886] (3/4) Epoch 33, batch 4300, loss[loss=0.01281, audio_tagging_loss=0.01281, over 25000.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4950669.03 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 64.0
2023-12-23 08:34:22,055 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.26 vs. limit=22.5
2023-12-23 08:34:29,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1045480.0, ans=0.0
2023-12-23 08:34:33,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1045480.0, ans=0.0
2023-12-23 08:34:33,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.66 vs. limit=15.0
2023-12-23 08:34:39,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1045546.6666666666, ans=0.125
2023-12-23 08:35:03,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1045680.0, ans=0.09899494936611666
2023-12-23 08:35:08,711 INFO [train.py:886] (3/4) Epoch 33, batch 4350, loss[loss=0.01368, audio_tagging_loss=0.01368, over 24750.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4955699.83 frames. ], batch size: 99, lr: 3.24e-03, grad_scale: 64.0
2023-12-23 08:35:12,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.04 vs. limit=10.0
2023-12-23 08:35:15,035 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.109e+01 3.438e+01 3.558e+01 3.684e+01 4.485e+01, threshold=7.115e+01, percent-clipped=0.0
2023-12-23 08:35:16,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1045746.6666666666, ans=0.0
2023-12-23 08:35:17,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1045746.6666666666, ans=0.125
2023-12-23 08:35:38,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1045880.0, ans=0.125
2023-12-23 08:35:40,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1045946.6666666666, ans=0.0
2023-12-23 08:35:48,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1045946.6666666666, ans=0.125
2023-12-23 08:35:50,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.64 vs. limit=22.5
2023-12-23 08:35:51,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1046013.3333333334, ans=0.0
2023-12-23 08:36:01,111 INFO [train.py:886] (3/4) Epoch 33, batch 4400, loss[loss=0.01218, audio_tagging_loss=0.01218, over 24750.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4948084.16 frames. ], batch size: 99, lr: 3.24e-03, grad_scale: 64.0
2023-12-23 08:36:11,757 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.71 vs. limit=15.0
2023-12-23 08:36:15,433 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.85 vs. limit=22.5
2023-12-23 08:36:18,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1046146.6666666666, ans=0.0
2023-12-23 08:36:22,534 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 08:36:25,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1046213.3333333334, ans=0.125
2023-12-23 08:36:32,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1046280.0, ans=0.1
2023-12-23 08:36:39,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1046280.0, ans=0.125
2023-12-23 08:36:40,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1046280.0, ans=0.125
2023-12-23 08:36:49,460 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.33 vs. limit=22.5
2023-12-23 08:36:52,635 INFO [train.py:886] (3/4) Epoch 33, batch 4450, loss[loss=0.01014, audio_tagging_loss=0.01014, over 25000.00 frames. ], tot_loss[loss=0.01228, audio_tagging_loss=0.01228, over 4943049.18 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 64.0
2023-12-23 08:36:57,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1046413.3333333334, ans=0.125
2023-12-23 08:36:58,261 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.139e+01 3.349e+01 3.519e+01 3.667e+01 4.264e+01, threshold=7.037e+01, percent-clipped=0.0
2023-12-23 08:36:58,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1046413.3333333334, ans=0.1
2023-12-23 08:37:22,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1046546.6666666666, ans=0.0
2023-12-23 08:37:28,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1046613.3333333334, ans=0.125
2023-12-23 08:37:44,999 INFO [train.py:886] (3/4) Epoch 33, batch 4500, loss[loss=0.0113, audio_tagging_loss=0.0113, over 25000.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4943118.46 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 64.0
2023-12-23 08:37:45,447 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.50 vs. limit=15.0
2023-12-23 08:37:50,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1046746.6666666666, ans=0.125
2023-12-23 08:37:55,710 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.08 vs. limit=22.5
2023-12-23 08:37:59,136 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 08:38:12,949 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.37 vs. limit=15.0
2023-12-23 08:38:14,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1046880.0, ans=0.2
2023-12-23 08:38:16,693 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=15.0
2023-12-23 08:38:19,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1046946.6666666666, ans=0.0
2023-12-23 08:38:35,948 INFO [train.py:886] (3/4) Epoch 33, batch 4550, loss[loss=0.01262, audio_tagging_loss=0.01262, over 25000.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4944169.55 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 64.0
2023-12-23 08:38:38,915 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=22.5
2023-12-23 08:38:40,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1047080.0, ans=0.2
2023-12-23 08:38:43,052 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.099e+01 3.336e+01 3.508e+01 3.645e+01 4.432e+01, threshold=7.015e+01, percent-clipped=0.0
2023-12-23 08:38:54,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1047146.6666666666, ans=0.125
2023-12-23 08:39:20,527 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 08:39:28,809 INFO [train.py:886] (3/4) Epoch 33, batch 4600, loss[loss=0.01323, audio_tagging_loss=0.01323, over 25000.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4950001.84 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 64.0
2023-12-23 08:39:39,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1047480.0, ans=0.125
2023-12-23 08:39:41,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1047480.0, ans=0.5
2023-12-23 08:39:47,481 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.05 vs. limit=22.5
2023-12-23 08:39:54,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0
2023-12-23 08:40:01,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1047613.3333333334, ans=0.1
2023-12-23 08:40:09,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1047680.0, ans=0.125
2023-12-23 08:40:20,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1047746.6666666666, ans=0.125
2023-12-23 08:40:21,222 INFO [train.py:886] (3/4) Epoch 33, batch 4650, loss[loss=0.01225, audio_tagging_loss=0.01225, over 25000.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4958777.24 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 64.0
2023-12-23 08:40:27,557 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.970e+01 3.430e+01 3.556e+01 3.733e+01 4.404e+01, threshold=7.113e+01, percent-clipped=0.0
2023-12-23 08:40:35,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.47 vs. limit=15.0
2023-12-23 08:40:48,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1047880.0, ans=0.0
2023-12-23 08:40:52,663 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=15.0
2023-12-23 08:40:53,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1047946.6666666666, ans=0.07
2023-12-23 08:41:00,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1047946.6666666666, ans=0.1
2023-12-23 08:41:08,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1048013.3333333334, ans=0.125
2023-12-23 08:41:12,177 INFO [train.py:886] (3/4) Epoch 33, batch 4700, loss[loss=0.01174, audio_tagging_loss=0.01174, over 25000.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4951367.60 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 64.0
2023-12-23 08:41:16,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1048080.0, ans=0.125
2023-12-23 08:41:18,406 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.00 vs. limit=22.5
2023-12-23 08:41:48,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1048280.0, ans=0.0
2023-12-23 08:41:59,658 INFO [train.py:886] (3/4) Epoch 33, batch 4750, loss[loss=0.01193, audio_tagging_loss=0.01193, over 24750.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4947492.30 frames. ], batch size: 99, lr: 3.24e-03, grad_scale: 64.0
2023-12-23 08:42:05,088 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.132e+01 3.425e+01 3.596e+01 3.749e+01 4.228e+01, threshold=7.192e+01, percent-clipped=0.0
2023-12-23 08:42:35,317 INFO [train.py:886] (3/4) Epoch 34, batch 0, loss[loss=0.02791, audio_tagging_loss=0.02791, over 20832.00 frames. ], tot_loss[loss=0.02791, audio_tagging_loss=0.02791, over 20832.00 frames. ], batch size: 107, lr: 3.19e-03, grad_scale: 32.0
2023-12-23 08:42:35,317 INFO [train.py:909] (3/4) Computing validation loss
2023-12-23 08:42:56,421 INFO [train.py:917] (3/4) Epoch 34, validation: loss=0.03363, audio_tagging_loss=0.03363, over 3737520.00 frames.
2023-12-23 08:42:56,421 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-23 08:42:58,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1048520.0, ans=0.0
2023-12-23 08:43:10,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1048586.6666666667, ans=0.125
2023-12-23 08:43:11,346 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.03 vs. limit=22.5
2023-12-23 08:43:13,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1048586.6666666667, ans=0.1
2023-12-23 08:43:15,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1048653.3333333333, ans=0.0
2023-12-23 08:43:28,441 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=15.0
2023-12-23 08:43:29,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1048720.0, ans=10.0
2023-12-23 08:43:40,002 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0
2023-12-23 08:43:45,939 INFO [train.py:886] (3/4) Epoch 34, batch 50, loss[loss=0.01434, audio_tagging_loss=0.01434, over 25000.00 frames. ], tot_loss[loss=0.01984, audio_tagging_loss=0.01984, over 1112319.85 frames. ], batch size: 100, lr: 3.19e-03, grad_scale: 32.0
2023-12-23 08:43:54,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1048853.3333333333, ans=0.0
2023-12-23 08:44:02,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1048920.0, ans=0.1
2023-12-23 08:44:08,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1048986.6666666667, ans=0.1
2023-12-23 08:44:13,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1048986.6666666667, ans=0.2
2023-12-23 08:44:14,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1048986.6666666667, ans=0.1
2023-12-23 08:44:22,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1049053.3333333333, ans=0.0
2023-12-23 08:44:25,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1049053.3333333333, ans=0.0
2023-12-23 08:44:28,829 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.137e+01 4.024e+01 4.370e+01 4.886e+01 9.756e+01, threshold=8.739e+01, percent-clipped=6.0
2023-12-23 08:44:37,362 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=15.0
2023-12-23 08:44:37,895 INFO [train.py:886] (3/4) Epoch 34, batch 100, loss[loss=0.0157, audio_tagging_loss=0.0157, over 25000.00 frames. ], tot_loss[loss=0.01699, audio_tagging_loss=0.01699, over 1968608.42 frames. ], batch size: 100, lr: 3.19e-03, grad_scale: 32.0
2023-12-23 08:44:52,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1049253.3333333333, ans=0.125
2023-12-23 08:45:23,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1049453.3333333333, ans=0.125
2023-12-23 08:45:23,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1049453.3333333333, ans=0.125
2023-12-23 08:45:28,899 INFO [train.py:886] (3/4) Epoch 34, batch 150, loss[loss=0.01087, audio_tagging_loss=0.01087, over 24750.00 frames. ], tot_loss[loss=0.01534, audio_tagging_loss=0.01534, over 2630683.46 frames. ], batch size: 99, lr: 3.19e-03, grad_scale: 32.0
2023-12-23 08:45:51,842 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.68 vs. limit=15.0
2023-12-23 08:46:11,362 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.202e+01 3.487e+01 3.657e+01 3.856e+01 4.371e+01, threshold=7.314e+01, percent-clipped=0.0
2023-12-23 08:46:16,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1049786.6666666667, ans=0.95
2023-12-23 08:46:18,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1049786.6666666667, ans=0.125
2023-12-23 08:46:18,479 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=15.0
2023-12-23 08:46:19,916 INFO [train.py:886] (3/4) Epoch 34, batch 200, loss[loss=0.01137, audio_tagging_loss=0.01137, over 25000.00 frames. ], tot_loss[loss=0.01439, audio_tagging_loss=0.01439, over 3147196.89 frames. ], batch size: 100, lr: 3.19e-03, grad_scale: 32.0
2023-12-23 08:46:24,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1049853.3333333333, ans=0.125
2023-12-23 08:46:27,298 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.65 vs. limit=8.0
2023-12-23 08:46:57,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1050053.3333333333, ans=0.125
2023-12-23 08:47:08,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1050120.0, ans=0.125
2023-12-23 08:47:10,585 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0
2023-12-23 08:47:10,931 INFO [train.py:886] (3/4) Epoch 34, batch 250, loss[loss=0.0113, audio_tagging_loss=0.0113, over 25000.00 frames. ], tot_loss[loss=0.01384, audio_tagging_loss=0.01384, over 3552421.29 frames. ], batch size: 100, lr: 3.19e-03, grad_scale: 32.0
2023-12-23 08:47:15,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1050186.6666666667, ans=0.125
2023-12-23 08:47:32,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1050320.0, ans=0.2
2023-12-23 08:47:35,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1050320.0, ans=0.125
2023-12-23 08:47:47,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1050386.6666666667, ans=0.2
2023-12-23 08:47:51,926 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.087e+01 3.410e+01 3.573e+01 3.673e+01 4.532e+01, threshold=7.147e+01, percent-clipped=0.0
2023-12-23 08:47:55,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1050453.3333333333, ans=0.125
2023-12-23 08:48:00,540 INFO [train.py:886] (3/4) Epoch 34, batch 300, loss[loss=0.01242, audio_tagging_loss=0.01242, over 24750.00 frames. ], tot_loss[loss=0.01346, audio_tagging_loss=0.01346, over 3857966.82 frames. ], batch size: 99, lr: 3.19e-03, grad_scale: 32.0
2023-12-23 08:48:42,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1050786.6666666667, ans=0.125
2023-12-23 08:48:49,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1050786.6666666667, ans=0.125
2023-12-23 08:48:49,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1050786.6666666667, ans=0.125
2023-12-23 08:48:52,554 INFO [train.py:886] (3/4) Epoch 34, batch 350, loss[loss=0.01339, audio_tagging_loss=0.01339, over 24750.00 frames. ], tot_loss[loss=0.01312, audio_tagging_loss=0.01312, over 4097979.27 frames. ], batch size: 99, lr: 3.19e-03, grad_scale: 32.0
2023-12-23 08:49:29,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1051053.3333333333, ans=0.0
2023-12-23 08:49:34,324 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.936e+01 3.391e+01 3.531e+01 3.690e+01 4.649e+01, threshold=7.063e+01, percent-clipped=0.0
2023-12-23 08:49:40,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1051120.0, ans=0.125
2023-12-23 08:49:44,271 INFO [train.py:886] (3/4) Epoch 34, batch 400, loss[loss=0.01213, audio_tagging_loss=0.01213, over 24750.00 frames. ], tot_loss[loss=0.01289, audio_tagging_loss=0.01289, over 4288494.36 frames. ], batch size: 99, lr: 3.19e-03, grad_scale: 32.0
2023-12-23 08:49:59,798 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.50 vs. limit=15.0
2023-12-23 08:50:01,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1051253.3333333333, ans=0.0
2023-12-23 08:50:02,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1051253.3333333333, ans=0.125
2023-12-23 08:50:06,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1051320.0, ans=0.0
2023-12-23 08:50:16,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1051386.6666666667, ans=0.125
2023-12-23 08:50:16,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1051386.6666666667, ans=10.0
2023-12-23 08:50:19,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1051386.6666666667, ans=0.125
2023-12-23 08:50:22,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1051386.6666666667, ans=0.0
2023-12-23 08:50:32,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1051453.3333333333, ans=0.0
2023-12-23 08:50:36,042 INFO [train.py:886] (3/4) Epoch 34, batch 450, loss[loss=0.01347, audio_tagging_loss=0.01347, over 25000.00 frames. ], tot_loss[loss=0.01268, audio_tagging_loss=0.01268, over 4436763.95 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:50:42,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.44 vs. limit=15.0
2023-12-23 08:50:43,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1051520.0, ans=0.0
2023-12-23 08:51:03,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1051653.3333333333, ans=0.125
2023-12-23 08:51:04,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1051653.3333333333, ans=0.125
2023-12-23 08:51:13,881 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.61 vs. limit=15.0
2023-12-23 08:51:18,114 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.014e+01 3.386e+01 3.481e+01 3.688e+01 4.054e+01, threshold=6.962e+01, percent-clipped=0.0
2023-12-23 08:51:24,568 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.79 vs. limit=15.0
2023-12-23 08:51:28,810 INFO [train.py:886] (3/4) Epoch 34, batch 500, loss[loss=0.01156, audio_tagging_loss=0.01156, over 25000.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 4552517.49 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:51:36,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1051853.3333333333, ans=0.125
2023-12-23 08:51:38,981 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.07 vs. limit=6.0
2023-12-23 08:51:48,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1051986.6666666667, ans=0.125
2023-12-23 08:52:01,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1052053.3333333333, ans=0.1
2023-12-23 08:52:19,601 INFO [train.py:886] (3/4) Epoch 34, batch 550, loss[loss=0.01322, audio_tagging_loss=0.01322, over 25000.00 frames. ], tot_loss[loss=0.01246, audio_tagging_loss=0.01246, over 4644801.90 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:52:23,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1052186.6666666667, ans=0.2
2023-12-23 08:53:03,275 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.111e+01 3.451e+01 3.642e+01 3.802e+01 4.281e+01, threshold=7.285e+01, percent-clipped=0.0
2023-12-23 08:53:06,613 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.71 vs. limit=15.0
2023-12-23 08:53:07,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1052453.3333333333, ans=0.125
2023-12-23 08:53:12,563 INFO [train.py:886] (3/4) Epoch 34, batch 600, loss[loss=0.01034, audio_tagging_loss=0.01034, over 24750.00 frames. ], tot_loss[loss=0.0124, audio_tagging_loss=0.0124, over 4709700.75 frames. ], batch size: 99, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:53:13,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1052520.0, ans=10.0
2023-12-23 08:53:16,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1052520.0, ans=0.2
2023-12-23 08:53:20,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1052520.0, ans=0.1
2023-12-23 08:53:37,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1052653.3333333333, ans=0.1
2023-12-23 08:53:43,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1052720.0, ans=0.0
2023-12-23 08:54:04,389 INFO [train.py:886] (3/4) Epoch 34, batch 650, loss[loss=0.01264, audio_tagging_loss=0.01264, over 25000.00 frames. ], tot_loss[loss=0.01241, audio_tagging_loss=0.01241, over 4760197.07 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:54:09,263 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 08:54:17,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1052920.0, ans=0.1
2023-12-23 08:54:23,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1052986.6666666667, ans=0.0
2023-12-23 08:54:26,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1052986.6666666667, ans=0.125
2023-12-23 08:54:46,586 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.068e+01 3.392e+01 3.565e+01 3.715e+01 5.032e+01, threshold=7.129e+01, percent-clipped=0.0
2023-12-23 08:54:55,079 INFO [train.py:886] (3/4) Epoch 34, batch 700, loss[loss=0.01142, audio_tagging_loss=0.01142, over 23986.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4804991.81 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:54:59,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1053186.6666666667, ans=0.125
2023-12-23 08:55:00,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1053186.6666666667, ans=0.0
2023-12-23 08:55:02,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.48 vs. limit=15.0
2023-12-23 08:55:28,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1053386.6666666667, ans=0.125
2023-12-23 08:55:32,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1053386.6666666667, ans=0.1
2023-12-23 08:55:46,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1053453.3333333333, ans=0.025
2023-12-23 08:55:47,832 INFO [train.py:886] (3/4) Epoch 34, batch 750, loss[loss=0.01181, audio_tagging_loss=0.01181, over 25000.00 frames. ], tot_loss[loss=0.01235, audio_tagging_loss=0.01235, over 4833637.00 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:56:09,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1053653.3333333333, ans=0.2
2023-12-23 08:56:12,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1053653.3333333333, ans=0.125
2023-12-23 08:56:23,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1053720.0, ans=0.0
2023-12-23 08:56:23,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1053720.0, ans=0.0
2023-12-23 08:56:30,614 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.65 vs. limit=15.0
2023-12-23 08:56:30,774 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.054e+01 3.393e+01 3.521e+01 3.705e+01 4.133e+01, threshold=7.041e+01, percent-clipped=0.0
2023-12-23 08:56:40,012 INFO [train.py:886] (3/4) Epoch 34, batch 800, loss[loss=0.01277, audio_tagging_loss=0.01277, over 25000.00 frames. ], tot_loss[loss=0.01228, audio_tagging_loss=0.01228, over 4864770.37 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:56:53,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1053920.0, ans=0.1
2023-12-23 08:57:12,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1054053.3333333333, ans=0.1
2023-12-23 08:57:12,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1054053.3333333333, ans=0.0
2023-12-23 08:57:24,596 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=12.0
2023-12-23 08:57:27,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1054120.0, ans=0.0
2023-12-23 08:57:32,038 INFO [train.py:886] (3/4) Epoch 34, batch 850, loss[loss=0.01127, audio_tagging_loss=0.01127, over 25000.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 4885009.34 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:57:41,634 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.40 vs. limit=15.0
2023-12-23 08:58:13,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1054453.3333333333, ans=0.0
2023-12-23 08:58:14,560 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.893e+01 3.443e+01 3.585e+01 3.750e+01 4.520e+01, threshold=7.170e+01, percent-clipped=0.0
2023-12-23 08:58:25,628 INFO [train.py:886] (3/4) Epoch 34, batch 900, loss[loss=0.01402, audio_tagging_loss=0.01402, over 24750.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4902215.67 frames. ], batch size: 99, lr: 3.18e-03, grad_scale: 32.0
2023-12-23 08:58:26,026 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.38 vs. limit=15.0
2023-12-23 08:58:27,261 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.66 vs. limit=22.5
2023-12-23 08:58:30,743 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.46 vs. limit=22.5
2023-12-23 08:58:32,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1054520.0, ans=0.2
2023-12-23 08:58:54,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1054653.3333333333, ans=0.05
2023-12-23 08:59:10,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1054786.6666666667, ans=0.125
2023-12-23 08:59:16,998 INFO [train.py:886] (3/4) Epoch 34, batch 950, loss[loss=0.01395, audio_tagging_loss=0.01395, over 24750.00 frames. 
], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4909360.02 frames. ], batch size: 99, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 08:59:21,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1054853.3333333333, ans=0.1 2023-12-23 08:59:21,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1054853.3333333333, ans=0.5 2023-12-23 08:59:35,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1054920.0, ans=0.125 2023-12-23 08:59:36,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=1054920.0, ans=0.02 2023-12-23 08:59:42,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1054986.6666666667, ans=0.0 2023-12-23 08:59:50,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.21 vs. limit=15.0 2023-12-23 09:00:00,942 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.950e+01 3.447e+01 3.600e+01 3.803e+01 4.759e+01, threshold=7.201e+01, percent-clipped=0.0 2023-12-23 09:00:09,524 INFO [train.py:886] (3/4) Epoch 34, batch 1000, loss[loss=0.01113, audio_tagging_loss=0.01113, over 23927.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4917665.00 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 09:00:19,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1055253.3333333333, ans=10.0 2023-12-23 09:00:25,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0 2023-12-23 09:00:26,835 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0 2023-12-23 09:00:33,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1055320.0, ans=0.125 2023-12-23 09:00:41,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1055386.6666666667, ans=0.0 2023-12-23 09:00:41,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1055386.6666666667, ans=0.05 2023-12-23 09:00:42,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1055386.6666666667, ans=0.125 2023-12-23 09:00:54,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1055453.3333333333, ans=0.0 2023-12-23 09:01:02,063 INFO [train.py:886] (3/4) Epoch 34, batch 1050, loss[loss=0.01169, audio_tagging_loss=0.01169, over 24750.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4927902.99 frames. 
], batch size: 99, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 09:01:04,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1055520.0, ans=0.1 2023-12-23 09:01:08,178 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.26 vs. limit=22.5 2023-12-23 09:01:09,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1055520.0, ans=0.0 2023-12-23 09:01:19,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1055586.6666666667, ans=0.125 2023-12-23 09:01:22,830 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.60 vs. limit=15.0 2023-12-23 09:01:23,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.68 vs. limit=22.5 2023-12-23 09:01:28,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1055653.3333333333, ans=0.0 2023-12-23 09:01:29,314 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.39 vs. limit=15.0 2023-12-23 09:01:34,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1055720.0, ans=0.1 2023-12-23 09:01:35,965 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.30 vs. limit=15.0 2023-12-23 09:01:37,501 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.19 vs. limit=15.0 2023-12-23 09:01:44,172 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.84 vs. limit=22.5 2023-12-23 09:01:44,522 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.130e+01 3.411e+01 3.556e+01 3.695e+01 4.710e+01, threshold=7.113e+01, percent-clipped=0.0 2023-12-23 09:01:46,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1055786.6666666667, ans=0.125 2023-12-23 09:01:50,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1055786.6666666667, ans=0.125 2023-12-23 09:01:53,104 INFO [train.py:886] (3/4) Epoch 34, batch 1100, loss[loss=0.009329, audio_tagging_loss=0.009329, over 24750.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4933965.37 frames. ], batch size: 99, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 09:02:12,406 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.76 vs. 
limit=10.0 2023-12-23 09:02:20,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1055986.6666666667, ans=10.0 2023-12-23 09:02:32,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1056053.3333333333, ans=0.0 2023-12-23 09:02:46,083 INFO [train.py:886] (3/4) Epoch 34, batch 1150, loss[loss=0.01195, audio_tagging_loss=0.01195, over 24750.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4946067.47 frames. ], batch size: 99, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 09:02:50,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1056186.6666666667, ans=0.125 2023-12-23 09:02:50,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1056186.6666666667, ans=0.125 2023-12-23 09:02:51,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1056186.6666666667, ans=0.2 2023-12-23 09:03:01,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1056253.3333333333, ans=0.125 2023-12-23 09:03:27,716 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.087e+01 3.380e+01 3.484e+01 3.660e+01 4.361e+01, threshold=6.968e+01, percent-clipped=0.0 2023-12-23 09:03:31,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1056453.3333333333, ans=0.125 2023-12-23 09:03:32,793 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.92 vs. limit=10.0 2023-12-23 09:03:36,199 INFO [train.py:886] (3/4) Epoch 34, batch 1200, loss[loss=0.01501, audio_tagging_loss=0.01501, over 25000.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4946488.81 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 09:04:08,328 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=22.5 2023-12-23 09:04:19,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1056786.6666666667, ans=0.125 2023-12-23 09:04:22,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1056786.6666666667, ans=0.0 2023-12-23 09:04:24,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1056786.6666666667, ans=0.125 2023-12-23 09:04:27,996 INFO [train.py:886] (3/4) Epoch 34, batch 1250, loss[loss=0.01164, audio_tagging_loss=0.01164, over 24750.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4937423.87 frames. ], batch size: 99, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 09:04:30,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1056853.3333333333, ans=0.0 2023-12-23 09:04:34,077 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.55 vs. 
limit=10.0 2023-12-23 09:04:49,850 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.70 vs. limit=15.0 2023-12-23 09:04:52,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1056986.6666666667, ans=0.125 2023-12-23 09:04:53,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1056986.6666666667, ans=0.125 2023-12-23 09:04:56,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1056986.6666666667, ans=0.2 2023-12-23 09:05:08,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1057120.0, ans=0.125 2023-12-23 09:05:09,583 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.087e+01 3.461e+01 3.581e+01 3.707e+01 4.566e+01, threshold=7.161e+01, percent-clipped=0.0 2023-12-23 09:05:20,296 INFO [train.py:886] (3/4) Epoch 34, batch 1300, loss[loss=0.01349, audio_tagging_loss=0.01349, over 24750.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4935129.78 frames. ], batch size: 99, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 09:05:23,371 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=15.0 2023-12-23 09:05:36,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1057253.3333333333, ans=0.0 2023-12-23 09:05:40,326 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.58 vs. limit=10.0 2023-12-23 09:06:00,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1057453.3333333333, ans=0.125 2023-12-23 09:06:10,395 INFO [train.py:886] (3/4) Epoch 34, batch 1350, loss[loss=0.01397, audio_tagging_loss=0.01397, over 25000.00 frames. ], tot_loss[loss=0.01232, audio_tagging_loss=0.01232, over 4941646.99 frames. ], batch size: 100, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 09:06:13,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1057520.0, ans=0.0 2023-12-23 09:06:25,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1057586.6666666667, ans=0.0 2023-12-23 09:06:46,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1057720.0, ans=0.125 2023-12-23 09:06:50,811 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.04 vs. limit=6.0 2023-12-23 09:06:54,023 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.110e+01 3.373e+01 3.475e+01 3.645e+01 4.225e+01, threshold=6.949e+01, percent-clipped=0.0 2023-12-23 09:07:02,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1057853.3333333333, ans=0.125 2023-12-23 09:07:03,267 INFO [train.py:886] (3/4) Epoch 34, batch 1400, loss[loss=0.01447, audio_tagging_loss=0.01447, over 24750.00 frames. 
], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4949816.65 frames. ], batch size: 99, lr: 3.18e-03, grad_scale: 32.0 2023-12-23 09:07:03,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1057853.3333333333, ans=0.125 2023-12-23 09:07:09,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1057853.3333333333, ans=0.2 2023-12-23 09:07:21,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1057920.0, ans=0.0 2023-12-23 09:07:26,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1057986.6666666667, ans=0.0 2023-12-23 09:07:37,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1058053.3333333333, ans=0.0 2023-12-23 09:07:39,625 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.92 vs. limit=15.0 2023-12-23 09:07:52,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1058120.0, ans=0.125 2023-12-23 09:07:52,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1058120.0, ans=0.025 2023-12-23 09:07:54,250 INFO [train.py:886] (3/4) Epoch 34, batch 1450, loss[loss=0.01276, audio_tagging_loss=0.01276, over 25000.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4947153.66 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:08:26,016 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.29 vs. limit=12.0 2023-12-23 09:08:26,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1058386.6666666667, ans=0.0 2023-12-23 09:08:37,984 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.058e+01 3.393e+01 3.506e+01 3.631e+01 4.312e+01, threshold=7.011e+01, percent-clipped=0.0 2023-12-23 09:08:46,585 INFO [train.py:886] (3/4) Epoch 34, batch 1500, loss[loss=0.01499, audio_tagging_loss=0.01499, over 25000.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4957352.13 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:08:49,815 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.15 vs. 
limit=6.0 2023-12-23 09:09:02,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1058586.6666666667, ans=0.0 2023-12-23 09:09:13,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1058653.3333333333, ans=0.125 2023-12-23 09:09:15,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1058653.3333333333, ans=0.125 2023-12-23 09:09:21,026 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:09:26,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1058786.6666666667, ans=22.5 2023-12-23 09:09:35,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1058786.6666666667, ans=0.125 2023-12-23 09:09:37,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1058853.3333333333, ans=0.125 2023-12-23 09:09:37,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1058853.3333333333, ans=0.2 2023-12-23 09:09:37,490 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=15.0 2023-12-23 09:09:38,133 INFO [train.py:886] (3/4) Epoch 34, batch 1550, loss[loss=0.01323, audio_tagging_loss=0.01323, over 24750.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4957185.34 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:09:56,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1058920.0, ans=0.125 2023-12-23 09:10:07,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1058986.6666666667, ans=0.025 2023-12-23 09:10:17,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1059053.3333333333, ans=0.0 2023-12-23 09:10:21,188 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.143e+01 3.468e+01 3.577e+01 3.734e+01 4.228e+01, threshold=7.153e+01, percent-clipped=0.0 2023-12-23 09:10:25,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1059120.0, ans=0.07 2023-12-23 09:10:29,764 INFO [train.py:886] (3/4) Epoch 34, batch 1600, loss[loss=0.01127, audio_tagging_loss=0.01127, over 25000.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4954596.08 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:10:29,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1059186.6666666667, ans=0.125 2023-12-23 09:11:22,327 INFO [train.py:886] (3/4) Epoch 34, batch 1650, loss[loss=0.01394, audio_tagging_loss=0.01394, over 24750.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4948466.59 frames. 
], batch size: 99, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:11:26,830 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.58 vs. limit=15.0 2023-12-23 09:11:47,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1059653.3333333333, ans=0.0 2023-12-23 09:11:52,477 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.92 vs. limit=15.0 2023-12-23 09:12:03,325 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.071e+01 3.325e+01 3.519e+01 3.722e+01 4.583e+01, threshold=7.038e+01, percent-clipped=0.0 2023-12-23 09:12:03,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1059786.6666666667, ans=0.2 2023-12-23 09:12:04,479 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:12:11,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1059786.6666666667, ans=0.1 2023-12-23 09:12:13,267 INFO [train.py:886] (3/4) Epoch 34, batch 1700, loss[loss=0.01426, audio_tagging_loss=0.01426, over 24750.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4951138.81 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:12:45,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1060053.3333333333, ans=0.0 2023-12-23 09:12:55,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1060120.0, ans=0.2 2023-12-23 09:12:58,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1060120.0, ans=0.125 2023-12-23 09:13:02,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1060120.0, ans=0.125 2023-12-23 09:13:05,125 INFO [train.py:886] (3/4) Epoch 34, batch 1750, loss[loss=0.01343, audio_tagging_loss=0.01343, over 25000.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4954080.96 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:13:11,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1060186.6666666667, ans=0.2 2023-12-23 09:13:16,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1060253.3333333333, ans=0.125 2023-12-23 09:13:16,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1060253.3333333333, ans=0.1 2023-12-23 09:13:20,258 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. 
limit=6.0 2023-12-23 09:13:20,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1060253.3333333333, ans=0.0 2023-12-23 09:13:38,507 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.66 vs. limit=6.0 2023-12-23 09:13:46,948 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.85 vs. limit=15.0 2023-12-23 09:13:47,290 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.064e+01 3.371e+01 3.527e+01 3.701e+01 4.388e+01, threshold=7.054e+01, percent-clipped=0.0 2023-12-23 09:13:52,355 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.81 vs. limit=15.0 2023-12-23 09:13:57,072 INFO [train.py:886] (3/4) Epoch 34, batch 1800, loss[loss=0.01221, audio_tagging_loss=0.01221, over 25000.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4955367.15 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:14:02,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1060520.0, ans=0.04949747468305833 2023-12-23 09:14:11,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1060586.6666666667, ans=0.0 2023-12-23 09:14:36,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1060720.0, ans=0.1 2023-12-23 09:14:36,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1060720.0, ans=0.0 2023-12-23 09:14:47,623 INFO [train.py:886] (3/4) Epoch 34, batch 1850, loss[loss=0.01185, audio_tagging_loss=0.01185, over 24750.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4953577.23 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:15:02,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1060920.0, ans=0.1 2023-12-23 09:15:09,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2023-12-23 09:15:18,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=15.0 2023-12-23 09:15:20,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1061053.3333333333, ans=0.1 2023-12-23 09:15:22,600 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.08 vs. 
limit=12.0 2023-12-23 09:15:24,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1061053.3333333333, ans=0.2 2023-12-23 09:15:24,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1061053.3333333333, ans=0.1 2023-12-23 09:15:30,276 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.188e+01 3.414e+01 3.585e+01 3.770e+01 4.253e+01, threshold=7.170e+01, percent-clipped=0.0 2023-12-23 09:15:40,221 INFO [train.py:886] (3/4) Epoch 34, batch 1900, loss[loss=0.01344, audio_tagging_loss=0.01344, over 24750.00 frames. ], tot_loss[loss=0.01233, audio_tagging_loss=0.01233, over 4943992.45 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:15:41,715 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. limit=15.0 2023-12-23 09:15:50,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1061253.3333333333, ans=0.2 2023-12-23 09:15:52,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1061253.3333333333, ans=0.0 2023-12-23 09:15:53,855 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.08 vs. limit=15.0 2023-12-23 09:15:55,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1061253.3333333333, ans=0.125 2023-12-23 09:15:55,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1061253.3333333333, ans=0.125 2023-12-23 09:16:13,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1061386.6666666667, ans=0.125 2023-12-23 09:16:14,152 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0 2023-12-23 09:16:15,644 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:16:16,545 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:16:23,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1061453.3333333333, ans=0.0 2023-12-23 09:16:24,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1061453.3333333333, ans=0.0 2023-12-23 09:16:31,225 INFO [train.py:886] (3/4) Epoch 34, batch 1950, loss[loss=0.01311, audio_tagging_loss=0.01311, over 24750.00 frames. ], tot_loss[loss=0.01232, audio_tagging_loss=0.01232, over 4936926.46 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 32.0 2023-12-23 09:17:14,342 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.990e+01 3.384e+01 3.570e+01 3.708e+01 4.243e+01, threshold=7.140e+01, percent-clipped=0.0 2023-12-23 09:17:23,728 INFO [train.py:886] (3/4) Epoch 34, batch 2000, loss[loss=0.01048, audio_tagging_loss=0.01048, over 21786.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4938607.78 frames. 
], batch size: 107, lr: 3.17e-03, grad_scale: 64.0 2023-12-23 09:17:31,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1061853.3333333333, ans=0.0 2023-12-23 09:17:54,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1062053.3333333333, ans=0.0 2023-12-23 09:18:04,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=1062053.3333333333, ans=0.02 2023-12-23 09:18:15,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1062186.6666666667, ans=0.2 2023-12-23 09:18:16,338 INFO [train.py:886] (3/4) Epoch 34, batch 2050, loss[loss=0.01399, audio_tagging_loss=0.01399, over 25000.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4936212.85 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 64.0 2023-12-23 09:18:34,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1062253.3333333333, ans=0.1 2023-12-23 09:18:51,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1062386.6666666667, ans=0.1 2023-12-23 09:18:54,648 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:18:55,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1062386.6666666667, ans=0.125 2023-12-23 09:18:58,854 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.046e+01 3.394e+01 3.550e+01 3.730e+01 4.740e+01, threshold=7.100e+01, percent-clipped=0.0 2023-12-23 09:18:59,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1062453.3333333333, ans=0.125 2023-12-23 09:19:02,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1062453.3333333333, ans=0.125 2023-12-23 09:19:07,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1062520.0, ans=0.0 2023-12-23 09:19:08,966 INFO [train.py:886] (3/4) Epoch 34, batch 2100, loss[loss=0.01383, audio_tagging_loss=0.01383, over 24750.00 frames. ], tot_loss[loss=0.0121, audio_tagging_loss=0.0121, over 4942180.14 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 64.0 2023-12-23 09:19:09,479 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.36 vs. limit=15.0 2023-12-23 09:19:22,434 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.07 vs. 
limit=15.0 2023-12-23 09:19:33,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1062653.3333333333, ans=0.0 2023-12-23 09:19:36,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1062653.3333333333, ans=10.0 2023-12-23 09:19:36,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1062653.3333333333, ans=0.0 2023-12-23 09:19:47,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1062720.0, ans=0.2 2023-12-23 09:19:48,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1062786.6666666667, ans=0.0 2023-12-23 09:19:59,889 INFO [train.py:886] (3/4) Epoch 34, batch 2150, loss[loss=0.01103, audio_tagging_loss=0.01103, over 25000.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4947470.51 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 64.0 2023-12-23 09:20:00,348 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.47 vs. limit=22.5 2023-12-23 09:20:06,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1062853.3333333333, ans=0.125 2023-12-23 09:20:31,655 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.39 vs. limit=15.0 2023-12-23 09:20:39,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1063053.3333333333, ans=0.0 2023-12-23 09:20:42,090 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.148e+01 3.426e+01 3.569e+01 3.716e+01 4.443e+01, threshold=7.139e+01, percent-clipped=0.0 2023-12-23 09:20:52,021 INFO [train.py:886] (3/4) Epoch 34, batch 2200, loss[loss=0.01461, audio_tagging_loss=0.01461, over 24945.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4941701.28 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 64.0 2023-12-23 09:21:33,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1063453.3333333333, ans=0.07 2023-12-23 09:21:43,778 INFO [train.py:886] (3/4) Epoch 34, batch 2250, loss[loss=0.01145, audio_tagging_loss=0.01145, over 24750.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4941875.97 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 64.0 2023-12-23 09:21:44,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1063520.0, ans=0.125 2023-12-23 09:21:49,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1063520.0, ans=0.1 2023-12-23 09:22:05,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1063653.3333333333, ans=0.2 2023-12-23 09:22:05,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.81 vs. 
limit=22.5 2023-12-23 09:22:10,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1063653.3333333333, ans=0.125 2023-12-23 09:22:11,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1063653.3333333333, ans=0.125 2023-12-23 09:22:27,019 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.046e+01 3.422e+01 3.558e+01 3.753e+01 4.731e+01, threshold=7.116e+01, percent-clipped=0.0 2023-12-23 09:22:32,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1063786.6666666667, ans=0.0 2023-12-23 09:22:36,328 INFO [train.py:886] (3/4) Epoch 34, batch 2300, loss[loss=0.01444, audio_tagging_loss=0.01444, over 24750.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4944614.32 frames. ], batch size: 99, lr: 3.17e-03, grad_scale: 64.0 2023-12-23 09:22:40,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1063853.3333333333, ans=0.125 2023-12-23 09:22:55,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1063920.0, ans=0.1 2023-12-23 09:23:04,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1063986.6666666667, ans=10.0 2023-12-23 09:23:13,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1064053.3333333333, ans=0.0 2023-12-23 09:23:18,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1064120.0, ans=0.2 2023-12-23 09:23:29,145 INFO [train.py:886] (3/4) Epoch 34, batch 2350, loss[loss=0.01148, audio_tagging_loss=0.01148, over 25000.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4944464.67 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 64.0 2023-12-23 09:23:29,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1064186.6666666667, ans=0.125 2023-12-23 09:23:33,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.34 vs. limit=8.0 2023-12-23 09:23:50,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1064320.0, ans=10.0 2023-12-23 09:24:01,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1064386.6666666667, ans=0.0 2023-12-23 09:24:03,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1064386.6666666667, ans=0.0 2023-12-23 09:24:11,374 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.992e+01 3.391e+01 3.504e+01 3.673e+01 5.436e+01, threshold=7.008e+01, percent-clipped=0.0 2023-12-23 09:24:14,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1064453.3333333333, ans=0.09899494936611666 2023-12-23 09:24:19,900 INFO [train.py:886] (3/4) Epoch 34, batch 2400, loss[loss=0.01349, audio_tagging_loss=0.01349, over 25000.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4946401.42 frames. 
], batch size: 100, lr: 3.17e-03, grad_scale: 64.0 2023-12-23 09:24:22,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1064520.0, ans=0.0 2023-12-23 09:24:27,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1064520.0, ans=0.125 2023-12-23 09:24:47,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1064653.3333333333, ans=0.0 2023-12-23 09:24:54,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1064720.0, ans=0.1 2023-12-23 09:25:10,989 INFO [train.py:886] (3/4) Epoch 34, batch 2450, loss[loss=0.01294, audio_tagging_loss=0.01294, over 25000.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4950341.63 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 64.0 2023-12-23 09:25:13,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1064853.3333333333, ans=0.0 2023-12-23 09:25:19,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1064853.3333333333, ans=0.0 2023-12-23 09:25:20,453 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.34 vs. limit=10.0 2023-12-23 09:25:31,840 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.60 vs. limit=15.0 2023-12-23 09:25:37,610 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2023-12-23 09:25:41,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1065053.3333333333, ans=0.125 2023-12-23 09:25:52,954 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.052e+01 3.384e+01 3.531e+01 3.725e+01 4.723e+01, threshold=7.062e+01, percent-clipped=0.0 2023-12-23 09:26:00,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1065186.6666666667, ans=0.0 2023-12-23 09:26:01,415 INFO [train.py:886] (3/4) Epoch 34, batch 2500, loss[loss=0.01155, audio_tagging_loss=0.01155, over 24750.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4950972.11 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:26:30,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1065320.0, ans=0.125 2023-12-23 09:26:33,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1065386.6666666667, ans=0.125 2023-12-23 09:26:54,400 INFO [train.py:886] (3/4) Epoch 34, batch 2550, loss[loss=0.01346, audio_tagging_loss=0.01346, over 24750.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4948110.79 frames. 
], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:27:13,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1065586.6666666667, ans=0.125 2023-12-23 09:27:21,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1065653.3333333333, ans=0.125 2023-12-23 09:27:23,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1065653.3333333333, ans=0.125 2023-12-23 09:27:35,934 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.037e+01 3.394e+01 3.621e+01 3.808e+01 4.469e+01, threshold=7.242e+01, percent-clipped=0.0 2023-12-23 09:27:36,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1065786.6666666667, ans=0.2 2023-12-23 09:27:43,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1065786.6666666667, ans=0.0 2023-12-23 09:27:46,490 INFO [train.py:886] (3/4) Epoch 34, batch 2600, loss[loss=0.01408, audio_tagging_loss=0.01408, over 24750.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4948220.87 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:28:08,151 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:28:10,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1065986.6666666667, ans=0.09899494936611666 2023-12-23 09:28:15,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2023-12-23 09:28:22,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1066053.3333333333, ans=0.125 2023-12-23 09:28:23,727 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=12.0 2023-12-23 09:28:27,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1066120.0, ans=0.1 2023-12-23 09:28:30,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0 2023-12-23 09:28:32,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1066120.0, ans=0.0 2023-12-23 09:28:37,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.84 vs. limit=22.5 2023-12-23 09:28:37,529 INFO [train.py:886] (3/4) Epoch 34, batch 2650, loss[loss=0.01329, audio_tagging_loss=0.01329, over 25000.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4951379.06 frames. 
], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:28:41,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1066186.6666666667, ans=0.1 2023-12-23 09:28:49,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1066253.3333333333, ans=0.125 2023-12-23 09:28:58,132 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.32 vs. limit=15.0 2023-12-23 09:29:20,769 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.086e+01 3.338e+01 3.500e+01 3.671e+01 4.069e+01, threshold=7.000e+01, percent-clipped=0.0 2023-12-23 09:29:30,357 INFO [train.py:886] (3/4) Epoch 34, batch 2700, loss[loss=0.0128, audio_tagging_loss=0.0128, over 24750.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4952930.99 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:29:36,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1066520.0, ans=0.1 2023-12-23 09:29:38,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1066520.0, ans=0.2 2023-12-23 09:29:47,942 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.30 vs. limit=15.0 2023-12-23 09:29:56,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1066653.3333333333, ans=0.125 2023-12-23 09:30:06,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1066720.0, ans=0.1 2023-12-23 09:30:14,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1066786.6666666667, ans=0.0 2023-12-23 09:30:18,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1066786.6666666667, ans=0.125 2023-12-23 09:30:22,355 INFO [train.py:886] (3/4) Epoch 34, batch 2750, loss[loss=0.01203, audio_tagging_loss=0.01203, over 24750.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4958441.03 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:30:22,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1066853.3333333333, ans=0.0 2023-12-23 09:30:26,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1066853.3333333333, ans=0.2 2023-12-23 09:30:27,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1066853.3333333333, ans=0.125 2023-12-23 09:30:38,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1066920.0, ans=0.125 2023-12-23 09:30:44,098 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.77 vs. 
limit=15.0 2023-12-23 09:31:05,353 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.165e+01 3.431e+01 3.588e+01 3.825e+01 4.310e+01, threshold=7.176e+01, percent-clipped=0.0 2023-12-23 09:31:14,672 INFO [train.py:886] (3/4) Epoch 34, batch 2800, loss[loss=0.01257, audio_tagging_loss=0.01257, over 25000.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4955662.64 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:31:42,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1067320.0, ans=0.125 2023-12-23 09:31:54,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1067386.6666666667, ans=0.0 2023-12-23 09:32:03,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1067453.3333333333, ans=0.1 2023-12-23 09:32:07,295 INFO [train.py:886] (3/4) Epoch 34, batch 2850, loss[loss=0.01081, audio_tagging_loss=0.01081, over 24750.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4953866.04 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:32:21,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1067586.6666666667, ans=0.0 2023-12-23 09:32:29,208 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.08 vs. limit=15.0 2023-12-23 09:32:38,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1067720.0, ans=0.125 2023-12-23 09:32:40,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0 2023-12-23 09:32:48,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1067786.6666666667, ans=0.0 2023-12-23 09:32:49,736 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.095e+01 3.409e+01 3.533e+01 3.696e+01 4.223e+01, threshold=7.066e+01, percent-clipped=0.0 2023-12-23 09:32:50,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1067786.6666666667, ans=0.2 2023-12-23 09:32:58,215 INFO [train.py:886] (3/4) Epoch 34, batch 2900, loss[loss=0.01167, audio_tagging_loss=0.01167, over 24750.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4954041.07 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:33:20,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1067986.6666666667, ans=0.125 2023-12-23 09:33:29,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1068053.3333333333, ans=0.1 2023-12-23 09:33:50,600 INFO [train.py:886] (3/4) Epoch 34, batch 2950, loss[loss=0.01136, audio_tagging_loss=0.01136, over 24750.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4954808.99 frames. 
], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:33:54,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1068186.6666666667, ans=0.125 2023-12-23 09:33:57,283 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=12.0 2023-12-23 09:34:28,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1068386.6666666667, ans=0.125 2023-12-23 09:34:30,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1068386.6666666667, ans=0.125 2023-12-23 09:34:32,921 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.061e+01 3.362e+01 3.516e+01 3.709e+01 4.574e+01, threshold=7.032e+01, percent-clipped=0.0 2023-12-23 09:34:34,247 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:34:36,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1068453.3333333333, ans=0.125 2023-12-23 09:34:42,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1068520.0, ans=0.2 2023-12-23 09:34:42,803 INFO [train.py:886] (3/4) Epoch 34, batch 3000, loss[loss=0.01529, audio_tagging_loss=0.01529, over 25000.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4956169.40 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:34:42,803 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 09:34:54,590 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.8760, 3.1804, 3.3229, 2.5321, 2.7201, 2.9598, 2.8951, 2.5412], device='cuda:3') 2023-12-23 09:35:04,054 INFO [train.py:917] (3/4) Epoch 34, validation: loss=0.03414, audio_tagging_loss=0.03414, over 3737520.00 frames. 2023-12-23 09:35:04,055 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 09:35:14,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1068586.6666666667, ans=0.125 2023-12-23 09:35:15,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1068586.6666666667, ans=0.07 2023-12-23 09:35:19,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1068586.6666666667, ans=0.125 2023-12-23 09:35:19,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1068586.6666666667, ans=0.04949747468305833 2023-12-23 09:35:32,434 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.03 vs. 
limit=10.0 2023-12-23 09:35:37,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1068720.0, ans=0.0 2023-12-23 09:35:39,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1068720.0, ans=0.0 2023-12-23 09:35:53,834 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.86 vs. limit=15.0 2023-12-23 09:35:54,413 INFO [train.py:886] (3/4) Epoch 34, batch 3050, loss[loss=0.01084, audio_tagging_loss=0.01084, over 23991.00 frames. ], tot_loss[loss=0.0121, audio_tagging_loss=0.0121, over 4962346.81 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:36:09,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1068920.0, ans=0.0 2023-12-23 09:36:16,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1068986.6666666667, ans=0.0 2023-12-23 09:36:23,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1068986.6666666667, ans=0.0 2023-12-23 09:36:36,790 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.053e+01 3.409e+01 3.553e+01 3.721e+01 5.104e+01, threshold=7.106e+01, percent-clipped=0.0 2023-12-23 09:36:43,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1069120.0, ans=0.05 2023-12-23 09:36:46,699 INFO [train.py:886] (3/4) Epoch 34, batch 3100, loss[loss=0.01223, audio_tagging_loss=0.01223, over 25000.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4958292.42 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:36:49,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1069186.6666666667, ans=0.0 2023-12-23 09:37:16,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1069320.0, ans=0.125 2023-12-23 09:37:21,240 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2023-12-23 09:37:28,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1069453.3333333333, ans=0.125 2023-12-23 09:37:28,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1069453.3333333333, ans=0.125 2023-12-23 09:37:37,805 INFO [train.py:886] (3/4) Epoch 34, batch 3150, loss[loss=0.01495, audio_tagging_loss=0.01495, over 24750.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4951632.09 frames. 
], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:37:37,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1069520.0, ans=0.125 2023-12-23 09:37:37,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1069520.0, ans=0.0 2023-12-23 09:37:56,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1069586.6666666667, ans=0.125 2023-12-23 09:38:08,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1069720.0, ans=0.125 2023-12-23 09:38:10,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1069720.0, ans=0.0 2023-12-23 09:38:12,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1069720.0, ans=0.04949747468305833 2023-12-23 09:38:12,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1069720.0, ans=0.0 2023-12-23 09:38:21,535 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.098e+01 3.466e+01 3.603e+01 3.773e+01 5.785e+01, threshold=7.206e+01, percent-clipped=0.0 2023-12-23 09:38:22,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1069786.6666666667, ans=0.125 2023-12-23 09:38:22,779 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.77 vs. limit=15.0 2023-12-23 09:38:23,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1069786.6666666667, ans=0.04949747468305833 2023-12-23 09:38:30,590 INFO [train.py:886] (3/4) Epoch 34, batch 3200, loss[loss=0.01076, audio_tagging_loss=0.01076, over 24750.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4949707.84 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:38:34,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1069853.3333333333, ans=0.125 2023-12-23 09:38:43,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1069920.0, ans=0.2 2023-12-23 09:39:23,023 INFO [train.py:886] (3/4) Epoch 34, batch 3250, loss[loss=0.01233, audio_tagging_loss=0.01233, over 24750.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4953152.99 frames. 
], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:39:28,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1070186.6666666667, ans=0.0 2023-12-23 09:39:50,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1070320.0, ans=0.125 2023-12-23 09:39:56,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1070386.6666666667, ans=0.1 2023-12-23 09:40:05,134 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.908e+01 3.341e+01 3.558e+01 3.738e+01 4.204e+01, threshold=7.116e+01, percent-clipped=0.0 2023-12-23 09:40:05,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1070453.3333333333, ans=0.125 2023-12-23 09:40:13,618 INFO [train.py:886] (3/4) Epoch 34, batch 3300, loss[loss=0.0111, audio_tagging_loss=0.0111, over 25000.00 frames. ], tot_loss[loss=0.01204, audio_tagging_loss=0.01204, over 4957939.35 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:40:14,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1070520.0, ans=0.1 2023-12-23 09:40:17,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1070520.0, ans=0.09899494936611666 2023-12-23 09:40:18,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1070520.0, ans=0.1 2023-12-23 09:40:21,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1070520.0, ans=0.0 2023-12-23 09:40:36,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1070653.3333333333, ans=0.0 2023-12-23 09:40:40,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1070653.3333333333, ans=0.125 2023-12-23 09:40:45,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1070720.0, ans=0.125 2023-12-23 09:41:05,508 INFO [train.py:886] (3/4) Epoch 34, batch 3350, loss[loss=0.01084, audio_tagging_loss=0.01084, over 25000.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4956711.39 frames. ], batch size: 100, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:41:18,292 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.55 vs. 
limit=6.0 2023-12-23 09:41:19,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1070920.0, ans=0.125 2023-12-23 09:41:21,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1070920.0, ans=0.125 2023-12-23 09:41:47,081 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.108e+01 3.417e+01 3.570e+01 3.745e+01 4.410e+01, threshold=7.140e+01, percent-clipped=0.0 2023-12-23 09:41:51,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1071120.0, ans=0.0 2023-12-23 09:41:55,625 INFO [train.py:886] (3/4) Epoch 34, batch 3400, loss[loss=0.01181, audio_tagging_loss=0.01181, over 24750.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4954444.17 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:42:10,286 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=15.0 2023-12-23 09:42:19,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1071320.0, ans=0.0 2023-12-23 09:42:37,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1071453.3333333333, ans=0.0 2023-12-23 09:42:40,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1071453.3333333333, ans=0.0 2023-12-23 09:42:48,478 INFO [train.py:886] (3/4) Epoch 34, batch 3450, loss[loss=0.01137, audio_tagging_loss=0.01137, over 24750.00 frames. ], tot_loss[loss=0.01218, audio_tagging_loss=0.01218, over 4951238.00 frames. ], batch size: 99, lr: 3.16e-03, grad_scale: 64.0 2023-12-23 09:43:00,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1071586.6666666667, ans=0.125 2023-12-23 09:43:01,285 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0 2023-12-23 09:43:02,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1071586.6666666667, ans=0.125 2023-12-23 09:43:05,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1071586.6666666667, ans=0.125 2023-12-23 09:43:17,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1071653.3333333333, ans=0.125 2023-12-23 09:43:18,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1071653.3333333333, ans=0.05 2023-12-23 09:43:19,215 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.82 vs. limit=12.0 2023-12-23 09:43:29,989 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.229e+01 3.457e+01 3.606e+01 3.781e+01 4.224e+01, threshold=7.212e+01, percent-clipped=0.0 2023-12-23 09:43:30,479 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs. 
limit=15.0 2023-12-23 09:43:33,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1071786.6666666667, ans=0.125 2023-12-23 09:43:35,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1071786.6666666667, ans=0.1 2023-12-23 09:43:39,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1071853.3333333333, ans=0.05 2023-12-23 09:43:40,606 INFO [train.py:886] (3/4) Epoch 34, batch 3500, loss[loss=0.0108, audio_tagging_loss=0.0108, over 24750.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4947451.36 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:43:42,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1071853.3333333333, ans=0.125 2023-12-23 09:44:00,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1071986.6666666667, ans=0.0 2023-12-23 09:44:13,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1072053.3333333333, ans=0.09899494936611666 2023-12-23 09:44:14,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1072053.3333333333, ans=0.0 2023-12-23 09:44:19,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1072053.3333333333, ans=0.125 2023-12-23 09:44:19,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1072053.3333333333, ans=0.125 2023-12-23 09:44:31,678 INFO [train.py:886] (3/4) Epoch 34, batch 3550, loss[loss=0.008793, audio_tagging_loss=0.008793, over 22223.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4944066.53 frames. ], batch size: 107, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:45:00,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1072320.0, ans=0.1 2023-12-23 09:45:14,666 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.056e+01 3.412e+01 3.528e+01 3.703e+01 4.575e+01, threshold=7.057e+01, percent-clipped=0.0 2023-12-23 09:45:24,647 INFO [train.py:886] (3/4) Epoch 34, batch 3600, loss[loss=0.01465, audio_tagging_loss=0.01465, over 24750.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4949024.65 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:45:33,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1072586.6666666667, ans=0.125 2023-12-23 09:45:36,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1072586.6666666667, ans=0.04949747468305833 2023-12-23 09:45:58,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.66 vs. limit=22.5 2023-12-23 09:46:14,850 INFO [train.py:886] (3/4) Epoch 34, batch 3650, loss[loss=0.01231, audio_tagging_loss=0.01231, over 24750.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4953411.32 frames. 
], batch size: 99, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:46:27,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1072920.0, ans=0.1 2023-12-23 09:46:39,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1072986.6666666667, ans=0.0 2023-12-23 09:46:45,297 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:46:57,542 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.017e+01 3.373e+01 3.534e+01 3.700e+01 5.262e+01, threshold=7.068e+01, percent-clipped=0.0 2023-12-23 09:46:59,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1073120.0, ans=0.125 2023-12-23 09:47:06,752 INFO [train.py:886] (3/4) Epoch 34, batch 3700, loss[loss=0.0124, audio_tagging_loss=0.0124, over 24750.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4949627.77 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:47:09,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1073186.6666666667, ans=0.1 2023-12-23 09:47:14,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1073186.6666666667, ans=0.125 2023-12-23 09:47:20,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1073253.3333333333, ans=0.0 2023-12-23 09:47:58,690 INFO [train.py:886] (3/4) Epoch 34, batch 3750, loss[loss=0.01097, audio_tagging_loss=0.01097, over 24750.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4940126.14 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:48:08,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1073586.6666666667, ans=0.1 2023-12-23 09:48:16,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1073586.6666666667, ans=0.2 2023-12-23 09:48:42,090 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.136e+01 3.444e+01 3.639e+01 3.754e+01 4.905e+01, threshold=7.278e+01, percent-clipped=0.0 2023-12-23 09:48:44,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1073786.6666666667, ans=0.125 2023-12-23 09:48:44,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1073786.6666666667, ans=0.125 2023-12-23 09:48:45,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1073786.6666666667, ans=0.0 2023-12-23 09:48:50,766 INFO [train.py:886] (3/4) Epoch 34, batch 3800, loss[loss=0.01048, audio_tagging_loss=0.01048, over 22584.00 frames. ], tot_loss[loss=0.01214, audio_tagging_loss=0.01214, over 4936558.40 frames. 
], batch size: 107, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:48:55,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1073853.3333333333, ans=0.1 2023-12-23 09:48:58,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1073853.3333333333, ans=0.125 2023-12-23 09:49:07,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1073920.0, ans=0.1 2023-12-23 09:49:13,763 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.76 vs. limit=6.0 2023-12-23 09:49:18,481 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.19 vs. limit=15.0 2023-12-23 09:49:20,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1073986.6666666667, ans=0.125 2023-12-23 09:49:25,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1074053.3333333333, ans=0.1 2023-12-23 09:49:29,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1074053.3333333333, ans=0.125 2023-12-23 09:49:37,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1074120.0, ans=0.1 2023-12-23 09:49:42,828 INFO [train.py:886] (3/4) Epoch 34, batch 3850, loss[loss=0.01341, audio_tagging_loss=0.01341, over 25000.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4938781.50 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:49:51,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1074253.3333333333, ans=0.125 2023-12-23 09:50:00,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1074253.3333333333, ans=0.0 2023-12-23 09:50:21,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1074453.3333333333, ans=0.05 2023-12-23 09:50:23,464 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.088e+01 3.418e+01 3.601e+01 3.736e+01 4.189e+01, threshold=7.201e+01, percent-clipped=0.0 2023-12-23 09:50:30,848 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:50:32,666 INFO [train.py:886] (3/4) Epoch 34, batch 3900, loss[loss=0.01108, audio_tagging_loss=0.01108, over 25000.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4946203.18 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:50:42,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.36 vs. limit=5.0 2023-12-23 09:50:46,889 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:51:02,915 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.87 vs. 
limit=15.0 2023-12-23 09:51:03,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1074720.0, ans=0.025 2023-12-23 09:51:03,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1074720.0, ans=0.125 2023-12-23 09:51:03,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1074720.0, ans=0.125 2023-12-23 09:51:04,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1074720.0, ans=15.0 2023-12-23 09:51:05,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1074720.0, ans=0.0 2023-12-23 09:51:09,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1074720.0, ans=0.0 2023-12-23 09:51:12,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1074720.0, ans=0.125 2023-12-23 09:51:24,466 INFO [train.py:886] (3/4) Epoch 34, batch 3950, loss[loss=0.009685, audio_tagging_loss=0.009685, over 25000.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4943258.25 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:51:31,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1074853.3333333333, ans=0.0 2023-12-23 09:51:41,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1074920.0, ans=0.2 2023-12-23 09:51:45,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1074986.6666666667, ans=0.2 2023-12-23 09:51:46,415 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.37 vs. limit=22.5 2023-12-23 09:51:47,297 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.69 vs. limit=22.5 2023-12-23 09:52:07,665 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.046e+01 3.354e+01 3.526e+01 3.698e+01 4.132e+01, threshold=7.051e+01, percent-clipped=0.0 2023-12-23 09:52:09,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1075120.0, ans=0.0 2023-12-23 09:52:09,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1075120.0, ans=0.2 2023-12-23 09:52:10,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1075120.0, ans=0.125 2023-12-23 09:52:16,940 INFO [train.py:886] (3/4) Epoch 34, batch 4000, loss[loss=0.01343, audio_tagging_loss=0.01343, over 25000.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4947577.33 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 128.0 2023-12-23 09:52:40,613 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.23 vs. 
limit=10.0 2023-12-23 09:52:53,533 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.19 vs. limit=12.0 2023-12-23 09:52:59,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1075453.3333333333, ans=0.04949747468305833 2023-12-23 09:53:05,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1075453.3333333333, ans=0.1 2023-12-23 09:53:08,062 INFO [train.py:886] (3/4) Epoch 34, batch 4050, loss[loss=0.01457, audio_tagging_loss=0.01457, over 24750.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 4947907.54 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:53:46,984 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.96 vs. limit=15.0 2023-12-23 09:53:51,620 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.048e+01 3.432e+01 3.586e+01 3.724e+01 5.445e+01, threshold=7.172e+01, percent-clipped=0.0 2023-12-23 09:53:59,185 INFO [train.py:886] (3/4) Epoch 34, batch 4100, loss[loss=0.01309, audio_tagging_loss=0.01309, over 24750.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4944917.86 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:54:29,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1076053.3333333333, ans=0.0 2023-12-23 09:54:30,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1076053.3333333333, ans=0.0 2023-12-23 09:54:32,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1076053.3333333333, ans=0.0 2023-12-23 09:54:46,534 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.68 vs. limit=15.0 2023-12-23 09:54:48,438 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.63 vs. limit=15.0 2023-12-23 09:54:52,427 INFO [train.py:886] (3/4) Epoch 34, batch 4150, loss[loss=0.01233, audio_tagging_loss=0.01233, over 21306.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4946238.39 frames. ], batch size: 107, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:55:06,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.45 vs. limit=22.5 2023-12-23 09:55:13,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.65 vs. limit=15.0 2023-12-23 09:55:16,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1076320.0, ans=0.125 2023-12-23 09:55:19,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.08 vs. 
limit=22.5 2023-12-23 09:55:24,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1076386.6666666667, ans=0.1 2023-12-23 09:55:26,028 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.05 vs. limit=15.0 2023-12-23 09:55:29,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1076386.6666666667, ans=0.0 2023-12-23 09:55:34,484 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0 2023-12-23 09:55:34,921 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.051e+01 3.411e+01 3.544e+01 3.753e+01 4.282e+01, threshold=7.087e+01, percent-clipped=0.0 2023-12-23 09:55:35,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1076453.3333333333, ans=0.1 2023-12-23 09:55:35,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1076453.3333333333, ans=0.0 2023-12-23 09:55:40,276 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=22.5 2023-12-23 09:55:41,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1076520.0, ans=0.0 2023-12-23 09:55:42,521 INFO [train.py:886] (3/4) Epoch 34, batch 4200, loss[loss=0.01048, audio_tagging_loss=0.01048, over 25000.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4951376.16 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 64.0 2023-12-23 09:55:50,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.88 vs. limit=15.0 2023-12-23 09:55:59,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1076586.6666666667, ans=0.0 2023-12-23 09:56:32,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1076786.6666666667, ans=0.0 2023-12-23 09:56:34,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1076853.3333333333, ans=0.0 2023-12-23 09:56:35,488 INFO [train.py:886] (3/4) Epoch 34, batch 4250, loss[loss=0.01044, audio_tagging_loss=0.01044, over 25000.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4957051.88 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 32.0 2023-12-23 09:56:49,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1076920.0, ans=0.0 2023-12-23 09:57:18,331 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.979e+01 3.377e+01 3.552e+01 3.782e+01 4.205e+01, threshold=7.103e+01, percent-clipped=0.0 2023-12-23 09:57:26,356 INFO [train.py:886] (3/4) Epoch 34, batch 4300, loss[loss=0.01138, audio_tagging_loss=0.01138, over 25000.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4953175.12 frames. 
], batch size: 100, lr: 3.15e-03, grad_scale: 32.0 2023-12-23 09:57:45,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1077253.3333333333, ans=0.0 2023-12-23 09:57:45,621 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.69 vs. limit=22.5 2023-12-23 09:57:47,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1077320.0, ans=0.0 2023-12-23 09:57:52,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1077320.0, ans=0.2 2023-12-23 09:58:08,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1077453.3333333333, ans=0.04949747468305833 2023-12-23 09:58:15,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1077453.3333333333, ans=0.125 2023-12-23 09:58:17,342 INFO [train.py:886] (3/4) Epoch 34, batch 4350, loss[loss=0.012, audio_tagging_loss=0.012, over 24750.00 frames. ], tot_loss[loss=0.01222, audio_tagging_loss=0.01222, over 4954245.74 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 32.0 2023-12-23 09:58:32,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1077586.6666666667, ans=0.125 2023-12-23 09:58:45,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1077653.3333333333, ans=0.0 2023-12-23 09:58:46,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1077653.3333333333, ans=0.0 2023-12-23 09:58:47,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1077720.0, ans=0.125 2023-12-23 09:58:58,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1077786.6666666667, ans=0.1 2023-12-23 09:59:00,728 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.247e+01 3.499e+01 3.632e+01 3.842e+01 4.825e+01, threshold=7.264e+01, percent-clipped=0.0 2023-12-23 09:59:09,497 INFO [train.py:886] (3/4) Epoch 34, batch 4400, loss[loss=0.01371, audio_tagging_loss=0.01371, over 24750.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4952064.83 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 32.0 2023-12-23 09:59:17,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1077853.3333333333, ans=0.0 2023-12-23 09:59:19,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1077920.0, ans=0.125 2023-12-23 09:59:20,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1077920.0, ans=0.0 2023-12-23 09:59:21,246 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.29 vs. 
limit=15.0 2023-12-23 09:59:26,527 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 09:59:32,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1077986.6666666667, ans=0.125 2023-12-23 09:59:36,458 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.80 vs. limit=15.0 2023-12-23 09:59:37,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1077986.6666666667, ans=0.125 2023-12-23 09:59:58,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1078186.6666666667, ans=0.0 2023-12-23 09:59:59,435 INFO [train.py:886] (3/4) Epoch 34, batch 4450, loss[loss=0.01165, audio_tagging_loss=0.01165, over 24750.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4948361.93 frames. ], batch size: 99, lr: 3.15e-03, grad_scale: 32.0 2023-12-23 10:00:10,305 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 10:00:11,664 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.13 vs. limit=15.0 2023-12-23 10:00:26,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1078320.0, ans=15.0 2023-12-23 10:00:33,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1078386.6666666667, ans=0.1 2023-12-23 10:00:33,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1078386.6666666667, ans=0.125 2023-12-23 10:00:45,203 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.115e+01 3.445e+01 3.585e+01 3.809e+01 4.204e+01, threshold=7.171e+01, percent-clipped=0.0 2023-12-23 10:00:50,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1078453.3333333333, ans=0.125 2023-12-23 10:00:51,839 INFO [train.py:886] (3/4) Epoch 34, batch 4500, loss[loss=0.01293, audio_tagging_loss=0.01293, over 25000.00 frames. ], tot_loss[loss=0.01228, audio_tagging_loss=0.01228, over 4947069.68 frames. ], batch size: 100, lr: 3.14e-03, grad_scale: 32.0 2023-12-23 10:01:23,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1078720.0, ans=0.2 2023-12-23 10:01:32,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1078786.6666666667, ans=0.0 2023-12-23 10:01:43,528 INFO [train.py:886] (3/4) Epoch 34, batch 4550, loss[loss=0.01311, audio_tagging_loss=0.01311, over 25000.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4943248.03 frames. 
], batch size: 100, lr: 3.14e-03, grad_scale: 32.0 2023-12-23 10:01:50,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1078853.3333333333, ans=0.1 2023-12-23 10:01:52,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1078853.3333333333, ans=15.0 2023-12-23 10:01:54,683 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 10:02:18,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1079053.3333333333, ans=0.2 2023-12-23 10:02:28,640 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.071e+01 3.416e+01 3.558e+01 3.707e+01 4.537e+01, threshold=7.116e+01, percent-clipped=0.0 2023-12-23 10:02:35,205 INFO [train.py:886] (3/4) Epoch 34, batch 4600, loss[loss=0.01107, audio_tagging_loss=0.01107, over 25000.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4945089.91 frames. ], batch size: 100, lr: 3.14e-03, grad_scale: 32.0 2023-12-23 10:02:38,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1079186.6666666667, ans=0.125 2023-12-23 10:02:39,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1079186.6666666667, ans=0.07 2023-12-23 10:02:42,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1079186.6666666667, ans=0.125 2023-12-23 10:02:44,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1079253.3333333333, ans=10.0 2023-12-23 10:02:50,277 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.12 vs. limit=22.5 2023-12-23 10:03:02,337 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.60 vs. limit=15.0 2023-12-23 10:03:27,340 INFO [train.py:886] (3/4) Epoch 34, batch 4650, loss[loss=0.01195, audio_tagging_loss=0.01195, over 25000.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4952522.75 frames. ], batch size: 100, lr: 3.14e-03, grad_scale: 32.0 2023-12-23 10:03:48,065 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 10:04:03,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1079720.0, ans=0.125 2023-12-23 10:04:11,224 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.081e+01 3.447e+01 3.567e+01 3.799e+01 4.284e+01, threshold=7.135e+01, percent-clipped=0.0 2023-12-23 10:04:14,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1079786.6666666667, ans=0.125 2023-12-23 10:04:16,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1079786.6666666667, ans=0.125 2023-12-23 10:04:17,749 INFO [train.py:886] (3/4) Epoch 34, batch 4700, loss[loss=0.01399, audio_tagging_loss=0.01399, over 24750.00 frames. 
], tot_loss[loss=0.01227, audio_tagging_loss=0.01227, over 4949177.70 frames. ], batch size: 99, lr: 3.14e-03, grad_scale: 32.0 2023-12-23 10:04:26,334 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.99 vs. limit=22.5 2023-12-23 10:04:34,544 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=15.0 2023-12-23 10:04:50,378 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0 2023-12-23 10:05:05,719 INFO [train.py:886] (3/4) Epoch 34, batch 4750, loss[loss=0.01223, audio_tagging_loss=0.01223, over 24750.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4947007.18 frames. ], batch size: 99, lr: 3.14e-03, grad_scale: 32.0 2023-12-23 10:05:40,659 INFO [train.py:886] (3/4) Epoch 35, batch 0, loss[loss=0.02705, audio_tagging_loss=0.02705, over 21006.00 frames. ], tot_loss[loss=0.02705, audio_tagging_loss=0.02705, over 21006.00 frames. ], batch size: 107, lr: 3.10e-03, grad_scale: 32.0 2023-12-23 10:05:40,660 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 10:06:00,367 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0637, 3.1691, 4.0416, 3.7036], device='cuda:3') 2023-12-23 10:06:02,108 INFO [train.py:917] (3/4) Epoch 35, validation: loss=0.03353, audio_tagging_loss=0.03353, over 3737520.00 frames. 2023-12-23 10:06:02,109 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 10:06:08,280 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.56 vs. limit=10.0 2023-12-23 10:06:22,901 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.48 vs. limit=15.0 2023-12-23 10:06:29,558 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.161e+01 3.510e+01 3.765e+01 4.838e+01 9.519e+01, threshold=7.530e+01, percent-clipped=6.0 2023-12-23 10:06:32,646 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 10:06:36,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1080493.3333333333, ans=0.125 2023-12-23 10:06:52,669 INFO [train.py:886] (3/4) Epoch 35, batch 50, loss[loss=0.01584, audio_tagging_loss=0.01584, over 24866.00 frames. ], tot_loss[loss=0.01939, audio_tagging_loss=0.01939, over 1122256.14 frames. ], batch size: 100, lr: 3.10e-03, grad_scale: 32.0 2023-12-23 10:07:02,199 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.67 vs. 
limit=10.0 2023-12-23 10:07:09,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1080693.3333333333, ans=0.125 2023-12-23 10:07:12,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1080760.0, ans=0.125 2023-12-23 10:07:13,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1080760.0, ans=0.0 2023-12-23 10:07:41,584 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.11 vs. limit=15.0 2023-12-23 10:07:44,749 INFO [train.py:886] (3/4) Epoch 35, batch 100, loss[loss=0.01953, audio_tagging_loss=0.01953, over 25000.00 frames. ], tot_loss[loss=0.01665, audio_tagging_loss=0.01665, over 1970057.28 frames. ], batch size: 100, lr: 3.10e-03, grad_scale: 32.0 2023-12-23 10:07:49,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1080960.0, ans=0.125 2023-12-23 10:07:51,632 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=15.0 2023-12-23 10:08:12,435 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.410e+01 3.823e+01 4.080e+01 4.340e+01 5.302e+01, threshold=8.159e+01, percent-clipped=0.0 2023-12-23 10:08:36,358 INFO [train.py:886] (3/4) Epoch 35, batch 150, loss[loss=0.01068, audio_tagging_loss=0.01068, over 25000.00 frames. ], tot_loss[loss=0.01511, audio_tagging_loss=0.01511, over 2635798.53 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:09:05,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1081426.6666666667, ans=0.125 2023-12-23 10:09:17,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1081560.0, ans=0.0 2023-12-23 10:09:28,091 INFO [train.py:886] (3/4) Epoch 35, batch 200, loss[loss=0.01181, audio_tagging_loss=0.01181, over 24750.00 frames. ], tot_loss[loss=0.0143, audio_tagging_loss=0.0143, over 3148085.74 frames. ], batch size: 99, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:09:37,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1081693.3333333333, ans=0.1 2023-12-23 10:09:37,891 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.40 vs. limit=15.0 2023-12-23 10:09:50,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1081760.0, ans=0.1 2023-12-23 10:09:51,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1081760.0, ans=0.125 2023-12-23 10:09:53,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1081760.0, ans=0.025 2023-12-23 10:09:55,676 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.255e+01 3.529e+01 3.676e+01 3.871e+01 4.435e+01, threshold=7.352e+01, percent-clipped=0.0 2023-12-23 10:10:20,476 INFO [train.py:886] (3/4) Epoch 35, batch 250, loss[loss=0.01081, audio_tagging_loss=0.01081, over 25000.00 frames. 
], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 3550657.55 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:10:25,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1081960.0, ans=0.0 2023-12-23 10:10:38,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1082026.6666666667, ans=0.0 2023-12-23 10:10:44,525 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.97 vs. limit=22.5 2023-12-23 10:10:47,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0 2023-12-23 10:10:51,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.03 vs. limit=10.0 2023-12-23 10:10:55,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1082160.0, ans=0.125 2023-12-23 10:10:59,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1082160.0, ans=0.0 2023-12-23 10:11:04,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1082226.6666666667, ans=0.125 2023-12-23 10:11:11,767 INFO [train.py:886] (3/4) Epoch 35, batch 300, loss[loss=0.01196, audio_tagging_loss=0.01196, over 24750.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 3860445.37 frames. ], batch size: 99, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:11:30,912 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.16 vs. limit=6.0 2023-12-23 10:11:40,255 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.154e+01 3.473e+01 3.613e+01 3.760e+01 4.806e+01, threshold=7.226e+01, percent-clipped=0.0 2023-12-23 10:11:55,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.37 vs. limit=6.0 2023-12-23 10:12:04,012 INFO [train.py:886] (3/4) Epoch 35, batch 350, loss[loss=0.01385, audio_tagging_loss=0.01385, over 24750.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 4092554.66 frames. ], batch size: 99, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:12:49,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1082893.3333333333, ans=0.125 2023-12-23 10:12:50,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1082893.3333333333, ans=0.0 2023-12-23 10:12:57,049 INFO [train.py:886] (3/4) Epoch 35, batch 400, loss[loss=0.01198, audio_tagging_loss=0.01198, over 24750.00 frames. ], tot_loss[loss=0.01279, audio_tagging_loss=0.01279, over 4281501.53 frames. 
], batch size: 99, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:13:13,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1083026.6666666667, ans=0.125 2023-12-23 10:13:18,284 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0 2023-12-23 10:13:22,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1083093.3333333333, ans=0.125 2023-12-23 10:13:24,735 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.006e+01 3.393e+01 3.521e+01 3.659e+01 4.330e+01, threshold=7.042e+01, percent-clipped=0.0 2023-12-23 10:13:45,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.66 vs. limit=22.5 2023-12-23 10:13:48,037 INFO [train.py:886] (3/4) Epoch 35, batch 450, loss[loss=0.01076, audio_tagging_loss=0.01076, over 25000.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4437732.97 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:14:04,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1083360.0, ans=0.125 2023-12-23 10:14:15,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1083426.6666666667, ans=0.0 2023-12-23 10:14:36,301 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0 2023-12-23 10:14:40,532 INFO [train.py:886] (3/4) Epoch 35, batch 500, loss[loss=0.01168, audio_tagging_loss=0.01168, over 25000.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4556720.33 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0 2023-12-23 10:14:59,641 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.03 vs. limit=15.0 2023-12-23 10:15:09,076 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.163e+01 3.425e+01 3.572e+01 3.739e+01 4.112e+01, threshold=7.144e+01, percent-clipped=0.0 2023-12-23 10:15:14,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1083826.6666666667, ans=0.1 2023-12-23 10:15:18,361 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2023-12-23 10:15:18,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1083826.6666666667, ans=0.125 2023-12-23 10:15:22,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.46 vs. limit=10.0 2023-12-23 10:15:32,473 INFO [train.py:886] (3/4) Epoch 35, batch 550, loss[loss=0.006915, audio_tagging_loss=0.006915, over 23952.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4644726.27 frames. 
2023-12-23 10:15:32,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1083960.0, ans=0.125
2023-12-23 10:15:40,887 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.15 vs. limit=12.0
2023-12-23 10:15:44,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1084026.6666666667, ans=0.125
2023-12-23 10:15:57,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1084093.3333333333, ans=0.1
2023-12-23 10:16:00,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1084093.3333333333, ans=0.0
2023-12-23 10:16:02,907 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.12 vs. limit=22.5
2023-12-23 10:16:11,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1084160.0, ans=0.125
2023-12-23 10:16:18,076 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 10:16:23,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1084293.3333333333, ans=0.125
2023-12-23 10:16:24,364 INFO [train.py:886] (3/4) Epoch 35, batch 600, loss[loss=0.01162, audio_tagging_loss=0.01162, over 24750.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4711365.29 frames. ], batch size: 99, lr: 3.09e-03, grad_scale: 32.0
2023-12-23 10:16:36,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.42 vs. limit=15.0
2023-12-23 10:16:37,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1084360.0, ans=0.0
2023-12-23 10:16:37,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1084360.0, ans=0.2
2023-12-23 10:16:37,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.37 vs. limit=15.0
2023-12-23 10:16:52,608 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.983e+01 3.476e+01 3.624e+01 3.793e+01 4.486e+01, threshold=7.249e+01, percent-clipped=0.0
2023-12-23 10:17:16,825 INFO [train.py:886] (3/4) Epoch 35, batch 650, loss[loss=0.0122, audio_tagging_loss=0.0122, over 24750.00 frames. ], tot_loss[loss=0.01242, audio_tagging_loss=0.01242, over 4758173.49 frames. ], batch size: 99, lr: 3.09e-03, grad_scale: 32.0
2023-12-23 10:17:28,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1084693.3333333333, ans=0.125
2023-12-23 10:17:34,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1084693.3333333333, ans=0.09899494936611666
2023-12-23 10:17:39,647 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=15.0
2023-12-23 10:17:49,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1084826.6666666667, ans=0.0
2023-12-23 10:17:59,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1084893.3333333333, ans=0.125
2023-12-23 10:18:01,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1084893.3333333333, ans=0.05
2023-12-23 10:18:06,868 INFO [train.py:886] (3/4) Epoch 35, batch 700, loss[loss=0.01504, audio_tagging_loss=0.01504, over 25000.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4799023.02 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0
2023-12-23 10:18:13,616 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.76 vs. limit=15.0
2023-12-23 10:18:19,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1085026.6666666667, ans=0.0
2023-12-23 10:18:22,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1085026.6666666667, ans=0.125
2023-12-23 10:18:31,915 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.34 vs. limit=15.0
2023-12-23 10:18:33,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1085093.3333333333, ans=0.2
2023-12-23 10:18:35,075 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.034e+01 3.416e+01 3.588e+01 3.767e+01 4.947e+01, threshold=7.176e+01, percent-clipped=0.0
2023-12-23 10:18:40,386 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.74 vs. limit=10.0
2023-12-23 10:18:46,530 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.63 vs. limit=22.5
2023-12-23 10:18:57,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1085226.6666666667, ans=0.125
2023-12-23 10:18:59,686 INFO [train.py:886] (3/4) Epoch 35, batch 750, loss[loss=0.01252, audio_tagging_loss=0.01252, over 25000.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4829922.87 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0
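The scaling.py:213 lines print the current value (ans) of a ScheduledFloat, a hyperparameter that is a function of batch_count rather than a constant, so regularizers like skip rates and balancer probabilities can be relaxed as training progresses. A minimal sketch of piecewise-linear scheduling under that assumption; the breakpoints below are invented for illustration, not taken from the recipe.

def scheduled_float(batch_count: float,
                    schedule=((0.0, 0.3), (4000.0, 0.125))) -> float:
    """Piecewise-linear interpolation over (batch_count, value) breakpoints."""
    if batch_count <= schedule[0][0]:
        return schedule[0][1]
    for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return schedule[-1][1]  # past the last breakpoint the value is flat

# Deep into epoch 35 the schedule has long since settled, matching ans=0.125.
print(scheduled_float(1084826.6666666667))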
2023-12-23 10:19:10,283 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 10:19:12,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1085360.0, ans=0.0
2023-12-23 10:19:21,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1085426.6666666667, ans=0.125
2023-12-23 10:19:24,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1085426.6666666667, ans=0.2
2023-12-23 10:19:38,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1085493.3333333333, ans=0.07
2023-12-23 10:19:51,855 INFO [train.py:886] (3/4) Epoch 35, batch 800, loss[loss=0.01087, audio_tagging_loss=0.01087, over 25000.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4860508.53 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0
2023-12-23 10:19:59,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1085626.6666666667, ans=0.2
2023-12-23 10:20:00,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1085626.6666666667, ans=0.125
2023-12-23 10:20:01,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1085693.3333333333, ans=0.2
2023-12-23 10:20:18,780 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.132e+01 3.460e+01 3.638e+01 3.746e+01 4.332e+01, threshold=7.276e+01, percent-clipped=0.0
2023-12-23 10:20:26,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1085826.6666666667, ans=0.1
2023-12-23 10:20:35,422 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.904e-02
2023-12-23 10:20:42,795 INFO [train.py:886] (3/4) Epoch 35, batch 850, loss[loss=0.01332, audio_tagging_loss=0.01332, over 24750.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4889511.66 frames. ], batch size: 99, lr: 3.09e-03, grad_scale: 32.0
2023-12-23 10:21:28,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1086226.6666666667, ans=0.0
2023-12-23 10:21:33,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1086226.6666666667, ans=0.0
2023-12-23 10:21:35,514 INFO [train.py:886] (3/4) Epoch 35, batch 900, loss[loss=0.01295, audio_tagging_loss=0.01295, over 24750.00 frames. ], tot_loss[loss=0.0122, audio_tagging_loss=0.0122, over 4901441.00 frames. ], batch size: 99, lr: 3.09e-03, grad_scale: 32.0
2023-12-23 10:22:03,187 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.210e+01 3.429e+01 3.563e+01 3.739e+01 4.144e+01, threshold=7.126e+01, percent-clipped=0.0
2023-12-23 10:22:05,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0
2023-12-23 10:22:25,576 INFO [train.py:886] (3/4) Epoch 35, batch 950, loss[loss=0.01283, audio_tagging_loss=0.01283, over 24030.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4903035.38 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0
2023-12-23 10:22:33,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=15.0
2023-12-23 10:22:39,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1086693.3333333333, ans=0.1
2023-12-23 10:22:44,245 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 10:23:06,485 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. limit=6.0
2023-12-23 10:23:12,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1086893.3333333333, ans=0.1
2023-12-23 10:23:14,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1086893.3333333333, ans=0.125
2023-12-23 10:23:18,179 INFO [train.py:886] (3/4) Epoch 35, batch 1000, loss[loss=0.01172, audio_tagging_loss=0.01172, over 25000.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4912197.39 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0
2023-12-23 10:23:19,513 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.35 vs. limit=15.0
2023-12-23 10:23:29,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1087026.6666666667, ans=0.035
2023-12-23 10:23:31,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1087026.6666666667, ans=0.0
2023-12-23 10:23:35,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1087026.6666666667, ans=0.0
2023-12-23 10:23:35,824 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.62 vs. limit=15.0
2023-12-23 10:23:46,662 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.127e+01 3.397e+01 3.527e+01 3.697e+01 4.160e+01, threshold=7.054e+01, percent-clipped=0.0
2023-12-23 10:24:10,458 INFO [train.py:886] (3/4) Epoch 35, batch 1050, loss[loss=0.01004, audio_tagging_loss=0.01004, over 25000.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4923539.23 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0
2023-12-23 10:24:16,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.99 vs. limit=22.5
2023-12-23 10:24:33,941 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.48 vs. limit=22.5
2023-12-23 10:24:50,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1087560.0, ans=0.2
2023-12-23 10:24:52,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0
2023-12-23 10:25:00,973 INFO [train.py:886] (3/4) Epoch 35, batch 1100, loss[loss=0.01154, audio_tagging_loss=0.01154, over 25000.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4930016.22 frames. ], batch size: 100, lr: 3.09e-03, grad_scale: 32.0
2023-12-23 10:25:08,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1087626.6666666667, ans=0.125
2023-12-23 10:25:15,735 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.30 vs. limit=15.0
2023-12-23 10:25:29,193 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.089e+01 3.393e+01 3.590e+01 3.785e+01 4.427e+01, threshold=7.180e+01, percent-clipped=0.0
2023-12-23 10:25:31,340 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 10:25:53,739 INFO [train.py:886] (3/4) Epoch 35, batch 1150, loss[loss=0.0137, audio_tagging_loss=0.0137, over 24750.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4936473.67 frames. ], batch size: 99, lr: 3.09e-03, grad_scale: 32.0
2023-12-23 10:25:56,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1087960.0, ans=0.0
2023-12-23 10:26:02,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.32 vs. limit=22.5
2023-12-23 10:26:02,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1088026.6666666667, ans=0.125
2023-12-23 10:26:02,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1088026.6666666667, ans=0.125
2023-12-23 10:26:19,253 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 10:26:31,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1088160.0, ans=0.125
2023-12-23 10:26:44,945 INFO [train.py:886] (3/4) Epoch 35, batch 1200, loss[loss=0.01212, audio_tagging_loss=0.01212, over 25000.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4942820.19 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 32.0
2023-12-23 10:27:02,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1088360.0, ans=0.125
2023-12-23 10:27:12,518 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.047e+01 3.487e+01 3.620e+01 3.766e+01 4.374e+01, threshold=7.240e+01, percent-clipped=0.0
2023-12-23 10:27:18,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1088493.3333333333, ans=0.1
2023-12-23 10:27:21,921 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.71 vs. limit=15.0
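Each optim.py:484 warning above lists five order statistics of recent gradient norms (min, 25%, 50%, 75%, max) together with a clipping threshold; throughout this log the threshold is consistently about Clipping_scale (2.0) times the median. A hedged sketch of that bookkeeping follows; it illustrates the idea, not the actual ScaledAdam implementation, and clip_grad_with_quartiles is an invented name.

import torch

def clip_grad_with_quartiles(params, norm_history, clipping_scale=2.0):
    """Track recent grad norms, report quartiles, clip against scale x median."""
    params = [p for p in params if p.grad is not None]
    norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params))
    norm_history.append(float(norm))
    stats = torch.quantile(torch.tensor(norm_history),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * float(stats[2])  # 2.0 x median, as logged
    if float(norm) > threshold:
        for p in params:
            p.grad.mul_(threshold / float(norm))  # counts toward percent-clipped
    return stats, threshold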
2023-12-23 10:27:25,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1088493.3333333333, ans=0.125
2023-12-23 10:27:31,099 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.34 vs. limit=10.0
2023-12-23 10:27:36,885 INFO [train.py:886] (3/4) Epoch 35, batch 1250, loss[loss=0.0112, audio_tagging_loss=0.0112, over 24750.00 frames. ], tot_loss[loss=0.01223, audio_tagging_loss=0.01223, over 4937649.87 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 32.0
2023-12-23 10:27:46,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1088693.3333333333, ans=0.0
2023-12-23 10:27:50,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1088693.3333333333, ans=0.125
2023-12-23 10:28:01,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1088760.0, ans=0.0
2023-12-23 10:28:17,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1088893.3333333333, ans=0.2
2023-12-23 10:28:29,142 INFO [train.py:886] (3/4) Epoch 35, batch 1300, loss[loss=0.0104, audio_tagging_loss=0.0104, over 24750.00 frames. ], tot_loss[loss=0.01235, audio_tagging_loss=0.01235, over 4937091.71 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 32.0
2023-12-23 10:28:46,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1089026.6666666667, ans=0.2
2023-12-23 10:28:57,283 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.209e+01 3.442e+01 3.551e+01 3.705e+01 4.359e+01, threshold=7.103e+01, percent-clipped=0.0
2023-12-23 10:29:09,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1089226.6666666667, ans=0.125
2023-12-23 10:29:09,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1089226.6666666667, ans=0.125
2023-12-23 10:29:19,899 INFO [train.py:886] (3/4) Epoch 35, batch 1350, loss[loss=0.01371, audio_tagging_loss=0.01371, over 25000.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4939766.86 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 32.0
2023-12-23 10:29:29,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1089293.3333333333, ans=0.1
2023-12-23 10:29:33,027 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.49 vs. limit=15.0
2023-12-23 10:29:34,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1089360.0, ans=0.0
2023-12-23 10:29:46,986 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 10:29:54,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1089493.3333333333, ans=0.0
2023-12-23 10:30:01,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1089493.3333333333, ans=0.125
2023-12-23 10:30:01,602 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.60 vs. limit=15.0
2023-12-23 10:30:04,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0
2023-12-23 10:30:09,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1089560.0, ans=0.125
2023-12-23 10:30:12,248 INFO [train.py:886] (3/4) Epoch 35, batch 1400, loss[loss=0.01374, audio_tagging_loss=0.01374, over 24750.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 4941988.57 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 32.0
2023-12-23 10:30:16,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1089626.6666666667, ans=0.125
2023-12-23 10:30:27,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1089693.3333333333, ans=0.1
2023-12-23 10:30:38,327 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 10:30:39,917 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.952e+01 3.418e+01 3.570e+01 3.781e+01 4.202e+01, threshold=7.140e+01, percent-clipped=0.0
2023-12-23 10:30:42,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.82 vs. limit=15.0
2023-12-23 10:30:55,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1089893.3333333333, ans=0.125
2023-12-23 10:31:03,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1089960.0, ans=0.125
2023-12-23 10:31:04,674 INFO [train.py:886] (3/4) Epoch 35, batch 1450, loss[loss=0.01095, audio_tagging_loss=0.01095, over 25000.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4947711.80 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 64.0
2023-12-23 10:31:09,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1089960.0, ans=0.0
2023-12-23 10:31:17,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1090026.6666666667, ans=0.1
2023-12-23 10:31:19,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1090026.6666666667, ans=0.0
2023-12-23 10:31:30,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1090093.3333333333, ans=0.07
2023-12-23 10:31:54,686 INFO [train.py:886] (3/4) Epoch 35, batch 1500, loss[loss=0.0126, audio_tagging_loss=0.0126, over 25000.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4949615.35 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 64.0
2023-12-23 10:32:10,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1090360.0, ans=0.05
2023-12-23 10:32:10,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1090360.0, ans=0.125
2023-12-23 10:32:22,401 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.067e+01 3.460e+01 3.584e+01 3.712e+01 4.259e+01, threshold=7.168e+01, percent-clipped=0.0
2023-12-23 10:32:37,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1090560.0, ans=0.2
2023-12-23 10:32:41,140 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.10 vs. limit=15.0
2023-12-23 10:32:46,365 INFO [train.py:886] (3/4) Epoch 35, batch 1550, loss[loss=0.01243, audio_tagging_loss=0.01243, over 24750.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4947982.63 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 64.0
2023-12-23 10:32:51,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1090626.6666666667, ans=0.125
2023-12-23 10:32:55,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1090693.3333333333, ans=0.125
2023-12-23 10:32:58,228 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.98 vs. limit=22.5
2023-12-23 10:32:59,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1090693.3333333333, ans=0.0
2023-12-23 10:33:04,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1090693.3333333333, ans=0.125
2023-12-23 10:33:09,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1090760.0, ans=0.125
2023-12-23 10:33:28,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1090893.3333333333, ans=0.125
2023-12-23 10:33:37,819 INFO [train.py:886] (3/4) Epoch 35, batch 1600, loss[loss=0.01226, audio_tagging_loss=0.01226, over 24750.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4946321.14 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 64.0
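Note that grad_scale switches from 32.0 to 64.0 around batch 1450 and drops back to 32.0 later in the log (around batch 2200). This is the characteristic behaviour of dynamic loss scaling in fp16 training (the run was started with use_fp16=True). A minimal sketch of the usual pattern with PyTorch's torch.cuda.amp.GradScaler follows; the hyperparameters shown are illustrative, not necessarily what icefall configures, and batch["inputs"]/batch["targets"] are placeholder keys.

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def training_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # forward pass in mixed precision
        loss = loss_fn(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()    # gradients are scaled by grad_scale
    scaler.step(optimizer)           # skipped if inf/nan gradients are found
    scaler.update()                  # doubles the scale after growth_interval
                                     # clean steps; halves it on overflow
    return loss.detach()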
2023-12-23 10:33:40,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1090960.0, ans=0.125
2023-12-23 10:33:47,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1091026.6666666667, ans=0.0
2023-12-23 10:34:00,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1091093.3333333333, ans=0.125
2023-12-23 10:34:01,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1091093.3333333333, ans=0.125
2023-12-23 10:34:03,963 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.241e+01 3.514e+01 3.658e+01 3.797e+01 4.440e+01, threshold=7.316e+01, percent-clipped=0.0
2023-12-23 10:34:05,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1091093.3333333333, ans=0.2
2023-12-23 10:34:26,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1091226.6666666667, ans=0.125
2023-12-23 10:34:27,909 INFO [train.py:886] (3/4) Epoch 35, batch 1650, loss[loss=0.01209, audio_tagging_loss=0.01209, over 25000.00 frames. ], tot_loss[loss=0.01219, audio_tagging_loss=0.01219, over 4950469.29 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 64.0
2023-12-23 10:34:29,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1091293.3333333333, ans=0.125
2023-12-23 10:34:48,565 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.42 vs. limit=22.5
2023-12-23 10:35:10,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1091560.0, ans=0.125
2023-12-23 10:35:19,577 INFO [train.py:886] (3/4) Epoch 35, batch 1700, loss[loss=0.01173, audio_tagging_loss=0.01173, over 25000.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4955051.43 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 64.0
2023-12-23 10:35:25,579 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=12.0
2023-12-23 10:35:37,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1091760.0, ans=0.0
2023-12-23 10:35:46,549 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.088e+01 3.407e+01 3.580e+01 3.752e+01 4.487e+01, threshold=7.159e+01, percent-clipped=0.0
2023-12-23 10:36:01,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1091893.3333333333, ans=0.125
2023-12-23 10:36:04,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1091893.3333333333, ans=0.0
2023-12-23 10:36:07,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1091893.3333333333, ans=0.125
2023-12-23 10:36:09,043 INFO [train.py:886] (3/4) Epoch 35, batch 1750, loss[loss=0.01267, audio_tagging_loss=0.01267, over 25000.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4954783.41 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 64.0
2023-12-23 10:36:12,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1091960.0, ans=0.09899494936611666
2023-12-23 10:36:22,169 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.64 vs. limit=15.0
2023-12-23 10:36:31,588 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.31 vs. limit=15.0
2023-12-23 10:36:34,195 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0
2023-12-23 10:37:01,889 INFO [train.py:886] (3/4) Epoch 35, batch 1800, loss[loss=0.009888, audio_tagging_loss=0.009888, over 25000.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4961385.80 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 64.0
2023-12-23 10:37:28,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1092426.6666666667, ans=0.035
2023-12-23 10:37:29,524 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.056e+01 3.488e+01 3.614e+01 3.772e+01 4.751e+01, threshold=7.228e+01, percent-clipped=0.0
2023-12-23 10:37:32,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1092493.3333333333, ans=0.125
2023-12-23 10:37:33,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1092493.3333333333, ans=0.2
2023-12-23 10:37:45,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1092560.0, ans=0.0
2023-12-23 10:37:51,363 INFO [train.py:886] (3/4) Epoch 35, batch 1850, loss[loss=0.01501, audio_tagging_loss=0.01501, over 24750.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4961244.51 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 64.0
2023-12-23 10:38:09,102 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0
2023-12-23 10:38:15,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1092760.0, ans=0.125
2023-12-23 10:38:26,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1092826.6666666667, ans=0.0
2023-12-23 10:38:42,689 INFO [train.py:886] (3/4) Epoch 35, batch 1900, loss[loss=0.01324, audio_tagging_loss=0.01324, over 24750.00 frames. ], tot_loss[loss=0.0123, audio_tagging_loss=0.0123, over 4952283.92 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 64.0
2023-12-23 10:38:56,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1093026.6666666667, ans=0.125
2023-12-23 10:39:03,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1093093.3333333333, ans=0.125
2023-12-23 10:39:10,604 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.113e+01 3.449e+01 3.631e+01 3.772e+01 4.886e+01, threshold=7.262e+01, percent-clipped=0.0
2023-12-23 10:39:22,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1093226.6666666667, ans=0.0
2023-12-23 10:39:26,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1093226.6666666667, ans=0.1
2023-12-23 10:39:35,198 INFO [train.py:886] (3/4) Epoch 35, batch 1950, loss[loss=0.01162, audio_tagging_loss=0.01162, over 24750.00 frames. ], tot_loss[loss=0.01224, audio_tagging_loss=0.01224, over 4948303.52 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 64.0
2023-12-23 10:39:47,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1093360.0, ans=0.125
2023-12-23 10:40:03,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1093426.6666666667, ans=0.125
2023-12-23 10:40:12,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=1093493.3333333333, ans=0.1
2023-12-23 10:40:21,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1093560.0, ans=0.1
2023-12-23 10:40:22,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1093560.0, ans=0.0
2023-12-23 10:40:25,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1093560.0, ans=0.125
2023-12-23 10:40:27,283 INFO [train.py:886] (3/4) Epoch 35, batch 2000, loss[loss=0.01175, audio_tagging_loss=0.01175, over 24750.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4945409.80 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 64.0
2023-12-23 10:40:36,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1093626.6666666667, ans=0.0
2023-12-23 10:40:38,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1093693.3333333333, ans=0.125
2023-12-23 10:40:47,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.44 vs. limit=15.0
2023-12-23 10:40:51,449 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.49 vs. limit=12.0
2023-12-23 10:40:55,723 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.152e+01 3.397e+01 3.574e+01 3.710e+01 4.548e+01, threshold=7.148e+01, percent-clipped=0.0
2023-12-23 10:41:11,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0
2023-12-23 10:41:20,399 INFO [train.py:886] (3/4) Epoch 35, batch 2050, loss[loss=0.01257, audio_tagging_loss=0.01257, over 25000.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4950466.11 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 64.0
2023-12-23 10:41:36,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1094026.6666666667, ans=0.1
2023-12-23 10:42:08,938 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.55 vs. limit=12.0
2023-12-23 10:42:10,504 INFO [train.py:886] (3/4) Epoch 35, batch 2100, loss[loss=0.0101, audio_tagging_loss=0.0101, over 25000.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4953090.47 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 64.0
2023-12-23 10:42:30,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1094426.6666666667, ans=0.2
2023-12-23 10:42:37,893 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.139e+01 3.441e+01 3.605e+01 3.842e+01 4.378e+01, threshold=7.210e+01, percent-clipped=0.0
2023-12-23 10:42:57,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1094560.0, ans=0.0
2023-12-23 10:42:57,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1094560.0, ans=0.125
2023-12-23 10:43:02,011 INFO [train.py:886] (3/4) Epoch 35, batch 2150, loss[loss=0.01294, audio_tagging_loss=0.01294, over 25000.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4956860.50 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 64.0
2023-12-23 10:43:02,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1094626.6666666667, ans=0.1
2023-12-23 10:43:12,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1094693.3333333333, ans=0.0
2023-12-23 10:43:53,307 INFO [train.py:886] (3/4) Epoch 35, batch 2200, loss[loss=0.01425, audio_tagging_loss=0.01425, over 24750.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4948254.47 frames. ], batch size: 99, lr: 3.08e-03, grad_scale: 32.0
2023-12-23 10:44:09,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1095026.6666666667, ans=0.1
2023-12-23 10:44:22,586 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.222e+01 3.461e+01 3.667e+01 3.781e+01 4.293e+01, threshold=7.335e+01, percent-clipped=0.0
2023-12-23 10:44:42,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1095226.6666666667, ans=0.09899494936611666
2023-12-23 10:44:44,094 INFO [train.py:886] (3/4) Epoch 35, batch 2250, loss[loss=0.01516, audio_tagging_loss=0.01516, over 25000.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4944633.98 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 32.0
2023-12-23 10:44:46,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1095293.3333333333, ans=0.0
2023-12-23 10:44:49,706 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.00 vs. limit=15.0
2023-12-23 10:45:21,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1095493.3333333333, ans=0.0
2023-12-23 10:45:31,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1095560.0, ans=0.0
2023-12-23 10:45:32,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1095560.0, ans=0.0
2023-12-23 10:45:35,335 INFO [train.py:886] (3/4) Epoch 35, batch 2300, loss[loss=0.01078, audio_tagging_loss=0.01078, over 24750.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4950438.38 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0
2023-12-23 10:45:35,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1095626.6666666667, ans=0.0
2023-12-23 10:45:46,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1095693.3333333333, ans=0.0
2023-12-23 10:46:03,976 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.037e+01 3.412e+01 3.576e+01 3.677e+01 4.204e+01, threshold=7.151e+01, percent-clipped=0.0
2023-12-23 10:46:08,662 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 10:46:27,857 INFO [train.py:886] (3/4) Epoch 35, batch 2350, loss[loss=0.01235, audio_tagging_loss=0.01235, over 24750.00 frames. ], tot_loss[loss=0.0121, audio_tagging_loss=0.0121, over 4956353.95 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0
2023-12-23 10:46:41,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1096026.6666666667, ans=0.1
2023-12-23 10:46:42,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.48 vs. limit=10.0
2023-12-23 10:46:49,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1096093.3333333333, ans=0.95
2023-12-23 10:47:19,148 INFO [train.py:886] (3/4) Epoch 35, batch 2400, loss[loss=0.01238, audio_tagging_loss=0.01238, over 25000.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4957708.18 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0
2023-12-23 10:47:36,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1096360.0, ans=0.125
2023-12-23 10:47:47,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1096426.6666666667, ans=0.125
2023-12-23 10:47:48,608 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.150e+01 3.451e+01 3.579e+01 3.689e+01 4.162e+01, threshold=7.158e+01, percent-clipped=0.0
2023-12-23 10:48:02,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1096560.0, ans=0.125
2023-12-23 10:48:10,844 INFO [train.py:886] (3/4) Epoch 35, batch 2450, loss[loss=0.01209, audio_tagging_loss=0.01209, over 25000.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4958511.38 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0
2023-12-23 10:48:11,042 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 10:48:13,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1096626.6666666667, ans=0.125
2023-12-23 10:48:28,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1096693.3333333333, ans=0.125
2023-12-23 10:48:36,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1096760.0, ans=0.125
2023-12-23 10:48:41,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1096826.6666666667, ans=0.2
2023-12-23 10:48:43,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1096826.6666666667, ans=0.0
2023-12-23 10:48:51,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1096893.3333333333, ans=0.125
2023-12-23 10:49:01,551 INFO [train.py:886] (3/4) Epoch 35, batch 2500, loss[loss=0.01128, audio_tagging_loss=0.01128, over 25000.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4947902.81 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0
2023-12-23 10:49:13,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.47 vs. limit=6.0
2023-12-23 10:49:30,653 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.928e+01 3.456e+01 3.603e+01 3.843e+01 4.868e+01, threshold=7.207e+01, percent-clipped=0.0
2023-12-23 10:49:40,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1097160.0, ans=0.0
2023-12-23 10:49:44,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1097226.6666666667, ans=0.125
2023-12-23 10:49:52,982 INFO [train.py:886] (3/4) Epoch 35, batch 2550, loss[loss=0.01291, audio_tagging_loss=0.01291, over 25000.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4944996.37 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0
2023-12-23 10:49:57,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1097293.3333333333, ans=0.1
2023-12-23 10:50:10,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1097360.0, ans=0.125
2023-12-23 10:50:46,618 INFO [train.py:886] (3/4) Epoch 35, batch 2600, loss[loss=0.01202, audio_tagging_loss=0.01202, over 24750.00 frames. ], tot_loss[loss=0.01214, audio_tagging_loss=0.01214, over 4941906.25 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0
2023-12-23 10:50:49,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1097626.6666666667, ans=0.125
2023-12-23 10:50:55,372 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.502e-03
2023-12-23 10:51:15,873 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.061e+01 3.460e+01 3.619e+01 3.732e+01 4.232e+01, threshold=7.237e+01, percent-clipped=0.0
2023-12-23 10:51:18,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1097826.6666666667, ans=0.09899494936611666
2023-12-23 10:51:19,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1097826.6666666667, ans=0.1
2023-12-23 10:51:27,524 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 10:51:32,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1097893.3333333333, ans=0.125
2023-12-23 10:51:37,641 INFO [train.py:886] (3/4) Epoch 35, batch 2650, loss[loss=0.01205, audio_tagging_loss=0.01205, over 25000.00 frames. ], tot_loss[loss=0.01214, audio_tagging_loss=0.01214, over 4945755.85 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0
2023-12-23 10:51:39,437 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 10:51:50,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1098026.6666666667, ans=0.125
2023-12-23 10:51:59,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1098093.3333333333, ans=0.0
2023-12-23 10:52:01,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1098093.3333333333, ans=0.2
2023-12-23 10:52:18,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1098160.0, ans=0.2
2023-12-23 10:52:26,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1098226.6666666667, ans=0.1
2023-12-23 10:52:29,776 INFO [train.py:886] (3/4) Epoch 35, batch 2700, loss[loss=0.009465, audio_tagging_loss=0.009465, over 25000.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4945444.01 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0
2023-12-23 10:52:43,609 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=21.10 vs. limit=22.5
2023-12-23 10:52:59,080 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.192e+01 3.447e+01 3.571e+01 3.720e+01 4.339e+01, threshold=7.142e+01, percent-clipped=0.0
2023-12-23 10:53:04,143 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.25 vs. limit=15.0
2023-12-23 10:53:21,970 INFO [train.py:886] (3/4) Epoch 35, batch 2750, loss[loss=0.01087, audio_tagging_loss=0.01087, over 25000.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4952188.15 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0
2023-12-23 10:53:25,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1098626.6666666667, ans=0.125
2023-12-23 10:53:39,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1098693.3333333333, ans=0.125
2023-12-23 10:53:43,452 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.33 vs. limit=15.0
2023-12-23 10:53:46,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1098760.0, ans=0.125
2023-12-23 10:54:04,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1098893.3333333333, ans=0.0
2023-12-23 10:54:06,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1098893.3333333333, ans=0.5
2023-12-23 10:54:06,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1098893.3333333333, ans=0.125
2023-12-23 10:54:11,632 INFO [train.py:886] (3/4) Epoch 35, batch 2800, loss[loss=0.01112, audio_tagging_loss=0.01112, over 24750.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4952654.63 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0
2023-12-23 10:54:19,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1098960.0, ans=0.1
2023-12-23 10:54:31,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1099026.6666666667, ans=0.1
2023-12-23 10:54:41,033 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.165e+01 3.455e+01 3.628e+01 3.845e+01 4.485e+01, threshold=7.256e+01, percent-clipped=0.0
2023-12-23 10:54:44,678 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.64 vs. limit=15.0
2023-12-23 10:55:04,711 INFO [train.py:886] (3/4) Epoch 35, batch 2850, loss[loss=0.01335, audio_tagging_loss=0.01335, over 24750.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4945504.19 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0
2023-12-23 10:55:18,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1099360.0, ans=0.0
2023-12-23 10:55:30,096 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.97 vs. limit=15.0
2023-12-23 10:55:33,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1099426.6666666667, ans=0.125
2023-12-23 10:55:40,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1099493.3333333333, ans=0.0
2023-12-23 10:55:57,044 INFO [train.py:886] (3/4) Epoch 35, batch 2900, loss[loss=0.01275, audio_tagging_loss=0.01275, over 25000.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4935981.41 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0
2023-12-23 10:56:00,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1099626.6666666667, ans=0.125
2023-12-23 10:56:01,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1099626.6666666667, ans=0.2
2023-12-23 10:56:15,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1099693.3333333333, ans=0.125
2023-12-23 10:56:21,159 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.26 vs. limit=10.0
2023-12-23 10:56:24,421 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.115e+01 3.411e+01 3.569e+01 3.817e+01 4.301e+01, threshold=7.139e+01, percent-clipped=0.0
2023-12-23 10:56:37,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1099893.3333333333, ans=0.125
2023-12-23 10:56:44,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1099893.3333333333, ans=0.09899494936611666
2023-12-23 10:56:48,117 INFO [train.py:886] (3/4) Epoch 35, batch 2950, loss[loss=0.012, audio_tagging_loss=0.012, over 24750.00 frames. ], tot_loss[loss=0.01192, audio_tagging_loss=0.01192, over 4937426.95 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0
2023-12-23 10:56:49,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1099960.0, ans=0.0
2023-12-23 10:56:51,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1099960.0, ans=0.2
2023-12-23 10:57:01,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1100026.6666666667, ans=0.125
2023-12-23 10:57:10,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1100093.3333333333, ans=0.05
2023-12-23 10:57:27,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1100160.0, ans=0.125
2023-12-23 10:57:33,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.18 vs. limit=10.0
2023-12-23 10:57:41,378 INFO [train.py:886] (3/4) Epoch 35, batch 3000, loss[loss=0.009347, audio_tagging_loss=0.009347, over 25000.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4940290.24 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0
2023-12-23 10:57:41,379 INFO [train.py:909] (3/4) Computing validation loss
2023-12-23 10:57:52,362 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.6330, 2.9776, 4.1996, 3.8310], device='cuda:3')
2023-12-23 10:58:02,710 INFO [train.py:917] (3/4) Epoch 35, validation: loss=0.03345, audio_tagging_loss=0.03345, over 3737520.00 frames.
2023-12-23 10:58:02,711 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-23 10:58:02,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1100293.3333333333, ans=0.0
2023-12-23 10:58:30,593 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.128e+01 3.436e+01 3.631e+01 3.835e+01 4.770e+01, threshold=7.261e+01, percent-clipped=0.0
2023-12-23 10:58:40,019 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.54 vs. limit=22.5
2023-12-23 10:58:54,472 INFO [train.py:886] (3/4) Epoch 35, batch 3050, loss[loss=0.01021, audio_tagging_loss=0.01021, over 21140.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4942521.79 frames. ], batch size: 107, lr: 3.07e-03, grad_scale: 32.0
2023-12-23 10:58:56,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1100626.6666666667, ans=0.125
2023-12-23 10:59:08,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1100693.3333333333, ans=0.2
2023-12-23 10:59:45,936 INFO [train.py:886] (3/4) Epoch 35, batch 3100, loss[loss=0.01133, audio_tagging_loss=0.01133, over 24750.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4948057.72 frames. ], batch size: 99, lr: 3.07e-03, grad_scale: 32.0
2023-12-23 10:59:57,585 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.29 vs. limit=15.0
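The batch-3000 block above interleaves a validation pass with training (train.py:909/917/918): a frame-weighted loss over the full dev set, followed by the peak GPU memory figure. A rough sketch of such a loop under those assumptions; compute_loss is a hypothetical stand-in for whatever the recipe actually calls, and run_validation is an invented name.

import torch

def run_validation(model, valid_loader, device):
    """Frame-weighted validation loss plus the peak-memory figure logged above."""
    model.eval()
    loss_sum, frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = compute_loss(model, batch, device)  # assumed helper
            loss_sum += float(loss) * num_frames
            frames += num_frames
    model.train()
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    return loss_sum / frames, peak_mb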
limit=15.0 2023-12-23 11:00:03,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1101026.6666666667, ans=0.1 2023-12-23 11:00:06,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1101093.3333333333, ans=0.125 2023-12-23 11:00:15,064 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.234e+01 3.536e+01 3.677e+01 3.842e+01 4.191e+01, threshold=7.354e+01, percent-clipped=0.0 2023-12-23 11:00:18,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1101160.0, ans=0.1 2023-12-23 11:00:23,880 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.29 vs. limit=15.0 2023-12-23 11:00:36,557 INFO [train.py:886] (3/4) Epoch 35, batch 3150, loss[loss=0.01489, audio_tagging_loss=0.01489, over 22314.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4943368.16 frames. ], batch size: 107, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 11:00:39,608 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:00:47,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1101360.0, ans=0.1 2023-12-23 11:00:47,299 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.85 vs. limit=10.0 2023-12-23 11:00:49,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1101360.0, ans=0.0 2023-12-23 11:00:54,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1101360.0, ans=0.0 2023-12-23 11:00:56,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1101360.0, ans=0.125 2023-12-23 11:01:09,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1101493.3333333333, ans=0.0 2023-12-23 11:01:09,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1101493.3333333333, ans=0.125 2023-12-23 11:01:13,584 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.32 vs. limit=6.0 2023-12-23 11:01:25,440 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.44 vs. limit=15.0 2023-12-23 11:01:28,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1101626.6666666667, ans=0.125 2023-12-23 11:01:28,645 INFO [train.py:886] (3/4) Epoch 35, batch 3200, loss[loss=0.01123, audio_tagging_loss=0.01123, over 25000.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4940654.89 frames. 
], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 11:01:35,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1101626.6666666667, ans=0.0 2023-12-23 11:01:49,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1101760.0, ans=0.125 2023-12-23 11:01:55,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1101760.0, ans=0.1 2023-12-23 11:01:57,268 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.006e+01 3.428e+01 3.617e+01 3.805e+01 4.182e+01, threshold=7.234e+01, percent-clipped=0.0 2023-12-23 11:02:11,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1101893.3333333333, ans=0.125 2023-12-23 11:02:16,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1101893.3333333333, ans=0.1 2023-12-23 11:02:19,488 INFO [train.py:886] (3/4) Epoch 35, batch 3250, loss[loss=0.01254, audio_tagging_loss=0.01254, over 25000.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4944224.94 frames. ], batch size: 100, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 11:02:22,022 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.78 vs. limit=6.0 2023-12-23 11:02:40,562 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.00 vs. limit=22.5 2023-12-23 11:02:51,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1102160.0, ans=0.125 2023-12-23 11:02:55,280 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.42 vs. limit=15.0 2023-12-23 11:02:56,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1102160.0, ans=0.1 2023-12-23 11:03:03,879 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.30 vs. limit=15.0 2023-12-23 11:03:09,891 INFO [train.py:886] (3/4) Epoch 35, batch 3300, loss[loss=0.0133, audio_tagging_loss=0.0133, over 24750.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4950780.37 frames. 
], batch size: 99, lr: 3.07e-03, grad_scale: 32.0 2023-12-23 11:03:16,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1102293.3333333333, ans=0.0 2023-12-23 11:03:36,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1102426.6666666667, ans=0.04949747468305833 2023-12-23 11:03:39,474 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.083e+01 3.432e+01 3.586e+01 3.721e+01 4.248e+01, threshold=7.173e+01, percent-clipped=0.0 2023-12-23 11:03:45,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1102493.3333333333, ans=0.1 2023-12-23 11:03:50,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1102560.0, ans=0.0 2023-12-23 11:04:02,290 INFO [train.py:886] (3/4) Epoch 35, batch 3350, loss[loss=0.01209, audio_tagging_loss=0.01209, over 25000.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4956177.14 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:04:04,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1102626.6666666667, ans=0.0 2023-12-23 11:04:04,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1102626.6666666667, ans=0.125 2023-12-23 11:04:16,210 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.47 vs. limit=15.0 2023-12-23 11:04:21,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1102760.0, ans=0.1 2023-12-23 11:04:26,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1102760.0, ans=0.07 2023-12-23 11:04:36,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1102826.6666666667, ans=0.125 2023-12-23 11:04:39,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1102826.6666666667, ans=0.5 2023-12-23 11:04:49,573 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.92 vs. limit=15.0 2023-12-23 11:04:52,378 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.70 vs. limit=12.0 2023-12-23 11:04:52,943 INFO [train.py:886] (3/4) Epoch 35, batch 3400, loss[loss=0.01401, audio_tagging_loss=0.01401, over 25000.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4959073.38 frames. 
], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:04:53,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1102960.0, ans=0.125 2023-12-23 11:05:07,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1103026.6666666667, ans=0.0 2023-12-23 11:05:11,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1103026.6666666667, ans=0.0 2023-12-23 11:05:22,234 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.114e+01 3.482e+01 3.648e+01 3.813e+01 4.164e+01, threshold=7.297e+01, percent-clipped=0.0 2023-12-23 11:05:39,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1103226.6666666667, ans=10.0 2023-12-23 11:05:45,973 INFO [train.py:886] (3/4) Epoch 35, batch 3450, loss[loss=0.01312, audio_tagging_loss=0.01312, over 24750.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4952318.72 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:05:48,392 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0 2023-12-23 11:05:51,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1103293.3333333333, ans=0.125 2023-12-23 11:06:16,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1103493.3333333333, ans=0.125 2023-12-23 11:06:17,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1103493.3333333333, ans=0.125 2023-12-23 11:06:35,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1103560.0, ans=0.07 2023-12-23 11:06:36,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1103560.0, ans=0.1 2023-12-23 11:06:38,336 INFO [train.py:886] (3/4) Epoch 35, batch 3500, loss[loss=0.01187, audio_tagging_loss=0.01187, over 24750.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4950587.44 frames. 
], batch size: 99, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:06:42,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1103626.6666666667, ans=0.1 2023-12-23 11:07:04,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1103760.0, ans=0.125 2023-12-23 11:07:07,533 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.194e+01 3.520e+01 3.666e+01 3.849e+01 4.626e+01, threshold=7.332e+01, percent-clipped=0.0 2023-12-23 11:07:08,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1103826.6666666667, ans=0.125 2023-12-23 11:07:12,536 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:07:12,861 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.39 vs. limit=22.5 2023-12-23 11:07:24,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1103893.3333333333, ans=0.125 2023-12-23 11:07:25,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1103893.3333333333, ans=0.125 2023-12-23 11:07:29,126 INFO [train.py:886] (3/4) Epoch 35, batch 3550, loss[loss=0.01106, audio_tagging_loss=0.01106, over 25000.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4948995.20 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:07:53,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1104093.3333333333, ans=0.2 2023-12-23 11:08:16,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1104226.6666666667, ans=0.125 2023-12-23 11:08:17,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1104226.6666666667, ans=0.125 2023-12-23 11:08:21,771 INFO [train.py:886] (3/4) Epoch 35, batch 3600, loss[loss=0.00734, audio_tagging_loss=0.00734, over 24020.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4946603.87 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:08:29,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1104293.3333333333, ans=0.0 2023-12-23 11:08:47,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1104426.6666666667, ans=0.125 2023-12-23 11:08:51,251 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.150e+01 3.469e+01 3.628e+01 3.807e+01 4.642e+01, threshold=7.257e+01, percent-clipped=0.0 2023-12-23 11:09:11,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1104560.0, ans=0.0 2023-12-23 11:09:14,212 INFO [train.py:886] (3/4) Epoch 35, batch 3650, loss[loss=0.01267, audio_tagging_loss=0.01267, over 25000.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4949457.90 frames. 
], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:09:15,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1104626.6666666667, ans=0.125 2023-12-23 11:09:20,220 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0 2023-12-23 11:09:25,088 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.08 vs. limit=15.0 2023-12-23 11:09:36,079 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=15.0 2023-12-23 11:10:02,728 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.05 vs. limit=10.0 2023-12-23 11:10:05,074 INFO [train.py:886] (3/4) Epoch 35, batch 3700, loss[loss=0.0118, audio_tagging_loss=0.0118, over 24750.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4954019.06 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:10:16,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1105026.6666666667, ans=0.125 2023-12-23 11:10:29,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1105093.3333333333, ans=0.0 2023-12-23 11:10:34,231 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.068e+01 3.497e+01 3.613e+01 3.767e+01 4.191e+01, threshold=7.225e+01, percent-clipped=0.0 2023-12-23 11:10:37,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=1105160.0, ans=0.025 2023-12-23 11:10:37,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1105160.0, ans=0.0 2023-12-23 11:10:57,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1105293.3333333333, ans=0.125 2023-12-23 11:10:58,109 INFO [train.py:886] (3/4) Epoch 35, batch 3750, loss[loss=0.01007, audio_tagging_loss=0.01007, over 24042.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4942596.87 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:11:02,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1105293.3333333333, ans=0.0 2023-12-23 11:11:15,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1105360.0, ans=0.0 2023-12-23 11:11:35,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1105493.3333333333, ans=0.125 2023-12-23 11:11:49,008 INFO [train.py:886] (3/4) Epoch 35, batch 3800, loss[loss=0.01169, audio_tagging_loss=0.01169, over 24750.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4938698.27 frames. 
], batch size: 99, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:11:54,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1105626.6666666667, ans=0.0 2023-12-23 11:12:11,725 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. limit=6.0 2023-12-23 11:12:17,673 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.190e+01 3.510e+01 3.632e+01 3.780e+01 4.294e+01, threshold=7.263e+01, percent-clipped=0.0 2023-12-23 11:12:41,328 INFO [train.py:886] (3/4) Epoch 35, batch 3850, loss[loss=0.01145, audio_tagging_loss=0.01145, over 25000.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4937236.27 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:12:44,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1105960.0, ans=0.125 2023-12-23 11:12:54,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1106026.6666666667, ans=0.125 2023-12-23 11:13:04,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1106093.3333333333, ans=0.125 2023-12-23 11:13:11,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1106160.0, ans=0.125 2023-12-23 11:13:15,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1106160.0, ans=0.1 2023-12-23 11:13:16,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1106160.0, ans=0.125 2023-12-23 11:13:20,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=1106226.6666666667, ans=0.5 2023-12-23 11:13:33,055 INFO [train.py:886] (3/4) Epoch 35, batch 3900, loss[loss=0.01098, audio_tagging_loss=0.01098, over 25000.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4938533.95 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:13:48,622 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0 2023-12-23 11:13:49,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1106360.0, ans=0.0 2023-12-23 11:13:59,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.08 vs. limit=12.0 2023-12-23 11:14:00,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.50 vs. 
limit=15.0 2023-12-23 11:14:01,022 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.137e+01 3.458e+01 3.613e+01 3.736e+01 4.379e+01, threshold=7.225e+01, percent-clipped=0.0 2023-12-23 11:14:03,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1106493.3333333333, ans=0.2 2023-12-23 11:14:05,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1106493.3333333333, ans=0.125 2023-12-23 11:14:15,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1106560.0, ans=0.1 2023-12-23 11:14:18,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1106560.0, ans=0.0 2023-12-23 11:14:22,786 INFO [train.py:886] (3/4) Epoch 35, batch 3950, loss[loss=0.01212, audio_tagging_loss=0.01212, over 25000.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4942177.87 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:14:29,926 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0 2023-12-23 11:14:31,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.50 vs. limit=15.0 2023-12-23 11:14:35,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1106693.3333333333, ans=0.1 2023-12-23 11:14:36,179 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.13 vs. limit=22.5 2023-12-23 11:14:40,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1106693.3333333333, ans=0.2 2023-12-23 11:14:51,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1106760.0, ans=0.0 2023-12-23 11:14:56,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1106826.6666666667, ans=0.125 2023-12-23 11:15:04,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1106893.3333333333, ans=0.125 2023-12-23 11:15:14,814 INFO [train.py:886] (3/4) Epoch 35, batch 4000, loss[loss=0.01179, audio_tagging_loss=0.01179, over 25000.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4943420.24 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:15:26,835 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.36 vs. 
limit=15.0 2023-12-23 11:15:42,972 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.176e+01 3.470e+01 3.615e+01 3.743e+01 4.164e+01, threshold=7.230e+01, percent-clipped=0.0 2023-12-23 11:15:47,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1107160.0, ans=0.125 2023-12-23 11:15:47,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1107160.0, ans=0.0 2023-12-23 11:15:50,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.26 vs. limit=15.0 2023-12-23 11:15:51,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1107160.0, ans=0.1 2023-12-23 11:15:57,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1107226.6666666667, ans=0.0 2023-12-23 11:16:00,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1107226.6666666667, ans=0.0 2023-12-23 11:16:03,888 INFO [train.py:886] (3/4) Epoch 35, batch 4050, loss[loss=0.01133, audio_tagging_loss=0.01133, over 25000.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4946907.69 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:16:14,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1107360.0, ans=0.0 2023-12-23 11:16:23,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1107426.6666666667, ans=0.125 2023-12-23 11:16:42,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1107493.3333333333, ans=0.125 2023-12-23 11:16:53,985 INFO [train.py:886] (3/4) Epoch 35, batch 4100, loss[loss=0.01139, audio_tagging_loss=0.01139, over 25000.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4947434.42 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:16:56,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1107626.6666666667, ans=0.07 2023-12-23 11:16:56,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1107626.6666666667, ans=0.05 2023-12-23 11:16:57,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1107626.6666666667, ans=0.125 2023-12-23 11:17:08,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1107693.3333333333, ans=0.2 2023-12-23 11:17:10,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.31 vs. limit=15.0 2023-12-23 11:17:13,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1107693.3333333333, ans=0.1 2023-12-23 11:17:19,602 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.59 vs. 
limit=22.5 2023-12-23 11:17:23,020 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.082e+01 3.446e+01 3.615e+01 3.831e+01 4.582e+01, threshold=7.231e+01, percent-clipped=0.0 2023-12-23 11:17:25,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1107826.6666666667, ans=0.125 2023-12-23 11:17:25,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1107826.6666666667, ans=0.125 2023-12-23 11:17:30,815 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:17:36,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.16 vs. limit=22.5 2023-12-23 11:17:46,696 INFO [train.py:886] (3/4) Epoch 35, batch 4150, loss[loss=0.01129, audio_tagging_loss=0.01129, over 24750.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4940322.87 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 32.0 2023-12-23 11:18:01,545 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.20 vs. limit=22.5 2023-12-23 11:18:26,603 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.54 vs. limit=6.0 2023-12-23 11:18:31,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1108226.6666666667, ans=0.125 2023-12-23 11:18:31,888 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.55 vs. limit=10.0 2023-12-23 11:18:36,275 INFO [train.py:886] (3/4) Epoch 35, batch 4200, loss[loss=0.01176, audio_tagging_loss=0.01176, over 25000.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4942491.48 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 64.0 2023-12-23 11:18:42,169 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.57 vs. limit=15.0 2023-12-23 11:18:43,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1108293.3333333333, ans=0.2 2023-12-23 11:18:44,031 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.43 vs. limit=15.0 2023-12-23 11:19:04,620 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.094e+01 3.379e+01 3.552e+01 3.711e+01 4.184e+01, threshold=7.105e+01, percent-clipped=0.0 2023-12-23 11:19:05,362 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.88 vs. limit=15.0 2023-12-23 11:19:27,413 INFO [train.py:886] (3/4) Epoch 35, batch 4250, loss[loss=0.00992, audio_tagging_loss=0.00992, over 24020.00 frames. ], tot_loss[loss=0.01186, audio_tagging_loss=0.01186, over 4946985.80 frames. 
], batch size: 100, lr: 3.06e-03, grad_scale: 64.0 2023-12-23 11:19:29,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1108626.6666666667, ans=0.0 2023-12-23 11:19:52,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1108760.0, ans=0.125 2023-12-23 11:20:02,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1108826.6666666667, ans=0.0 2023-12-23 11:20:07,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1108893.3333333333, ans=0.1 2023-12-23 11:20:13,433 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0 2023-12-23 11:20:18,353 INFO [train.py:886] (3/4) Epoch 35, batch 4300, loss[loss=0.01265, audio_tagging_loss=0.01265, over 25000.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4952636.72 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 64.0 2023-12-23 11:20:20,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1108960.0, ans=0.125 2023-12-23 11:20:30,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1109026.6666666667, ans=0.09899494936611666 2023-12-23 11:20:39,761 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:20:43,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.01 vs. limit=15.0 2023-12-23 11:20:46,997 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.212e+01 3.455e+01 3.593e+01 3.734e+01 4.513e+01, threshold=7.186e+01, percent-clipped=0.0 2023-12-23 11:21:10,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1109293.3333333333, ans=0.1 2023-12-23 11:21:10,811 INFO [train.py:886] (3/4) Epoch 35, batch 4350, loss[loss=0.01218, audio_tagging_loss=0.01218, over 24750.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4957469.77 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 64.0 2023-12-23 11:21:15,975 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:21:22,904 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2023-12-23 11:21:27,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1109360.0, ans=0.125 2023-12-23 11:21:49,272 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.46 vs. limit=6.0 2023-12-23 11:22:03,370 INFO [train.py:886] (3/4) Epoch 35, batch 4400, loss[loss=0.01216, audio_tagging_loss=0.01216, over 24750.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4952788.98 frames. 
], batch size: 99, lr: 3.06e-03, grad_scale: 64.0 2023-12-23 11:22:04,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1109626.6666666667, ans=0.1 2023-12-23 11:22:13,873 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.48 vs. limit=15.0 2023-12-23 11:22:32,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2023-12-23 11:22:32,612 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.229e+01 3.551e+01 3.706e+01 3.864e+01 4.550e+01, threshold=7.411e+01, percent-clipped=0.0 2023-12-23 11:22:46,132 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.25 vs. limit=10.0 2023-12-23 11:22:48,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1109893.3333333333, ans=0.0 2023-12-23 11:22:48,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1109893.3333333333, ans=0.125 2023-12-23 11:22:54,181 INFO [train.py:886] (3/4) Epoch 35, batch 4450, loss[loss=0.01495, audio_tagging_loss=0.01495, over 25000.00 frames. ], tot_loss[loss=0.01216, audio_tagging_loss=0.01216, over 4946915.11 frames. ], batch size: 100, lr: 3.05e-03, grad_scale: 64.0 2023-12-23 11:23:04,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1109960.0, ans=0.125 2023-12-23 11:23:23,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1110093.3333333333, ans=0.04949747468305833 2023-12-23 11:23:34,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1110160.0, ans=0.125 2023-12-23 11:23:40,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1110226.6666666667, ans=0.125 2023-12-23 11:23:44,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1110226.6666666667, ans=0.125 2023-12-23 11:23:47,261 INFO [train.py:886] (3/4) Epoch 35, batch 4500, loss[loss=0.01326, audio_tagging_loss=0.01326, over 25000.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4954419.45 frames. 
], batch size: 100, lr: 3.05e-03, grad_scale: 64.0 2023-12-23 11:23:48,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1110293.3333333333, ans=0.025 2023-12-23 11:24:10,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1110426.6666666667, ans=0.1 2023-12-23 11:24:13,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1110426.6666666667, ans=0.125 2023-12-23 11:24:16,594 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.136e+01 3.438e+01 3.620e+01 3.863e+01 4.550e+01, threshold=7.241e+01, percent-clipped=0.0 2023-12-23 11:24:26,548 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2023-12-23 11:24:36,220 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.78 vs. limit=15.0 2023-12-23 11:24:39,139 INFO [train.py:886] (3/4) Epoch 35, batch 4550, loss[loss=0.01052, audio_tagging_loss=0.01052, over 25000.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4956530.39 frames. ], batch size: 100, lr: 3.05e-03, grad_scale: 64.0 2023-12-23 11:24:45,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1110626.6666666667, ans=0.025 2023-12-23 11:24:49,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1110693.3333333333, ans=0.1 2023-12-23 11:25:01,253 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.13 vs. limit=22.5 2023-12-23 11:25:23,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.82 vs. limit=10.0 2023-12-23 11:25:29,950 INFO [train.py:886] (3/4) Epoch 35, batch 4600, loss[loss=0.01046, audio_tagging_loss=0.01046, over 24032.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4957238.29 frames. ], batch size: 100, lr: 3.05e-03, grad_scale: 64.0 2023-12-23 11:25:59,320 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.087e+01 3.428e+01 3.565e+01 3.717e+01 4.348e+01, threshold=7.130e+01, percent-clipped=0.0 2023-12-23 11:25:59,555 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:26:07,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1111160.0, ans=0.125 2023-12-23 11:26:08,574 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0 2023-12-23 11:26:09,501 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2023-12-23 11:26:22,425 INFO [train.py:886] (3/4) Epoch 35, batch 4650, loss[loss=0.01018, audio_tagging_loss=0.01018, over 25000.00 frames. 
], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4957977.42 frames. ], batch size: 100, lr: 3.05e-03, grad_scale: 64.0 2023-12-23 11:26:22,884 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.63 vs. limit=10.0 2023-12-23 11:26:33,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1111360.0, ans=0.125 2023-12-23 11:26:45,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1111426.6666666667, ans=0.125 2023-12-23 11:27:07,992 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.01 vs. limit=12.0 2023-12-23 11:27:13,083 INFO [train.py:886] (3/4) Epoch 35, batch 4700, loss[loss=0.0119, audio_tagging_loss=0.0119, over 24750.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4949126.70 frames. ], batch size: 99, lr: 3.05e-03, grad_scale: 64.0 2023-12-23 11:27:20,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1111626.6666666667, ans=0.125 2023-12-23 11:27:20,734 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.99 vs. limit=22.5 2023-12-23 11:27:28,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.51 vs. limit=15.0 2023-12-23 11:27:29,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1111693.3333333333, ans=0.025 2023-12-23 11:27:35,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1111760.0, ans=0.09899494936611666 2023-12-23 11:27:39,775 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.244e+01 3.493e+01 3.654e+01 3.823e+01 4.545e+01, threshold=7.308e+01, percent-clipped=0.0 2023-12-23 11:27:50,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1111893.3333333333, ans=0.125 2023-12-23 11:28:00,257 INFO [train.py:886] (3/4) Epoch 35, batch 4750, loss[loss=0.01378, audio_tagging_loss=0.01378, over 24750.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4945891.99 frames. ], batch size: 99, lr: 3.05e-03, grad_scale: 64.0 2023-12-23 11:28:06,255 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2023-12-23 11:28:35,990 INFO [train.py:886] (3/4) Epoch 36, batch 0, loss[loss=0.0239, audio_tagging_loss=0.0239, over 25000.00 frames. ], tot_loss[loss=0.0239, audio_tagging_loss=0.0239, over 25000.00 frames. ], batch size: 100, lr: 3.01e-03, grad_scale: 32.0 2023-12-23 11:28:35,991 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 11:28:52,825 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1699, 0.8930, 4.4375, 4.3205], device='cuda:3') 2023-12-23 11:28:56,820 INFO [train.py:917] (3/4) Epoch 36, validation: loss=0.0339, audio_tagging_loss=0.0339, over 3737520.00 frames. 
2023-12-23 11:28:56,820 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 11:28:57,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.14 vs. limit=15.0 2023-12-23 11:29:01,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1112066.6666666667, ans=0.0 2023-12-23 11:29:08,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1112133.3333333333, ans=0.1 2023-12-23 11:29:11,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1112133.3333333333, ans=0.125 2023-12-23 11:29:12,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1112133.3333333333, ans=0.125 2023-12-23 11:29:18,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1112200.0, ans=0.1 2023-12-23 11:29:48,224 INFO [train.py:886] (3/4) Epoch 36, batch 50, loss[loss=0.01448, audio_tagging_loss=0.01448, over 25000.00 frames. ], tot_loss[loss=0.01926, audio_tagging_loss=0.01926, over 1110081.50 frames. ], batch size: 100, lr: 3.01e-03, grad_scale: 32.0 2023-12-23 11:29:54,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.65 vs. limit=22.5 2023-12-23 11:30:01,871 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.062e+01 3.818e+01 4.375e+01 4.992e+01 9.452e+01, threshold=8.751e+01, percent-clipped=8.0 2023-12-23 11:30:03,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1112466.6666666667, ans=0.125 2023-12-23 11:30:13,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1112533.3333333333, ans=0.05 2023-12-23 11:30:16,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1112533.3333333333, ans=0.125 2023-12-23 11:30:26,233 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.08 vs. limit=22.5 2023-12-23 11:30:32,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1112666.6666666667, ans=0.125 2023-12-23 11:30:33,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1112666.6666666667, ans=0.1 2023-12-23 11:30:40,125 INFO [train.py:886] (3/4) Epoch 36, batch 100, loss[loss=0.01142, audio_tagging_loss=0.01142, over 25000.00 frames. ], tot_loss[loss=0.01661, audio_tagging_loss=0.01661, over 1968111.83 frames. ], batch size: 100, lr: 3.01e-03, grad_scale: 32.0 2023-12-23 11:30:58,008 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. 
limit=15.0 2023-12-23 11:31:08,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1112866.6666666667, ans=0.125 2023-12-23 11:31:22,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1113000.0, ans=0.0 2023-12-23 11:31:29,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1113000.0, ans=0.1 2023-12-23 11:31:31,101 INFO [train.py:886] (3/4) Epoch 36, batch 150, loss[loss=0.01479, audio_tagging_loss=0.01479, over 25000.00 frames. ], tot_loss[loss=0.01516, audio_tagging_loss=0.01516, over 2631916.51 frames. ], batch size: 100, lr: 3.01e-03, grad_scale: 32.0 2023-12-23 11:31:40,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1113133.3333333333, ans=0.125 2023-12-23 11:31:40,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1113133.3333333333, ans=0.0 2023-12-23 11:31:41,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1113133.3333333333, ans=0.07 2023-12-23 11:31:43,987 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.213e+01 3.709e+01 3.865e+01 4.019e+01 4.619e+01, threshold=7.729e+01, percent-clipped=0.0 2023-12-23 11:31:44,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1113133.3333333333, ans=0.125 2023-12-23 11:32:09,992 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.09 vs. limit=15.0 2023-12-23 11:32:10,905 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2023-12-23 11:32:22,773 INFO [train.py:886] (3/4) Epoch 36, batch 200, loss[loss=0.0149, audio_tagging_loss=0.0149, over 25000.00 frames. ], tot_loss[loss=0.01437, audio_tagging_loss=0.01437, over 3141934.28 frames. ], batch size: 100, lr: 3.01e-03, grad_scale: 32.0 2023-12-23 11:32:24,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1113400.0, ans=0.1 2023-12-23 11:32:56,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1113600.0, ans=0.2 2023-12-23 11:33:02,494 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:33:15,319 INFO [train.py:886] (3/4) Epoch 36, batch 250, loss[loss=0.009573, audio_tagging_loss=0.009573, over 25000.00 frames. ], tot_loss[loss=0.01386, audio_tagging_loss=0.01386, over 3546282.04 frames. ], batch size: 100, lr: 3.01e-03, grad_scale: 32.0 2023-12-23 11:33:28,142 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.045e+01 3.532e+01 3.658e+01 3.837e+01 4.468e+01, threshold=7.316e+01, percent-clipped=0.0 2023-12-23 11:33:39,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1113866.6666666667, ans=0.09899494936611666 2023-12-23 11:34:06,866 INFO [train.py:886] (3/4) Epoch 36, batch 300, loss[loss=0.01367, audio_tagging_loss=0.01367, over 25000.00 frames. 
], tot_loss[loss=0.0135, audio_tagging_loss=0.0135, over 3860351.46 frames. ], batch size: 100, lr: 3.01e-03, grad_scale: 32.0 2023-12-23 11:34:17,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1114133.3333333333, ans=0.0 2023-12-23 11:34:39,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1114266.6666666667, ans=0.125 2023-12-23 11:34:47,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1114333.3333333333, ans=0.125 2023-12-23 11:34:51,082 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2023-12-23 11:34:58,112 INFO [train.py:886] (3/4) Epoch 36, batch 350, loss[loss=0.01381, audio_tagging_loss=0.01381, over 25000.00 frames. ], tot_loss[loss=0.01323, audio_tagging_loss=0.01323, over 4096089.41 frames. ], batch size: 100, lr: 3.01e-03, grad_scale: 32.0 2023-12-23 11:35:12,702 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.147e+01 3.467e+01 3.650e+01 3.765e+01 4.145e+01, threshold=7.301e+01, percent-clipped=0.0 2023-12-23 11:35:12,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1114466.6666666667, ans=0.0 2023-12-23 11:35:29,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1114600.0, ans=0.1 2023-12-23 11:35:44,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1114666.6666666667, ans=0.125 2023-12-23 11:35:51,304 INFO [train.py:886] (3/4) Epoch 36, batch 400, loss[loss=0.01297, audio_tagging_loss=0.01297, over 25000.00 frames. ], tot_loss[loss=0.01292, audio_tagging_loss=0.01292, over 4286285.04 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:36:00,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1114800.0, ans=0.0 2023-12-23 11:36:02,211 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=15.0 2023-12-23 11:36:13,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1114866.6666666667, ans=0.0 2023-12-23 11:36:25,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1114933.3333333333, ans=0.125 2023-12-23 11:36:28,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1114933.3333333333, ans=0.2 2023-12-23 11:36:34,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1115000.0, ans=0.0 2023-12-23 11:36:42,385 INFO [train.py:886] (3/4) Epoch 36, batch 450, loss[loss=0.0103, audio_tagging_loss=0.0103, over 25000.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4428759.73 frames. 
], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:36:56,756 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.749e+01 3.426e+01 3.574e+01 3.756e+01 4.682e+01, threshold=7.147e+01, percent-clipped=0.0 2023-12-23 11:37:02,773 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.42 vs. limit=15.0 2023-12-23 11:37:09,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1115200.0, ans=0.1 2023-12-23 11:37:09,303 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:37:11,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1115200.0, ans=0.2 2023-12-23 11:37:16,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1115266.6666666667, ans=0.5 2023-12-23 11:37:19,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1115266.6666666667, ans=0.125 2023-12-23 11:37:19,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1115266.6666666667, ans=0.0 2023-12-23 11:37:19,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.89 vs. limit=12.0 2023-12-23 11:37:28,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1115333.3333333333, ans=0.0 2023-12-23 11:37:29,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1115333.3333333333, ans=0.125 2023-12-23 11:37:34,599 INFO [train.py:886] (3/4) Epoch 36, batch 500, loss[loss=0.01163, audio_tagging_loss=0.01163, over 25000.00 frames. ], tot_loss[loss=0.01237, audio_tagging_loss=0.01237, over 4545630.76 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:37:40,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.29 vs. limit=22.5 2023-12-23 11:37:46,496 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.81 vs. limit=15.0 2023-12-23 11:37:54,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1115466.6666666667, ans=0.1 2023-12-23 11:37:58,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1115533.3333333333, ans=0.1 2023-12-23 11:38:06,051 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:38:07,223 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.48 vs. 
limit=15.0 2023-12-23 11:38:08,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1115600.0, ans=0.0 2023-12-23 11:38:17,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1115666.6666666667, ans=0.1 2023-12-23 11:38:26,146 INFO [train.py:886] (3/4) Epoch 36, batch 550, loss[loss=0.01045, audio_tagging_loss=0.01045, over 25000.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4639833.08 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:38:36,474 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:38:36,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1115800.0, ans=0.0 2023-12-23 11:38:39,184 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.207e+01 3.469e+01 3.649e+01 3.829e+01 4.187e+01, threshold=7.298e+01, percent-clipped=0.0 2023-12-23 11:38:47,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1115866.6666666667, ans=0.125 2023-12-23 11:39:05,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1115933.3333333333, ans=0.125 2023-12-23 11:39:06,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1116000.0, ans=0.125 2023-12-23 11:39:15,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1116000.0, ans=0.125 2023-12-23 11:39:17,352 INFO [train.py:886] (3/4) Epoch 36, batch 600, loss[loss=0.01051, audio_tagging_loss=0.01051, over 24750.00 frames. ], tot_loss[loss=0.01227, audio_tagging_loss=0.01227, over 4707592.44 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:39:26,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1116133.3333333333, ans=0.07 2023-12-23 11:39:41,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1116200.0, ans=0.1 2023-12-23 11:39:46,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1116200.0, ans=0.125 2023-12-23 11:39:53,098 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.80 vs. limit=15.0 2023-12-23 11:40:00,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1116333.3333333333, ans=0.1 2023-12-23 11:40:01,250 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=12.0 2023-12-23 11:40:08,959 INFO [train.py:886] (3/4) Epoch 36, batch 650, loss[loss=0.01256, audio_tagging_loss=0.01256, over 24750.00 frames. ], tot_loss[loss=0.01234, audio_tagging_loss=0.01234, over 4758429.17 frames. 
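Each "Clipping_scale=2.0, grad-norm quartiles ..." warning lists the min/25%/median/75%/max of recent gradient norms, and the threshold is, to display precision, always twice the logged median (2 x 3.649e+01 = 7.298e+01 just above), i.e. threshold = clipping_scale * median. A hedged sketch of that bookkeeping; the window size, class and method names below are assumptions:

```python
import torch

# Keep a window of recent grad norms; clip when the current norm exceeds
# clipping_scale * median(window), and report quartiles like the warnings do.
class ClipStats:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.scale, self.window = clipping_scale, window
        self.norms: list[float] = []
        self.seen = self.clipped = 0

    def step(self, norm: float) -> float:
        """Record one grad norm; return the factor (<= 1.0) to scale grads by."""
        self.norms = (self.norms + [norm])[-self.window:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * q[2].item()  # 2 * median, as in the warnings
        self.seen += 1
        if norm > threshold:
            self.clipped += 1
            return threshold / norm
        return 1.0

    @property
    def percent_clipped(self) -> float:
        return 100.0 * self.clipped / max(self.seen, 1)
```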
], batch size: 99, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:40:17,228 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.78 vs. limit=8.0 2023-12-23 11:40:21,168 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.182e+01 3.505e+01 3.653e+01 3.781e+01 4.331e+01, threshold=7.305e+01, percent-clipped=0.0 2023-12-23 11:40:37,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1116533.3333333333, ans=0.1 2023-12-23 11:40:41,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1116600.0, ans=0.1 2023-12-23 11:40:54,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1116666.6666666667, ans=0.125 2023-12-23 11:41:00,082 INFO [train.py:886] (3/4) Epoch 36, batch 700, loss[loss=0.01365, audio_tagging_loss=0.01365, over 24750.00 frames. ], tot_loss[loss=0.01225, audio_tagging_loss=0.01225, over 4792931.40 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:41:01,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1116733.3333333333, ans=0.125 2023-12-23 11:41:03,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1116733.3333333333, ans=0.1 2023-12-23 11:41:06,955 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.60 vs. limit=15.0 2023-12-23 11:41:13,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1116800.0, ans=0.0 2023-12-23 11:41:29,011 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.86 vs. limit=6.0 2023-12-23 11:41:47,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1117000.0, ans=0.025 2023-12-23 11:41:52,369 INFO [train.py:886] (3/4) Epoch 36, batch 750, loss[loss=0.01119, audio_tagging_loss=0.01119, over 25000.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4828498.73 frames. 
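The "ScheduledFloat: name=..., batch_count=..., ans=..." lines that dominate this log each print one scheduled hyperparameter: a value defined as a piecewise-linear function of batch_count, held constant outside its outer breakpoints. A minimal sketch of that idea; the breakpoints below are invented for illustration:

```python
import bisect

# A float whose value is piecewise-linear in batch_count, clamped at the
# endpoints, as suggested by the "name=..., batch_count=..., ans=..." lines.
class ScheduledFloat:
    def __init__(self, *points: tuple[float, float]):
        self.xs = [x for x, _ in points]  # batch_count breakpoints, ascending
        self.ys = [y for _, y in points]  # values at those breakpoints

    def value(self, batch_count: float) -> float:
        i = bisect.bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1, y0, y1 = self.xs[i - 1], self.xs[i], self.ys[i - 1], self.ys[i]
        return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)


# e.g. a skip rate annealed from 0.5 to 0.0 over the first 20k batches:
skip_rate = ScheduledFloat((0.0, 0.5), (20000.0, 0.0))
print(skip_rate.value(1115000.0))  # 0.0 this late in training, like "ans=0.0"
```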
], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:41:56,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1117066.6666666667, ans=0.125 2023-12-23 11:42:06,171 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.142e+01 3.455e+01 3.620e+01 3.726e+01 4.614e+01, threshold=7.241e+01, percent-clipped=0.0 2023-12-23 11:42:14,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1117200.0, ans=0.0 2023-12-23 11:42:20,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1117200.0, ans=0.125 2023-12-23 11:42:20,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1117200.0, ans=0.0 2023-12-23 11:42:22,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1117200.0, ans=0.0 2023-12-23 11:42:24,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1117266.6666666667, ans=0.1 2023-12-23 11:42:30,946 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.93 vs. limit=15.0 2023-12-23 11:42:31,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1117266.6666666667, ans=0.125 2023-12-23 11:42:39,392 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0 2023-12-23 11:42:45,302 INFO [train.py:886] (3/4) Epoch 36, batch 800, loss[loss=0.01095, audio_tagging_loss=0.01095, over 25000.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4854197.78 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:42:52,381 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.89 vs. limit=15.0 2023-12-23 11:42:57,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1117466.6666666667, ans=0.025 2023-12-23 11:43:03,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1117466.6666666667, ans=0.1 2023-12-23 11:43:05,036 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=12.0 2023-12-23 11:43:14,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1117533.3333333333, ans=0.0 2023-12-23 11:43:22,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1117600.0, ans=0.125 2023-12-23 11:43:27,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1117666.6666666667, ans=0.125 2023-12-23 11:43:27,545 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.17 vs. 
limit=10.0 2023-12-23 11:43:29,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1117666.6666666667, ans=0.125 2023-12-23 11:43:36,793 INFO [train.py:886] (3/4) Epoch 36, batch 850, loss[loss=0.01049, audio_tagging_loss=0.01049, over 25000.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4874323.31 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:43:42,795 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=15.0 2023-12-23 11:43:43,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1117733.3333333333, ans=0.2 2023-12-23 11:43:44,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1117733.3333333333, ans=0.2 2023-12-23 11:43:50,409 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.033e+01 3.502e+01 3.619e+01 3.758e+01 4.758e+01, threshold=7.237e+01, percent-clipped=0.0 2023-12-23 11:43:52,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1117800.0, ans=0.0 2023-12-23 11:43:56,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1117800.0, ans=0.07 2023-12-23 11:44:12,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1117933.3333333333, ans=0.125 2023-12-23 11:44:27,205 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.28 vs. limit=22.5 2023-12-23 11:44:29,654 INFO [train.py:886] (3/4) Epoch 36, batch 900, loss[loss=0.009904, audio_tagging_loss=0.009904, over 24750.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4896095.38 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:45:01,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1118266.6666666667, ans=0.2 2023-12-23 11:45:08,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1118266.6666666667, ans=0.125 2023-12-23 11:45:21,014 INFO [train.py:886] (3/4) Epoch 36, batch 950, loss[loss=0.01393, audio_tagging_loss=0.01393, over 24750.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4903519.83 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:45:25,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1118400.0, ans=0.125 2023-12-23 11:45:26,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.44 vs. limit=15.0 2023-12-23 11:45:27,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.19 vs. 
limit=15.0 2023-12-23 11:45:30,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1118400.0, ans=0.0 2023-12-23 11:45:34,593 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.217e+01 3.528e+01 3.631e+01 3.836e+01 4.993e+01, threshold=7.263e+01, percent-clipped=0.0 2023-12-23 11:45:51,715 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2023-12-23 11:45:55,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1118600.0, ans=0.0 2023-12-23 11:45:58,175 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.97 vs. limit=15.0 2023-12-23 11:45:59,025 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.40 vs. limit=15.0 2023-12-23 11:46:12,682 INFO [train.py:886] (3/4) Epoch 36, batch 1000, loss[loss=0.01311, audio_tagging_loss=0.01311, over 24750.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4908454.06 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:46:15,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1118733.3333333333, ans=0.0 2023-12-23 11:46:21,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1118733.3333333333, ans=0.125 2023-12-23 11:46:29,408 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:46:31,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.23 vs. limit=15.0 2023-12-23 11:46:42,604 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.46 vs. limit=10.0 2023-12-23 11:46:51,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.18 vs. limit=15.0 2023-12-23 11:47:04,980 INFO [train.py:886] (3/4) Epoch 36, batch 1050, loss[loss=0.01357, audio_tagging_loss=0.01357, over 24750.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4914759.44 frames. 
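The Whitening lines compare a per-module statistic against a scheduled limit; most metrics sit below their limits, and occasionally one exceeds it (metric=16.23 vs. limit=15.0 just above), which is presumably when the whitening constraint actually bites. One simple proxy in this spirit, not necessarily the exact formula in scaling.py, measures how far the channel covariance is from a multiple of the identity: it is 1.0 for perfectly "white" features and rises toward num_channels for degenerate ones, consistent with the logged ranges:

```python
import torch

# Illustrative whitening diagnostic: ~1.0 for an identity-like channel
# covariance, up to num_channels for rank-collapsed features. The metric
# actually computed in scaling.py may differ in detail.
def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels), assumed mean-centred for simplicity
    n = x.shape[1]
    cov = (x.T @ x) / x.shape[0]
    return float(n * (cov ** 2).mean() / cov.diagonal().mean() ** 2)


white = torch.randn(10000, 384)
print(whitening_metric(white))              # ~1.0, well under a limit of 15.0
collapsed = torch.randn(10000, 1).expand(-1, 384)
print(whitening_metric(collapsed))          # ~384, far above any such limit
```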
], batch size: 99, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:47:07,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1119066.6666666667, ans=0.125 2023-12-23 11:47:10,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1119066.6666666667, ans=0.1 2023-12-23 11:47:12,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1119066.6666666667, ans=0.1 2023-12-23 11:47:18,039 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.121e+01 3.507e+01 3.657e+01 3.818e+01 4.217e+01, threshold=7.313e+01, percent-clipped=0.0 2023-12-23 11:47:45,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1119333.3333333333, ans=0.125 2023-12-23 11:47:46,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1119333.3333333333, ans=0.125 2023-12-23 11:47:49,189 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.60 vs. limit=15.0 2023-12-23 11:47:51,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.06 vs. limit=15.0 2023-12-23 11:47:56,219 INFO [train.py:886] (3/4) Epoch 36, batch 1100, loss[loss=0.01316, audio_tagging_loss=0.01316, over 24750.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4924589.39 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:48:04,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1119400.0, ans=0.2 2023-12-23 11:48:11,937 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:48:14,206 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.14 vs. limit=15.0 2023-12-23 11:48:16,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=1119466.6666666667, ans=0.025 2023-12-23 11:48:18,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.31 vs. limit=15.0 2023-12-23 11:48:27,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1119600.0, ans=0.025 2023-12-23 11:48:38,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1119666.6666666667, ans=0.125 2023-12-23 11:48:39,989 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.53 vs. limit=15.0 2023-12-23 11:48:44,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1119666.6666666667, ans=0.125 2023-12-23 11:48:48,632 INFO [train.py:886] (3/4) Epoch 36, batch 1150, loss[loss=0.01032, audio_tagging_loss=0.01032, over 25000.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4931453.15 frames. 
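A small consistency check on the per-batch frame counts: "batch size: 100" always pairs with "over 25000.00 frames" (and 99 with 24750), i.e. 250 frames per cut. That matches 10-second AudioSet clips at 100 fbank frames per second divided by the 4x subsampling declared at startup, with max_duration=1000 seconds yielding ~100 clips per batch. The clip length is an assumption about the data, not something printed here:

```python
clips = 100                # "batch size: 100"
clip_seconds = 10.0        # typical AudioSet clip length (assumption)
fps = 100                  # 10 ms fbank frames
subsampling = 4            # subsampling_factor from the startup config
print(clips * clip_seconds * fps / subsampling)  # 25000.0 frames
print(99 * clip_seconds * fps / subsampling)     # 24750.0 frames
```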
], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:48:56,942 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.05 vs. limit=15.0 2023-12-23 11:49:00,922 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.046e+01 3.437e+01 3.573e+01 3.725e+01 4.671e+01, threshold=7.145e+01, percent-clipped=0.0 2023-12-23 11:49:12,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1119866.6666666667, ans=0.1 2023-12-23 11:49:17,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1119866.6666666667, ans=0.125 2023-12-23 11:49:20,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1119933.3333333333, ans=0.125 2023-12-23 11:49:27,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1119933.3333333333, ans=0.125 2023-12-23 11:49:36,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1120000.0, ans=0.125 2023-12-23 11:49:40,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1120066.6666666667, ans=0.125 2023-12-23 11:49:41,568 INFO [train.py:886] (3/4) Epoch 36, batch 1200, loss[loss=0.01471, audio_tagging_loss=0.01471, over 25000.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4940465.34 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:50:00,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1120133.3333333333, ans=0.1 2023-12-23 11:50:01,583 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.01 vs. limit=15.0 2023-12-23 11:50:08,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1120200.0, ans=0.125 2023-12-23 11:50:14,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1120266.6666666667, ans=0.125 2023-12-23 11:50:14,448 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.33 vs. limit=15.0 2023-12-23 11:50:22,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1120333.3333333333, ans=0.1 2023-12-23 11:50:32,356 INFO [train.py:886] (3/4) Epoch 36, batch 1250, loss[loss=0.0118, audio_tagging_loss=0.0118, over 24750.00 frames. ], tot_loss[loss=0.01207, audio_tagging_loss=0.01207, over 4939997.59 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:50:37,301 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.795e-02 2023-12-23 11:50:37,662 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. 
limit=6.0 2023-12-23 11:50:45,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1120466.6666666667, ans=0.125 2023-12-23 11:50:45,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1120466.6666666667, ans=0.1 2023-12-23 11:50:45,960 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.245e+01 3.454e+01 3.601e+01 3.737e+01 4.840e+01, threshold=7.203e+01, percent-clipped=0.0 2023-12-23 11:51:20,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1120666.6666666667, ans=0.0 2023-12-23 11:51:24,561 INFO [train.py:886] (3/4) Epoch 36, batch 1300, loss[loss=0.0133, audio_tagging_loss=0.0133, over 24750.00 frames. ], tot_loss[loss=0.01214, audio_tagging_loss=0.01214, over 4934211.52 frames. ], batch size: 99, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:51:27,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1120733.3333333333, ans=0.2 2023-12-23 11:51:29,693 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.75 vs. limit=6.0 2023-12-23 11:51:31,799 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.07 vs. limit=12.0 2023-12-23 11:51:35,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1120800.0, ans=0.125 2023-12-23 11:51:49,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1120866.6666666667, ans=0.0 2023-12-23 11:52:16,832 INFO [train.py:886] (3/4) Epoch 36, batch 1350, loss[loss=0.01082, audio_tagging_loss=0.01082, over 25000.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4936029.63 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:52:24,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1121066.6666666667, ans=0.125 2023-12-23 11:52:29,816 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.021e+01 3.453e+01 3.612e+01 3.766e+01 4.357e+01, threshold=7.223e+01, percent-clipped=0.0 2023-12-23 11:52:31,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1121133.3333333333, ans=0.0 2023-12-23 11:52:37,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1121200.0, ans=0.0 2023-12-23 11:52:37,808 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.02 vs. limit=15.0 2023-12-23 11:52:45,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. 
limit=15.0 2023-12-23 11:52:45,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1121266.6666666667, ans=0.0 2023-12-23 11:53:04,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1121333.3333333333, ans=0.125 2023-12-23 11:53:07,515 INFO [train.py:886] (3/4) Epoch 36, batch 1400, loss[loss=0.01203, audio_tagging_loss=0.01203, over 21262.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4932778.75 frames. ], batch size: 107, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:53:40,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1121600.0, ans=0.2 2023-12-23 11:53:45,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1121600.0, ans=0.0 2023-12-23 11:53:46,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1121600.0, ans=0.125 2023-12-23 11:53:56,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1121666.6666666667, ans=0.0 2023-12-23 11:53:59,951 INFO [train.py:886] (3/4) Epoch 36, batch 1450, loss[loss=0.01053, audio_tagging_loss=0.01053, over 25000.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4942432.08 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 32.0 2023-12-23 11:54:00,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1121733.3333333333, ans=0.125 2023-12-23 11:54:12,996 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.148e+01 3.437e+01 3.604e+01 3.782e+01 4.556e+01, threshold=7.209e+01, percent-clipped=0.0 2023-12-23 11:54:21,556 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:54:26,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1121866.6666666667, ans=0.125 2023-12-23 11:54:29,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1121866.6666666667, ans=0.0 2023-12-23 11:54:39,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1121933.3333333333, ans=0.125 2023-12-23 11:54:40,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1122000.0, ans=0.0 2023-12-23 11:54:42,100 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.32 vs. limit=22.5 2023-12-23 11:54:50,913 INFO [train.py:886] (3/4) Epoch 36, batch 1500, loss[loss=0.01414, audio_tagging_loss=0.01414, over 25000.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4946805.04 frames. 
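The "WithLoss: name=..., loss-sum=..." lines track an auxiliary penalty attached to the attention weights; it is almost always 0.000e+00, with rare tiny excursions such as the loss-sum=1.795e-02 a few hundred batches up. A sketch of the pattern only; the actual penalty definition in scaling.py is not reconstructed here, so the entropy-based trigger below is invented:

```python
import torch

# Pattern sketch: pass activations through unchanged, but compute and log a
# penalty that stays at zero while the statistics remain in bounds. Here the
# (invented) penalty fires only if attention entropy collapses below a floor.
class WithLoss(torch.nn.Module):
    def __init__(self, name: str, min_entropy: float = 0.5):
        super().__init__()
        self.name, self.min_entropy = name, min_entropy

    def forward(self, attn: torch.Tensor) -> torch.Tensor:
        # attn: (..., num_queries, num_keys), rows sum to 1
        entropy = -(attn * (attn + 1e-20).log()).sum(dim=-1)
        loss = (self.min_entropy - entropy).clamp(min=0.0).sum()
        print(f"WithLoss: name={self.name}, loss-sum={loss.item():.3e}")
        return attn  # identity in the forward pass


attn = torch.softmax(torch.randn(2, 4, 8, 8), dim=-1)
WithLoss("self_attn_weights")(attn)  # typically prints loss-sum=0.000e+00
```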
], batch size: 100, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 11:54:53,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1122066.6666666667, ans=0.5 2023-12-23 11:54:58,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1122066.6666666667, ans=0.1 2023-12-23 11:55:12,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1122200.0, ans=0.125 2023-12-23 11:55:19,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1122200.0, ans=0.125 2023-12-23 11:55:28,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1122266.6666666667, ans=0.125 2023-12-23 11:55:33,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1122333.3333333333, ans=0.2 2023-12-23 11:55:42,819 INFO [train.py:886] (3/4) Epoch 36, batch 1550, loss[loss=0.01097, audio_tagging_loss=0.01097, over 24750.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4941989.88 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 11:55:53,791 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.60 vs. limit=22.5 2023-12-23 11:55:55,147 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.215e+01 3.501e+01 3.689e+01 3.879e+01 4.418e+01, threshold=7.378e+01, percent-clipped=0.0 2023-12-23 11:55:57,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1122466.6666666667, ans=0.5 2023-12-23 11:56:02,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1122533.3333333333, ans=0.1 2023-12-23 11:56:08,613 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0 2023-12-23 11:56:24,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1122666.6666666667, ans=0.125 2023-12-23 11:56:33,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1122666.6666666667, ans=0.125 2023-12-23 11:56:34,944 INFO [train.py:886] (3/4) Epoch 36, batch 1600, loss[loss=0.01235, audio_tagging_loss=0.01235, over 24750.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4939695.15 frames. 
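The displayed lr drifts down by roughly 1e-5 every few hundred batches (3.01e-03 at the top of this stretch, 2.98e-03 by the end). With base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 from the startup config, an Eden-style schedule reproduces ~3.0e-03 at epoch 36. The formula is our assumption about the scheduler (it is the one icefall recipes typically use), and the batches-per-epoch figure is a rough guess:

```python
# Eden-style learning rate: decays in both the batch and epoch dimensions.
def eden_lr(base_lr: float, batch: int, epoch: int,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_f = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_f = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_f * epoch_f


# ~4500 train batches per epoch is an assumption for illustration only
print(f"{eden_lr(0.045, batch=35 * 4500 + 1500, epoch=36):.2e}")  # ~3.0e-03
```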
], batch size: 99, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 11:56:52,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1122800.0, ans=0.2 2023-12-23 11:56:54,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1122866.6666666667, ans=0.125 2023-12-23 11:56:58,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1122866.6666666667, ans=15.0 2023-12-23 11:56:59,443 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:57:19,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1123000.0, ans=0.125 2023-12-23 11:57:24,997 INFO [train.py:886] (3/4) Epoch 36, batch 1650, loss[loss=0.009282, audio_tagging_loss=0.009282, over 24750.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4934172.20 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 11:57:38,146 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.059e+01 3.477e+01 3.648e+01 3.896e+01 4.999e+01, threshold=7.295e+01, percent-clipped=0.0 2023-12-23 11:57:49,963 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=15.0 2023-12-23 11:58:02,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1123266.6666666667, ans=0.125 2023-12-23 11:58:16,226 INFO [train.py:886] (3/4) Epoch 36, batch 1700, loss[loss=0.009681, audio_tagging_loss=0.009681, over 25000.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4940534.23 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 11:58:22,040 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:58:38,589 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:58:51,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1123600.0, ans=0.0 2023-12-23 11:58:53,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1123600.0, ans=0.0 2023-12-23 11:59:02,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1123666.6666666667, ans=0.2 2023-12-23 11:59:03,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1123666.6666666667, ans=0.0 2023-12-23 11:59:05,912 INFO [train.py:886] (3/4) Epoch 36, batch 1750, loss[loss=0.01244, audio_tagging_loss=0.01244, over 24750.00 frames. ], tot_loss[loss=0.01186, audio_tagging_loss=0.01186, over 4947222.22 frames. 
], batch size: 99, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 11:59:12,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1123733.3333333333, ans=0.035 2023-12-23 11:59:20,317 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.178e+01 3.484e+01 3.617e+01 3.775e+01 4.286e+01, threshold=7.233e+01, percent-clipped=0.0 2023-12-23 11:59:22,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1123800.0, ans=0.5 2023-12-23 11:59:23,345 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:59:27,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1123866.6666666667, ans=0.04949747468305833 2023-12-23 11:59:33,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.23 vs. limit=22.5 2023-12-23 11:59:33,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1123866.6666666667, ans=0.125 2023-12-23 11:59:33,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1123866.6666666667, ans=0.125 2023-12-23 11:59:45,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1123933.3333333333, ans=0.0 2023-12-23 11:59:48,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1124000.0, ans=0.2 2023-12-23 11:59:49,579 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 11:59:49,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1124000.0, ans=0.125 2023-12-23 11:59:57,793 INFO [train.py:886] (3/4) Epoch 36, batch 1800, loss[loss=0.01066, audio_tagging_loss=0.01066, over 25000.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4953979.20 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 12:00:14,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1124133.3333333333, ans=0.125 2023-12-23 12:00:29,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1124266.6666666667, ans=0.1 2023-12-23 12:00:31,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1124266.6666666667, ans=0.125 2023-12-23 12:00:37,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1124333.3333333333, ans=0.1 2023-12-23 12:00:48,658 INFO [train.py:886] (3/4) Epoch 36, batch 1850, loss[loss=0.01148, audio_tagging_loss=0.01148, over 24750.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4953854.94 frames. 
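A quick decoding of the oddly precise constants on the bypass.skip_rate lines: ans=0.04949747468305833 is just 0.07/sqrt(2), and the 0.09899494936611666 that appears further on is 0.14/sqrt(2), so these look like base rates scaled by 2**-0.5 rather than hand-tuned numbers:

```python
import math

print(0.07 / math.sqrt(2))  # ~0.04949747468305833, the ans= value above
print(0.14 / math.sqrt(2))  # ~0.09899494936611666, up to float repr
```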
], batch size: 99, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 12:01:02,415 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.973e+01 3.520e+01 3.688e+01 3.897e+01 4.478e+01, threshold=7.376e+01, percent-clipped=0.0 2023-12-23 12:01:15,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1124533.3333333333, ans=0.04949747468305833 2023-12-23 12:01:21,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1124600.0, ans=0.125 2023-12-23 12:01:22,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1124600.0, ans=0.2 2023-12-23 12:01:35,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1124666.6666666667, ans=0.125 2023-12-23 12:01:39,400 INFO [train.py:886] (3/4) Epoch 36, batch 1900, loss[loss=0.01198, audio_tagging_loss=0.01198, over 24750.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4946129.55 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 12:01:39,586 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:02:19,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1124933.3333333333, ans=0.125 2023-12-23 12:02:23,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.61 vs. limit=12.0 2023-12-23 12:02:32,640 INFO [train.py:886] (3/4) Epoch 36, batch 1950, loss[loss=0.01541, audio_tagging_loss=0.01541, over 24925.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4944062.04 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 32.0 2023-12-23 12:02:38,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1125066.6666666667, ans=0.2 2023-12-23 12:02:45,205 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.139e+01 3.533e+01 3.647e+01 3.862e+01 4.201e+01, threshold=7.294e+01, percent-clipped=0.0 2023-12-23 12:03:11,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1125266.6666666667, ans=0.0 2023-12-23 12:03:17,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1125333.3333333333, ans=0.1 2023-12-23 12:03:24,585 INFO [train.py:886] (3/4) Epoch 36, batch 2000, loss[loss=0.01222, audio_tagging_loss=0.01222, over 25000.00 frames. ], tot_loss[loss=0.01192, audio_tagging_loss=0.01192, over 4939496.57 frames. 
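At batch 2000 the logged grad_scale doubles from 32.0 to 64.0. That is the standard fp16 loss-scaling policy: grow the scale after a fixed run of overflow-free optimizer steps, back it off when gradients overflow. A sketch using the stock PyTorch scaler; the growth interval, initial scale and toy model are assumptions, and the recipe may well manage the scale itself rather than using this class:

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=2.0,        # small initial scale (assumed)
    growth_factor=2.0,     # 32.0 -> 64.0 style doubling
    backoff_factor=0.5,    # halve on inf/nan gradients
    growth_interval=2000,  # overflow-free steps between doublings (assumed)
)

model = torch.nn.Linear(80, 527).cuda()   # 80 fbank dims -> 527 event classes
opt = torch.optim.AdamW(model.parameters())
feats = torch.randn(8, 80, device="cuda")

with torch.cuda.amp.autocast(dtype=torch.float16):
    loss = torch.nn.functional.binary_cross_entropy_with_logits(
        model(feats), torch.rand(8, 527, device="cuda"))
scaler.scale(loss).backward()  # backward through the scaled loss
scaler.step(opt)               # unscales grads; skips the step on overflow
scaler.update()                # grows or backs off the scale
print(scaler.get_scale())
```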
], batch size: 100, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:03:38,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1125466.6666666667, ans=0.125 2023-12-23 12:03:40,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1125466.6666666667, ans=0.04949747468305833 2023-12-23 12:03:50,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1125533.3333333333, ans=0.2 2023-12-23 12:04:10,675 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.30 vs. limit=15.0 2023-12-23 12:04:12,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1125666.6666666667, ans=0.2 2023-12-23 12:04:14,850 INFO [train.py:886] (3/4) Epoch 36, batch 2050, loss[loss=0.01304, audio_tagging_loss=0.01304, over 25000.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4946948.12 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:04:22,917 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.90 vs. limit=6.0 2023-12-23 12:04:28,472 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.177e+01 3.473e+01 3.640e+01 3.754e+01 4.279e+01, threshold=7.281e+01, percent-clipped=0.0 2023-12-23 12:04:30,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1125800.0, ans=0.0 2023-12-23 12:04:35,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1125866.6666666667, ans=0.125 2023-12-23 12:04:45,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1125933.3333333333, ans=0.125 2023-12-23 12:04:54,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1126000.0, ans=0.125 2023-12-23 12:04:54,825 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0 2023-12-23 12:04:58,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1126000.0, ans=0.0 2023-12-23 12:05:06,210 INFO [train.py:886] (3/4) Epoch 36, batch 2100, loss[loss=0.01318, audio_tagging_loss=0.01318, over 25000.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4947936.22 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:05:06,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1126066.6666666667, ans=0.04949747468305833 2023-12-23 12:05:19,605 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.35 vs. limit=22.5 2023-12-23 12:05:24,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.92 vs. 
limit=22.5 2023-12-23 12:05:35,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1126200.0, ans=0.125 2023-12-23 12:05:36,124 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.25 vs. limit=15.0 2023-12-23 12:05:38,019 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.81 vs. limit=15.0 2023-12-23 12:05:43,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1126266.6666666667, ans=0.125 2023-12-23 12:05:58,001 INFO [train.py:886] (3/4) Epoch 36, batch 2150, loss[loss=0.01217, audio_tagging_loss=0.01217, over 24750.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4954504.22 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:05:59,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1126400.0, ans=0.0 2023-12-23 12:05:59,673 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.75 vs. limit=8.0 2023-12-23 12:06:08,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1126466.6666666667, ans=0.0 2023-12-23 12:06:11,654 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.150e+01 3.510e+01 3.673e+01 3.806e+01 4.496e+01, threshold=7.347e+01, percent-clipped=0.0 2023-12-23 12:06:19,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1126533.3333333333, ans=0.2 2023-12-23 12:06:21,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1126533.3333333333, ans=0.0 2023-12-23 12:06:21,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1126533.3333333333, ans=0.0 2023-12-23 12:06:21,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1126533.3333333333, ans=0.2 2023-12-23 12:06:24,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1126533.3333333333, ans=0.125 2023-12-23 12:06:34,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1126600.0, ans=0.125 2023-12-23 12:06:46,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1126666.6666666667, ans=0.125 2023-12-23 12:06:46,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1126666.6666666667, ans=0.125 2023-12-23 12:06:50,292 INFO [train.py:886] (3/4) Epoch 36, batch 2200, loss[loss=0.01451, audio_tagging_loss=0.01451, over 24750.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4948146.17 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:06:54,410 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.35 vs. 
limit=10.0 2023-12-23 12:07:11,702 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=12.0 2023-12-23 12:07:41,582 INFO [train.py:886] (3/4) Epoch 36, batch 2250, loss[loss=0.01184, audio_tagging_loss=0.01184, over 24750.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4945539.32 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:07:52,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1127133.3333333333, ans=0.0 2023-12-23 12:07:54,577 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.130e+01 3.509e+01 3.634e+01 3.761e+01 4.553e+01, threshold=7.267e+01, percent-clipped=0.0 2023-12-23 12:07:55,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1127133.3333333333, ans=0.0 2023-12-23 12:07:57,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1127133.3333333333, ans=0.0 2023-12-23 12:08:24,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1127333.3333333333, ans=0.0 2023-12-23 12:08:33,255 INFO [train.py:886] (3/4) Epoch 36, batch 2300, loss[loss=0.01163, audio_tagging_loss=0.01163, over 22428.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4941845.23 frames. ], batch size: 107, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:08:33,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1127400.0, ans=0.2 2023-12-23 12:08:36,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1127400.0, ans=0.09899494936611666 2023-12-23 12:08:43,768 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=22.5 2023-12-23 12:08:47,599 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.22 vs. limit=22.5 2023-12-23 12:08:53,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1127533.3333333333, ans=0.1 2023-12-23 12:09:22,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1127666.6666666667, ans=0.0 2023-12-23 12:09:25,072 INFO [train.py:886] (3/4) Epoch 36, batch 2350, loss[loss=0.01205, audio_tagging_loss=0.01205, over 24750.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4941561.76 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:09:33,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1127733.3333333333, ans=0.0 2023-12-23 12:09:39,020 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.137e+01 3.388e+01 3.533e+01 3.734e+01 4.498e+01, threshold=7.065e+01, percent-clipped=0.0 2023-12-23 12:09:48,528 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.12 vs. 
limit=12.0 2023-12-23 12:09:50,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1127866.6666666667, ans=0.2 2023-12-23 12:10:17,014 INFO [train.py:886] (3/4) Epoch 36, batch 2400, loss[loss=0.0112, audio_tagging_loss=0.0112, over 24750.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4945575.19 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:10:22,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1128066.6666666667, ans=0.125 2023-12-23 12:10:25,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1128066.6666666667, ans=0.2 2023-12-23 12:10:53,325 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.89 vs. limit=15.0 2023-12-23 12:11:02,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1128333.3333333333, ans=0.0 2023-12-23 12:11:09,465 INFO [train.py:886] (3/4) Epoch 36, batch 2450, loss[loss=0.01087, audio_tagging_loss=0.01087, over 25000.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4950716.81 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:11:10,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1128400.0, ans=0.125 2023-12-23 12:11:13,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1128400.0, ans=0.125 2023-12-23 12:11:22,551 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.115e+01 3.491e+01 3.672e+01 3.810e+01 4.386e+01, threshold=7.343e+01, percent-clipped=0.0 2023-12-23 12:11:25,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1128466.6666666667, ans=0.0 2023-12-23 12:11:32,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1128533.3333333333, ans=0.0 2023-12-23 12:12:02,006 INFO [train.py:886] (3/4) Epoch 36, batch 2500, loss[loss=0.01211, audio_tagging_loss=0.01211, over 24750.00 frames. ], tot_loss[loss=0.01199, audio_tagging_loss=0.01199, over 4950723.10 frames. ], batch size: 99, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:12:19,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1128800.0, ans=0.1 2023-12-23 12:12:32,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1128933.3333333333, ans=0.0 2023-12-23 12:12:33,689 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.44 vs. limit=22.5 2023-12-23 12:12:52,179 INFO [train.py:886] (3/4) Epoch 36, batch 2550, loss[loss=0.0121, audio_tagging_loss=0.0121, over 24750.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4944841.24 frames. 
], batch size: 99, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:13:00,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1129066.6666666667, ans=0.2 2023-12-23 12:13:03,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1129133.3333333333, ans=10.0 2023-12-23 12:13:04,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1129133.3333333333, ans=0.125 2023-12-23 12:13:06,437 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.215e+01 3.554e+01 3.683e+01 3.809e+01 4.296e+01, threshold=7.365e+01, percent-clipped=0.0 2023-12-23 12:13:17,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1129200.0, ans=0.0 2023-12-23 12:13:35,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1129333.3333333333, ans=0.125 2023-12-23 12:13:44,966 INFO [train.py:886] (3/4) Epoch 36, batch 2600, loss[loss=0.01221, audio_tagging_loss=0.01221, over 25000.00 frames. ], tot_loss[loss=0.01204, audio_tagging_loss=0.01204, over 4949218.60 frames. ], batch size: 100, lr: 2.99e-03, grad_scale: 64.0 2023-12-23 12:13:46,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1129400.0, ans=0.2 2023-12-23 12:13:47,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1129400.0, ans=0.0 2023-12-23 12:13:53,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1129466.6666666667, ans=0.1 2023-12-23 12:14:00,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1129466.6666666667, ans=0.2 2023-12-23 12:14:05,429 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:14:06,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1129533.3333333333, ans=0.125 2023-12-23 12:14:08,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.35 vs. limit=15.0 2023-12-23 12:14:13,373 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.13 vs. limit=15.0 2023-12-23 12:14:21,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1129600.0, ans=0.1 2023-12-23 12:14:23,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1129600.0, ans=0.125 2023-12-23 12:14:35,669 INFO [train.py:886] (3/4) Epoch 36, batch 2650, loss[loss=0.01281, audio_tagging_loss=0.01281, over 24750.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4951234.31 frames. 
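The balancer entries print schedules for prob (0.125 here), min_positive (0.025), and min_abs/max_abs (0.5/10.0): the idea, roughly, is that with probability prob per batch the module perturbs gradients so that per-channel activation statistics stay within those bounds. The sketch below captures the flavour only; the real gradient rule in scaling.py is considerably more careful:

```python
import torch

# Rough idea only: identity in forward; in backward, add a small corrective
# term for channels whose positive-fraction or mean-|x| statistics are out
# of range. The correction rule here is a simplified stand-in.
class Balancer(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, min_positive, max_abs, grad_factor=0.01):
        ctx.save_for_backward(x)
        ctx.min_positive, ctx.max_abs = min_positive, max_abs
        ctx.grad_factor = grad_factor
        return x

    @staticmethod
    def backward(ctx, grad):
        (x,) = ctx.saved_tensors
        dims = tuple(range(x.dim() - 1))          # all dims but the channel dim
        pos_frac = (x > 0).float().mean(dim=dims)
        mean_abs = x.abs().mean(dim=dims)
        push_up = (pos_frac < ctx.min_positive).float()   # too rarely positive
        push_down = (mean_abs > ctx.max_abs).float()      # activations too big
        correction = (push_down * x.sign() - push_up) * ctx.grad_factor
        return grad + correction * grad.abs().mean(), None, None, None


x = torch.randn(100, 256, requires_grad=True)
y = Balancer.apply(x, 0.025, 10.0)  # min_positive=0.025, max_abs=10.0 as logged
y.sum().backward()
```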
], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:14:38,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1129733.3333333333, ans=0.2 2023-12-23 12:14:48,681 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.201e+01 3.506e+01 3.658e+01 3.796e+01 4.295e+01, threshold=7.317e+01, percent-clipped=0.0 2023-12-23 12:15:21,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1130000.0, ans=0.07 2023-12-23 12:15:26,058 INFO [train.py:886] (3/4) Epoch 36, batch 2700, loss[loss=0.01127, audio_tagging_loss=0.01127, over 25000.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4953991.41 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:15:30,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1130066.6666666667, ans=0.1 2023-12-23 12:15:36,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1130133.3333333333, ans=0.1 2023-12-23 12:15:47,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1130200.0, ans=0.0 2023-12-23 12:15:50,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1130200.0, ans=0.0 2023-12-23 12:16:16,475 INFO [train.py:886] (3/4) Epoch 36, batch 2750, loss[loss=0.01385, audio_tagging_loss=0.01385, over 24750.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4959627.31 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:16:18,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1130400.0, ans=0.125 2023-12-23 12:16:28,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1130466.6666666667, ans=15.0 2023-12-23 12:16:29,405 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.206e+01 3.461e+01 3.594e+01 3.787e+01 4.376e+01, threshold=7.188e+01, percent-clipped=0.0 2023-12-23 12:16:30,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1130466.6666666667, ans=0.125 2023-12-23 12:16:37,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1130533.3333333333, ans=0.2 2023-12-23 12:16:41,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1130533.3333333333, ans=0.125 2023-12-23 12:17:06,875 INFO [train.py:886] (3/4) Epoch 36, batch 2800, loss[loss=0.008097, audio_tagging_loss=0.008097, over 23957.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4960199.60 frames. 
], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:17:16,825 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:17:18,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1130800.0, ans=0.125 2023-12-23 12:17:47,641 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0 2023-12-23 12:17:57,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1131000.0, ans=0.125 2023-12-23 12:17:59,702 INFO [train.py:886] (3/4) Epoch 36, batch 2850, loss[loss=0.01116, audio_tagging_loss=0.01116, over 24055.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4952084.37 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:18:11,934 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.234e+01 3.495e+01 3.630e+01 3.797e+01 4.361e+01, threshold=7.259e+01, percent-clipped=0.0 2023-12-23 12:18:22,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1131200.0, ans=0.1 2023-12-23 12:18:23,024 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:18:40,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1131333.3333333333, ans=0.1 2023-12-23 12:18:52,300 INFO [train.py:886] (3/4) Epoch 36, batch 2900, loss[loss=0.01047, audio_tagging_loss=0.01047, over 24750.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4953191.57 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:18:56,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1131400.0, ans=0.125 2023-12-23 12:19:10,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1131466.6666666667, ans=0.125 2023-12-23 12:19:23,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=15.0 2023-12-23 12:19:27,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1131600.0, ans=0.125 2023-12-23 12:19:43,636 INFO [train.py:886] (3/4) Epoch 36, batch 2950, loss[loss=0.01257, audio_tagging_loss=0.01257, over 24750.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4952597.10 frames. 
], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:19:57,306 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.076e+01 3.440e+01 3.602e+01 3.791e+01 4.339e+01, threshold=7.205e+01, percent-clipped=0.0 2023-12-23 12:20:02,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1131800.0, ans=0.0 2023-12-23 12:20:07,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1131866.6666666667, ans=0.04949747468305833 2023-12-23 12:20:09,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1131866.6666666667, ans=0.125 2023-12-23 12:20:26,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1132000.0, ans=0.05 2023-12-23 12:20:29,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1132000.0, ans=0.125 2023-12-23 12:20:36,167 INFO [train.py:886] (3/4) Epoch 36, batch 3000, loss[loss=0.01295, audio_tagging_loss=0.01295, over 25000.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4958197.71 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:20:36,168 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 12:20:57,369 INFO [train.py:917] (3/4) Epoch 36, validation: loss=0.0342, audio_tagging_loss=0.0342, over 3737520.00 frames. 2023-12-23 12:20:57,370 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 12:20:58,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1132066.6666666667, ans=0.125 2023-12-23 12:21:30,078 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0 2023-12-23 12:21:37,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1132266.6666666667, ans=0.1 2023-12-23 12:21:48,951 INFO [train.py:886] (3/4) Epoch 36, batch 3050, loss[loss=0.0117, audio_tagging_loss=0.0117, over 25000.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4962329.78 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:21:52,248 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2023-12-23 12:22:02,619 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.131e+01 3.482e+01 3.618e+01 3.774e+01 4.339e+01, threshold=7.235e+01, percent-clipped=0.0 2023-12-23 12:22:16,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1132533.3333333333, ans=0.1 2023-12-23 12:22:16,742 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.57 vs. limit=15.0 2023-12-23 12:22:31,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1132666.6666666667, ans=0.125 2023-12-23 12:22:41,232 INFO [train.py:886] (3/4) Epoch 36, batch 3100, loss[loss=0.01354, audio_tagging_loss=0.01354, over 25000.00 frames. 
], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4959835.60 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:22:41,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1132733.3333333333, ans=0.1 2023-12-23 12:22:43,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1132733.3333333333, ans=0.025 2023-12-23 12:22:50,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1132800.0, ans=0.1 2023-12-23 12:22:52,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1132800.0, ans=0.1 2023-12-23 12:23:09,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1132866.6666666667, ans=0.0 2023-12-23 12:23:15,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1132933.3333333333, ans=0.05 2023-12-23 12:23:20,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1132933.3333333333, ans=0.125 2023-12-23 12:23:32,810 INFO [train.py:886] (3/4) Epoch 36, batch 3150, loss[loss=0.01571, audio_tagging_loss=0.01571, over 24750.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4952821.67 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:23:46,310 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.140e+01 3.564e+01 3.708e+01 3.854e+01 4.503e+01, threshold=7.417e+01, percent-clipped=0.0 2023-12-23 12:23:52,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1133200.0, ans=0.125 2023-12-23 12:24:04,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1133266.6666666667, ans=0.125 2023-12-23 12:24:16,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1133333.3333333333, ans=0.125 2023-12-23 12:24:24,400 INFO [train.py:886] (3/4) Epoch 36, batch 3200, loss[loss=0.01211, audio_tagging_loss=0.01211, over 24750.00 frames. ], tot_loss[loss=0.01192, audio_tagging_loss=0.01192, over 4948098.95 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:24:56,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=1133600.0, ans=15.0 2023-12-23 12:25:16,136 INFO [train.py:886] (3/4) Epoch 36, batch 3250, loss[loss=0.01301, audio_tagging_loss=0.01301, over 25000.00 frames. ], tot_loss[loss=0.01192, audio_tagging_loss=0.01192, over 4951552.17 frames. 
], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:25:29,904 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.117e+01 3.485e+01 3.604e+01 3.735e+01 4.433e+01, threshold=7.209e+01, percent-clipped=0.0 2023-12-23 12:25:30,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1133800.0, ans=0.125 2023-12-23 12:25:46,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1133933.3333333333, ans=0.0 2023-12-23 12:26:01,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1134000.0, ans=0.125 2023-12-23 12:26:03,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1134000.0, ans=0.125 2023-12-23 12:26:05,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1134000.0, ans=0.125 2023-12-23 12:26:07,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1134066.6666666667, ans=0.125 2023-12-23 12:26:07,785 INFO [train.py:886] (3/4) Epoch 36, batch 3300, loss[loss=0.0137, audio_tagging_loss=0.0137, over 25000.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4947814.34 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:26:10,322 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=15.0 2023-12-23 12:26:10,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1134066.6666666667, ans=0.125 2023-12-23 12:26:15,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1134066.6666666667, ans=15.0 2023-12-23 12:26:16,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1134066.6666666667, ans=0.125 2023-12-23 12:26:28,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1134200.0, ans=0.125 2023-12-23 12:26:47,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1134266.6666666667, ans=0.125 2023-12-23 12:26:52,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1134333.3333333333, ans=0.09899494936611666 2023-12-23 12:26:59,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1134400.0, ans=0.125 2023-12-23 12:27:00,479 INFO [train.py:886] (3/4) Epoch 36, batch 3350, loss[loss=0.01335, audio_tagging_loss=0.01335, over 25000.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4954882.23 frames. 
], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:27:03,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1134400.0, ans=0.0 2023-12-23 12:27:07,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1134400.0, ans=0.0 2023-12-23 12:27:13,419 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.068e+01 3.479e+01 3.642e+01 3.791e+01 4.255e+01, threshold=7.283e+01, percent-clipped=0.0 2023-12-23 12:27:18,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1134466.6666666667, ans=15.0 2023-12-23 12:27:22,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1134533.3333333333, ans=0.125 2023-12-23 12:27:38,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1134600.0, ans=0.125 2023-12-23 12:27:43,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1134666.6666666667, ans=0.2 2023-12-23 12:27:53,063 INFO [train.py:886] (3/4) Epoch 36, batch 3400, loss[loss=0.01252, audio_tagging_loss=0.01252, over 24750.00 frames. ], tot_loss[loss=0.0121, audio_tagging_loss=0.0121, over 4961174.31 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:28:03,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1134800.0, ans=0.2 2023-12-23 12:28:12,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1134866.6666666667, ans=0.125 2023-12-23 12:28:26,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1134933.3333333333, ans=0.1 2023-12-23 12:28:45,264 INFO [train.py:886] (3/4) Epoch 36, batch 3450, loss[loss=0.01264, audio_tagging_loss=0.01264, over 24750.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4957932.85 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:28:58,962 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.129e+01 3.599e+01 3.745e+01 3.958e+01 4.783e+01, threshold=7.490e+01, percent-clipped=0.0 2023-12-23 12:29:05,258 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.54 vs. 
limit=6.0 2023-12-23 12:29:10,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1135200.0, ans=0.1 2023-12-23 12:29:12,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1135200.0, ans=0.125 2023-12-23 12:29:17,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1135266.6666666667, ans=0.05 2023-12-23 12:29:17,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1135266.6666666667, ans=0.1 2023-12-23 12:29:19,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1135266.6666666667, ans=0.125 2023-12-23 12:29:31,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1135333.3333333333, ans=0.1 2023-12-23 12:29:32,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1135333.3333333333, ans=0.0 2023-12-23 12:29:37,382 INFO [train.py:886] (3/4) Epoch 36, batch 3500, loss[loss=0.01246, audio_tagging_loss=0.01246, over 24750.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4946589.63 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:29:39,774 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.90 vs. limit=15.0 2023-12-23 12:29:44,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1135400.0, ans=0.125 2023-12-23 12:29:45,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1135400.0, ans=0.2 2023-12-23 12:30:00,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1135533.3333333333, ans=0.125 2023-12-23 12:30:29,010 INFO [train.py:886] (3/4) Epoch 36, batch 3550, loss[loss=0.01005, audio_tagging_loss=0.01005, over 24750.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4941334.66 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:30:29,420 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.34 vs. limit=15.0 2023-12-23 12:30:42,831 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.203e+01 3.480e+01 3.652e+01 3.812e+01 4.664e+01, threshold=7.305e+01, percent-clipped=0.0 2023-12-23 12:30:46,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1135800.0, ans=0.0 2023-12-23 12:31:08,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1135933.3333333333, ans=0.2 2023-12-23 12:31:19,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1136000.0, ans=0.125 2023-12-23 12:31:21,214 INFO [train.py:886] (3/4) Epoch 36, batch 3600, loss[loss=0.01143, audio_tagging_loss=0.01143, over 25000.00 frames. 
], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4945780.70 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:31:24,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1136066.6666666667, ans=10.0 2023-12-23 12:31:27,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1136066.6666666667, ans=0.125 2023-12-23 12:31:34,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1136133.3333333333, ans=0.2 2023-12-23 12:31:35,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1136133.3333333333, ans=0.125 2023-12-23 12:31:38,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1136133.3333333333, ans=0.125 2023-12-23 12:31:49,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1136200.0, ans=0.125 2023-12-23 12:31:49,322 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.93 vs. limit=15.0 2023-12-23 12:31:50,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1136200.0, ans=0.2 2023-12-23 12:31:56,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1136266.6666666667, ans=0.04949747468305833 2023-12-23 12:32:13,780 INFO [train.py:886] (3/4) Epoch 36, batch 3650, loss[loss=0.009809, audio_tagging_loss=0.009809, over 25000.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4943692.10 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:32:26,859 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.117e+01 3.484e+01 3.633e+01 3.763e+01 4.234e+01, threshold=7.265e+01, percent-clipped=0.0 2023-12-23 12:32:27,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=1136466.6666666667, ans=10.0 2023-12-23 12:32:46,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1136600.0, ans=0.125 2023-12-23 12:32:57,340 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.793e-02 2023-12-23 12:32:58,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1136666.6666666667, ans=0.2 2023-12-23 12:33:03,353 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2023-12-23 12:33:04,710 INFO [train.py:886] (3/4) Epoch 36, batch 3700, loss[loss=0.01137, audio_tagging_loss=0.01137, over 25000.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4945714.55 frames. 
], batch size: 100, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:33:10,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1136733.3333333333, ans=0.1 2023-12-23 12:33:36,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2023-12-23 12:33:36,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.32 vs. limit=15.0 2023-12-23 12:33:40,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1136933.3333333333, ans=0.1 2023-12-23 12:33:42,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1136933.3333333333, ans=0.125 2023-12-23 12:33:54,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1137000.0, ans=0.2 2023-12-23 12:33:55,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1137000.0, ans=0.125 2023-12-23 12:33:57,648 INFO [train.py:886] (3/4) Epoch 36, batch 3750, loss[loss=0.01372, audio_tagging_loss=0.01372, over 24750.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4946808.05 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 64.0 2023-12-23 12:33:58,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1137066.6666666667, ans=0.125 2023-12-23 12:33:59,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=1137066.6666666667, ans=10.0 2023-12-23 12:34:00,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1137066.6666666667, ans=0.125 2023-12-23 12:34:09,919 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.235e+01 3.553e+01 3.740e+01 3.871e+01 4.273e+01, threshold=7.479e+01, percent-clipped=0.0 2023-12-23 12:34:10,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1137133.3333333333, ans=0.0 2023-12-23 12:34:17,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1137200.0, ans=0.0 2023-12-23 12:34:18,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1137200.0, ans=0.1 2023-12-23 12:34:20,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1137200.0, ans=0.0 2023-12-23 12:34:29,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1137266.6666666667, ans=0.0 2023-12-23 12:34:29,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1137266.6666666667, ans=0.2 2023-12-23 12:34:34,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1137266.6666666667, ans=0.2 2023-12-23 12:34:39,914 INFO [scaling.py:1022] (3/4) Whitening: 
name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2023-12-23 12:34:49,188 INFO [train.py:886] (3/4) Epoch 36, batch 3800, loss[loss=0.009543, audio_tagging_loss=0.009543, over 24750.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4943486.30 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:34:58,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1137400.0, ans=0.05 2023-12-23 12:35:00,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1137466.6666666667, ans=0.125 2023-12-23 12:35:07,842 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.599e-01 2023-12-23 12:35:35,949 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0 2023-12-23 12:35:40,314 INFO [train.py:886] (3/4) Epoch 36, batch 3850, loss[loss=0.00993, audio_tagging_loss=0.00993, over 24750.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4940986.70 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:35:54,882 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.273e+01 3.476e+01 3.694e+01 3.926e+01 4.509e+01, threshold=7.388e+01, percent-clipped=0.0 2023-12-23 12:35:59,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1137800.0, ans=0.125 2023-12-23 12:36:12,150 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:36:18,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1137933.3333333333, ans=0.1 2023-12-23 12:36:19,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1137933.3333333333, ans=0.0 2023-12-23 12:36:25,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=1138000.0, ans=10.0 2023-12-23 12:36:27,849 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.63 vs. limit=22.5 2023-12-23 12:36:33,082 INFO [train.py:886] (3/4) Epoch 36, batch 3900, loss[loss=0.01292, audio_tagging_loss=0.01292, over 25000.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4942957.46 frames. 
], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:36:48,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1138133.3333333333, ans=0.025 2023-12-23 12:37:14,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1138333.3333333333, ans=0.125 2023-12-23 12:37:15,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1138333.3333333333, ans=0.0 2023-12-23 12:37:16,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1138333.3333333333, ans=0.2 2023-12-23 12:37:17,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1138333.3333333333, ans=0.125 2023-12-23 12:37:17,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1138333.3333333333, ans=0.07 2023-12-23 12:37:21,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1138333.3333333333, ans=0.0 2023-12-23 12:37:23,456 INFO [train.py:886] (3/4) Epoch 36, batch 3950, loss[loss=0.01072, audio_tagging_loss=0.01072, over 25000.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4949117.07 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:37:30,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1138400.0, ans=0.125 2023-12-23 12:37:37,658 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.120e+01 3.438e+01 3.584e+01 3.728e+01 4.194e+01, threshold=7.169e+01, percent-clipped=0.0 2023-12-23 12:37:45,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1138533.3333333333, ans=0.05 2023-12-23 12:37:50,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1138533.3333333333, ans=0.2 2023-12-23 12:37:51,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1138533.3333333333, ans=0.2 2023-12-23 12:38:11,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1138666.6666666667, ans=0.2 2023-12-23 12:38:16,573 INFO [train.py:886] (3/4) Epoch 36, batch 4000, loss[loss=0.01573, audio_tagging_loss=0.01573, over 25000.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4952201.44 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 128.0 2023-12-23 12:38:26,386 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.87 vs. 
limit=6.0 2023-12-23 12:38:33,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1138800.0, ans=0.2 2023-12-23 12:38:35,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1138866.6666666667, ans=0.125 2023-12-23 12:38:39,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1138866.6666666667, ans=0.125 2023-12-23 12:38:48,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1138933.3333333333, ans=0.09899494936611666 2023-12-23 12:38:52,149 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.28 vs. limit=15.0 2023-12-23 12:38:58,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1139000.0, ans=0.125 2023-12-23 12:39:07,333 INFO [train.py:886] (3/4) Epoch 36, batch 4050, loss[loss=0.01443, audio_tagging_loss=0.01443, over 24750.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4957711.77 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:39:21,380 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=22.5 2023-12-23 12:39:22,656 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.270e+01 3.561e+01 3.698e+01 3.887e+01 4.580e+01, threshold=7.397e+01, percent-clipped=0.0 2023-12-23 12:39:27,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1139200.0, ans=0.125 2023-12-23 12:39:31,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1139200.0, ans=0.2 2023-12-23 12:39:37,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.61 vs. limit=12.0 2023-12-23 12:39:43,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1139266.6666666667, ans=0.1 2023-12-23 12:39:51,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.45 vs. limit=15.0 2023-12-23 12:39:57,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1139333.3333333333, ans=0.0 2023-12-23 12:39:59,476 INFO [train.py:886] (3/4) Epoch 36, batch 4100, loss[loss=0.01173, audio_tagging_loss=0.01173, over 24750.00 frames. ], tot_loss[loss=0.01203, audio_tagging_loss=0.01203, over 4950087.46 frames. 
], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:40:00,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1139400.0, ans=0.1 2023-12-23 12:40:10,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1139466.6666666667, ans=0.1 2023-12-23 12:40:25,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1139533.3333333333, ans=0.0 2023-12-23 12:40:41,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1139666.6666666667, ans=0.1 2023-12-23 12:40:52,659 INFO [train.py:886] (3/4) Epoch 36, batch 4150, loss[loss=0.00999, audio_tagging_loss=0.00999, over 24750.00 frames. ], tot_loss[loss=0.01204, audio_tagging_loss=0.01204, over 4945684.53 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:40:56,118 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.58 vs. limit=22.5 2023-12-23 12:41:06,033 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.179e+01 3.544e+01 3.659e+01 3.852e+01 4.683e+01, threshold=7.319e+01, percent-clipped=0.0 2023-12-23 12:41:13,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1139866.6666666667, ans=0.125 2023-12-23 12:41:31,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1139933.3333333333, ans=0.0 2023-12-23 12:41:43,948 INFO [train.py:886] (3/4) Epoch 36, batch 4200, loss[loss=0.01201, audio_tagging_loss=0.01201, over 25000.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4948354.39 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:41:50,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1140066.6666666667, ans=0.125 2023-12-23 12:41:52,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2023-12-23 12:41:52,605 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.61 vs. limit=15.0 2023-12-23 12:41:55,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1140133.3333333333, ans=0.2 2023-12-23 12:42:05,840 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.63 vs. limit=22.5 2023-12-23 12:42:12,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1140200.0, ans=0.125 2023-12-23 12:42:17,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1140266.6666666667, ans=0.0 2023-12-23 12:42:23,149 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. 
limit=15.0 2023-12-23 12:42:26,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1140333.3333333333, ans=0.125 2023-12-23 12:42:28,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1140333.3333333333, ans=0.125 2023-12-23 12:42:33,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1140333.3333333333, ans=0.125 2023-12-23 12:42:36,323 INFO [train.py:886] (3/4) Epoch 36, batch 4250, loss[loss=0.01032, audio_tagging_loss=0.01032, over 25000.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4947339.02 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:42:48,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1140466.6666666667, ans=0.125 2023-12-23 12:42:50,260 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.180e+01 3.490e+01 3.625e+01 3.785e+01 4.316e+01, threshold=7.251e+01, percent-clipped=0.0 2023-12-23 12:42:57,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1140533.3333333333, ans=0.0 2023-12-23 12:43:01,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1140533.3333333333, ans=0.125 2023-12-23 12:43:12,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1140600.0, ans=0.125 2023-12-23 12:43:27,467 INFO [train.py:886] (3/4) Epoch 36, batch 4300, loss[loss=0.01251, audio_tagging_loss=0.01251, over 25000.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4951699.08 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:43:29,601 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.96 vs. limit=15.0 2023-12-23 12:43:57,384 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.71 vs. limit=22.5 2023-12-23 12:44:03,921 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.26 vs. limit=15.0 2023-12-23 12:44:16,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1141000.0, ans=0.125 2023-12-23 12:44:18,283 INFO [train.py:886] (3/4) Epoch 36, batch 4350, loss[loss=0.01311, audio_tagging_loss=0.01311, over 24750.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4955035.26 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:44:27,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1141133.3333333333, ans=0.0 2023-12-23 12:44:32,757 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.183e+01 3.485e+01 3.639e+01 3.859e+01 4.692e+01, threshold=7.279e+01, percent-clipped=0.0 2023-12-23 12:44:34,089 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. 
limit=15.0 2023-12-23 12:45:09,866 INFO [train.py:886] (3/4) Epoch 36, batch 4400, loss[loss=0.01038, audio_tagging_loss=0.01038, over 24750.00 frames. ], tot_loss[loss=0.01192, audio_tagging_loss=0.01192, over 4955901.52 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:45:16,111 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.39 vs. limit=5.0 2023-12-23 12:45:19,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1141466.6666666667, ans=0.0 2023-12-23 12:45:22,695 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=22.5 2023-12-23 12:45:25,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1141466.6666666667, ans=0.125 2023-12-23 12:45:28,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1141466.6666666667, ans=0.0 2023-12-23 12:45:38,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0 2023-12-23 12:45:46,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1141600.0, ans=10.0 2023-12-23 12:46:01,346 INFO [train.py:886] (3/4) Epoch 36, batch 4450, loss[loss=0.01307, audio_tagging_loss=0.01307, over 25000.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4952518.82 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:46:06,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1141733.3333333333, ans=0.125 2023-12-23 12:46:06,230 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=12.0 2023-12-23 12:46:15,970 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.197e+01 3.605e+01 3.750e+01 3.906e+01 4.617e+01, threshold=7.499e+01, percent-clipped=0.0 2023-12-23 12:46:16,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1141800.0, ans=0.125 2023-12-23 12:46:16,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1141800.0, ans=0.0 2023-12-23 12:46:24,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1141866.6666666667, ans=10.0 2023-12-23 12:46:26,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2023-12-23 12:46:53,747 INFO [train.py:886] (3/4) Epoch 36, batch 4500, loss[loss=0.01197, audio_tagging_loss=0.01197, over 25000.00 frames. ], tot_loss[loss=0.01192, audio_tagging_loss=0.01192, over 4954645.57 frames. 
], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:46:53,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1142066.6666666667, ans=0.0 2023-12-23 12:47:04,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1142133.3333333333, ans=0.125 2023-12-23 12:47:27,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1142266.6666666667, ans=0.0 2023-12-23 12:47:32,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1142266.6666666667, ans=0.0 2023-12-23 12:47:32,967 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.58 vs. limit=22.5 2023-12-23 12:47:35,418 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.46 vs. limit=22.5 2023-12-23 12:47:41,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1142333.3333333333, ans=0.0 2023-12-23 12:47:45,888 INFO [train.py:886] (3/4) Epoch 36, batch 4550, loss[loss=0.01143, audio_tagging_loss=0.01143, over 24750.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4951129.45 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:48:00,331 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.214e+01 3.540e+01 3.639e+01 3.809e+01 4.565e+01, threshold=7.278e+01, percent-clipped=0.0 2023-12-23 12:48:05,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1142533.3333333333, ans=0.125 2023-12-23 12:48:11,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.01 vs. limit=15.0 2023-12-23 12:48:12,918 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2023-12-23 12:48:37,315 INFO [train.py:886] (3/4) Epoch 36, batch 4600, loss[loss=0.01325, audio_tagging_loss=0.01325, over 25000.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4950647.77 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:49:18,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1143000.0, ans=0.0 2023-12-23 12:49:21,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1143000.0, ans=0.2 2023-12-23 12:49:23,468 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=15.0 2023-12-23 12:49:29,418 INFO [train.py:886] (3/4) Epoch 36, batch 4650, loss[loss=0.01212, audio_tagging_loss=0.01212, over 25000.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4958668.34 frames. 
], batch size: 100, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:49:32,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1143066.6666666667, ans=0.0 2023-12-23 12:49:32,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=15.0 2023-12-23 12:49:43,307 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.146e+01 3.508e+01 3.620e+01 3.811e+01 5.127e+01, threshold=7.240e+01, percent-clipped=0.0 2023-12-23 12:49:44,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1143133.3333333333, ans=0.125 2023-12-23 12:49:53,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1143200.0, ans=0.125 2023-12-23 12:50:02,638 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 12:50:05,608 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.78 vs. limit=15.0 2023-12-23 12:50:19,602 INFO [train.py:886] (3/4) Epoch 36, batch 4700, loss[loss=0.01014, audio_tagging_loss=0.01014, over 21772.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4948405.72 frames. ], batch size: 107, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:50:30,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1143466.6666666667, ans=0.07 2023-12-23 12:50:33,533 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.68 vs. limit=22.5 2023-12-23 12:50:37,178 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.08 vs. limit=12.0 2023-12-23 12:50:57,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1143666.6666666667, ans=0.125 2023-12-23 12:50:58,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1143666.6666666667, ans=0.05 2023-12-23 12:51:07,001 INFO [train.py:886] (3/4) Epoch 36, batch 4750, loss[loss=0.01208, audio_tagging_loss=0.01208, over 24750.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4945977.25 frames. ], batch size: 99, lr: 2.97e-03, grad_scale: 64.0 2023-12-23 12:51:19,667 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.214e+01 3.564e+01 3.739e+01 3.867e+01 4.563e+01, threshold=7.477e+01, percent-clipped=0.0 2023-12-23 12:51:42,459 INFO [train.py:886] (3/4) Epoch 37, batch 0, loss[loss=0.0241, audio_tagging_loss=0.0241, over 24082.00 frames. ], tot_loss[loss=0.0241, audio_tagging_loss=0.0241, over 24082.00 frames. ], batch size: 100, lr: 2.93e-03, grad_scale: 32.0 2023-12-23 12:51:42,459 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 12:51:52,494 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7569, 5.9160, 5.3502, 5.6490], device='cuda:3') 2023-12-23 12:52:03,033 INFO [train.py:917] (3/4) Epoch 37, validation: loss=0.03436, audio_tagging_loss=0.03436, over 3737520.00 frames. 
2023-12-23 12:52:03,034 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-23 12:52:04,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1143840.0, ans=0.2
2023-12-23 12:52:10,182 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 12:52:14,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1143906.6666666667, ans=0.125
2023-12-23 12:52:16,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.52 vs. limit=15.0
2023-12-23 12:52:16,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1143906.6666666667, ans=0.125
2023-12-23 12:52:25,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.46 vs. limit=15.0
2023-12-23 12:52:31,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1143973.3333333333, ans=0.0
2023-12-23 12:52:40,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1144040.0, ans=0.0
2023-12-23 12:52:46,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1144106.6666666667, ans=0.125
2023-12-23 12:52:49,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1144106.6666666667, ans=0.1
2023-12-23 12:52:53,487 INFO [train.py:886] (3/4) Epoch 37, batch 50, loss[loss=0.01507, audio_tagging_loss=0.01507, over 25000.00 frames. ], tot_loss[loss=0.01884, audio_tagging_loss=0.01884, over 1119315.24 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 12:53:03,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1144240.0, ans=0.0
2023-12-23 12:53:03,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1144240.0, ans=0.0
2023-12-23 12:53:22,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1144306.6666666667, ans=0.0
2023-12-23 12:53:26,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1144373.3333333333, ans=0.1
2023-12-23 12:53:42,400 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.737e+01 4.184e+01 4.552e+01 5.178e+01 9.780e+01, threshold=9.104e+01, percent-clipped=7.0
2023-12-23 12:53:44,067 INFO [train.py:886] (3/4) Epoch 37, batch 100, loss[loss=0.01286, audio_tagging_loss=0.01286, over 25000.00 frames. ], tot_loss[loss=0.01647, audio_tagging_loss=0.01647, over 1976007.81 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 12:53:44,556 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0
2023-12-23 12:53:45,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1144506.6666666667, ans=0.0
2023-12-23 12:53:58,171 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0
2023-12-23 12:54:00,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1144573.3333333333, ans=0.125
2023-12-23 12:54:08,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1144640.0, ans=0.0
2023-12-23 12:54:34,841 INFO [train.py:886] (3/4) Epoch 37, batch 150, loss[loss=0.01289, audio_tagging_loss=0.01289, over 25000.00 frames. ], tot_loss[loss=0.01485, audio_tagging_loss=0.01485, over 2639998.16 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 12:54:44,901 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.20 vs. limit=10.0
2023-12-23 12:55:12,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1145040.0, ans=0.0
2023-12-23 12:55:12,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1145040.0, ans=0.125
2023-12-23 12:55:24,534 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.092e+01 3.546e+01 3.734e+01 3.978e+01 4.632e+01, threshold=7.469e+01, percent-clipped=0.0
2023-12-23 12:55:25,503 INFO [train.py:886] (3/4) Epoch 37, batch 200, loss[loss=0.01218, audio_tagging_loss=0.01218, over 25000.00 frames. ], tot_loss[loss=0.0141, audio_tagging_loss=0.0141, over 3156769.59 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 12:55:36,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1145240.0, ans=0.125
2023-12-23 12:55:55,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1145373.3333333333, ans=0.5
2023-12-23 12:56:09,725 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.20 vs. limit=15.0
2023-12-23 12:56:16,882 INFO [train.py:886] (3/4) Epoch 37, batch 250, loss[loss=0.01217, audio_tagging_loss=0.01217, over 25000.00 frames. ], tot_loss[loss=0.01347, audio_tagging_loss=0.01347, over 3558137.52 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 12:56:18,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1145506.6666666667, ans=0.125
2023-12-23 12:56:25,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1145506.6666666667, ans=0.125
2023-12-23 12:56:25,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1145506.6666666667, ans=0.0
2023-12-23 12:56:26,570 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.54 vs. limit=15.0
2023-12-23 12:56:50,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1145706.6666666667, ans=0.125
2023-12-23 12:56:59,664 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.13 vs. limit=22.5
2023-12-23 12:57:07,245 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.131e+01 3.485e+01 3.632e+01 3.797e+01 5.071e+01, threshold=7.264e+01, percent-clipped=0.0
2023-12-23 12:57:08,194 INFO [train.py:886] (3/4) Epoch 37, batch 300, loss[loss=0.0109, audio_tagging_loss=0.0109, over 24750.00 frames. ], tot_loss[loss=0.01315, audio_tagging_loss=0.01315, over 3867715.35 frames. ], batch size: 99, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 12:57:34,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1145973.3333333333, ans=0.0
2023-12-23 12:57:52,557 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.31 vs. limit=15.0
2023-12-23 12:57:55,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1146106.6666666667, ans=0.2
2023-12-23 12:57:59,487 INFO [train.py:886] (3/4) Epoch 37, batch 350, loss[loss=0.01224, audio_tagging_loss=0.01224, over 24750.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 4098898.37 frames. ], batch size: 99, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 12:58:01,061 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0
2023-12-23 12:58:14,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1146240.0, ans=0.125
2023-12-23 12:58:23,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1146306.6666666667, ans=0.125
2023-12-23 12:58:29,801 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=22.5
2023-12-23 12:58:36,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1146373.3333333333, ans=0.125
2023-12-23 12:58:49,555 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.203e+01 3.540e+01 3.697e+01 3.881e+01 4.233e+01, threshold=7.395e+01, percent-clipped=0.0
2023-12-23 12:58:50,507 INFO [train.py:886] (3/4) Epoch 37, batch 400, loss[loss=0.009995, audio_tagging_loss=0.009995, over 23991.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4283449.99 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0
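The scaling.py:213 lines each print the current value ("ans") of a named schedule parameter at the current batch_count: dropout probabilities, skip rates, balancer targets and similar knobs that change as training progresses. A minimal sketch of such a parameter, assuming a piecewise-linear schedule over batch count; the knot values below are illustrative only, not taken from this recipe:

    class ScheduledFloat:
        """A float-like value that follows a piecewise-linear schedule
        in batch_count, as reported by the ScheduledFloat log lines."""

        def __init__(self, *knots, default=0.0):
            self.knots = sorted(knots)   # (batch_count, value) pairs
            self.batch_count = None      # set by the training loop
            self.default = default

        def __float__(self):
            b = self.batch_count
            if b is None:
                return float(self.default)
            if b <= self.knots[0][0]:
                return float(self.knots[0][1])
            if b >= self.knots[-1][0]:
                return float(self.knots[-1][1])
            for (x0, y0), (x1, y1) in zip(self.knots, self.knots[1:]):
                if x0 <= b <= x1:
                    return y0 + (y1 - y0) * (b - x0) / (x1 - x0)

    # e.g. a skip rate that decays from 0.5 to 0.0 over the first 20k batches:
    skip_rate = ScheduledFloat((0.0, 0.5), (20000.0, 0.0))
    skip_rate.batch_count = 1146373.33   # far past the last knot
    assert float(skip_rate) == 0.0       # matches an "ans=0.0" style report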
2023-12-23 12:58:53,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1146506.6666666667, ans=0.0
2023-12-23 12:58:57,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1146506.6666666667, ans=0.125
2023-12-23 12:59:02,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1146573.3333333333, ans=0.125
2023-12-23 12:59:06,014 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0
2023-12-23 12:59:19,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1146640.0, ans=0.1
2023-12-23 12:59:20,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1146640.0, ans=0.0
2023-12-23 12:59:23,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1146706.6666666667, ans=0.2
2023-12-23 12:59:28,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1146706.6666666667, ans=0.95
2023-12-23 12:59:33,072 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.97 vs. limit=22.5
2023-12-23 12:59:43,057 INFO [train.py:886] (3/4) Epoch 37, batch 450, loss[loss=0.01082, audio_tagging_loss=0.01082, over 25000.00 frames. ], tot_loss[loss=0.01236, audio_tagging_loss=0.01236, over 4433151.01 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 12:59:44,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1146840.0, ans=0.2
2023-12-23 13:00:05,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1146973.3333333333, ans=0.125
2023-12-23 13:00:17,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1147040.0, ans=0.125
2023-12-23 13:00:33,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1147106.6666666667, ans=0.125
2023-12-23 13:00:34,449 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.088e+01 3.542e+01 3.669e+01 3.831e+01 4.789e+01, threshold=7.338e+01, percent-clipped=0.0
2023-12-23 13:00:35,450 INFO [train.py:886] (3/4) Epoch 37, batch 500, loss[loss=0.01664, audio_tagging_loss=0.01664, over 25000.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 4556276.19 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 13:00:46,102 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.07 vs. limit=22.5
2023-12-23 13:00:48,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1147240.0, ans=0.125
2023-12-23 13:00:54,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1147240.0, ans=0.0
2023-12-23 13:01:02,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1147306.6666666667, ans=0.0
2023-12-23 13:01:27,934 INFO [train.py:886] (3/4) Epoch 37, batch 550, loss[loss=0.01056, audio_tagging_loss=0.01056, over 25000.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4646612.94 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 13:01:36,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1147573.3333333333, ans=0.025
2023-12-23 13:01:44,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1147573.3333333333, ans=0.1
2023-12-23 13:02:17,065 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.258e+01 3.550e+01 3.713e+01 3.833e+01 4.389e+01, threshold=7.427e+01, percent-clipped=0.0
2023-12-23 13:02:18,064 INFO [train.py:886] (3/4) Epoch 37, batch 600, loss[loss=0.01393, audio_tagging_loss=0.01393, over 25000.00 frames. ], tot_loss[loss=0.01211, audio_tagging_loss=0.01211, over 4712471.68 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 13:02:21,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1147840.0, ans=0.125
2023-12-23 13:02:28,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1147906.6666666667, ans=0.125
2023-12-23 13:02:39,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1147973.3333333333, ans=0.0
2023-12-23 13:03:10,619 INFO [train.py:886] (3/4) Epoch 37, batch 650, loss[loss=0.01191, audio_tagging_loss=0.01191, over 24750.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4755833.55 frames. ], batch size: 99, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 13:03:11,373 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.32 vs. limit=15.0
2023-12-23 13:03:28,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1148240.0, ans=0.125
2023-12-23 13:03:43,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1148373.3333333333, ans=0.125
2023-12-23 13:03:44,468 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0
2023-12-23 13:03:49,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1148373.3333333333, ans=0.125
2023-12-23 13:03:54,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1148440.0, ans=0.2
2023-12-23 13:04:00,809 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.245e+01 3.542e+01 3.681e+01 3.830e+01 5.017e+01, threshold=7.361e+01, percent-clipped=0.0
2023-12-23 13:04:01,830 INFO [train.py:886] (3/4) Epoch 37, batch 700, loss[loss=0.01219, audio_tagging_loss=0.01219, over 24750.00 frames. ], tot_loss[loss=0.01208, audio_tagging_loss=0.01208, over 4793402.58 frames. ], batch size: 99, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 13:04:21,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1148573.3333333333, ans=0.125
2023-12-23 13:04:22,498 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0
2023-12-23 13:04:29,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1148640.0, ans=0.2
2023-12-23 13:04:30,919 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0
2023-12-23 13:04:42,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1148706.6666666667, ans=0.125
2023-12-23 13:04:50,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1148773.3333333333, ans=0.5
2023-12-23 13:04:54,206 INFO [train.py:886] (3/4) Epoch 37, batch 750, loss[loss=0.01039, audio_tagging_loss=0.01039, over 25000.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4833410.94 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 13:04:59,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1148840.0, ans=0.125
2023-12-23 13:05:13,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.08 vs. limit=12.0
2023-12-23 13:05:13,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1148906.6666666667, ans=0.0
2023-12-23 13:05:18,453 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.28 vs. limit=22.5
2023-12-23 13:05:25,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0
2023-12-23 13:05:33,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1149040.0, ans=0.2
2023-12-23 13:05:45,670 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.160e+01 3.496e+01 3.641e+01 3.857e+01 4.376e+01, threshold=7.282e+01, percent-clipped=0.0
2023-12-23 13:05:46,705 INFO [train.py:886] (3/4) Epoch 37, batch 800, loss[loss=0.01172, audio_tagging_loss=0.01172, over 24750.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4866305.73 frames. ], batch size: 99, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 13:05:56,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1149240.0, ans=0.125
2023-12-23 13:06:12,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1149306.6666666667, ans=0.05
2023-12-23 13:06:23,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1149373.3333333333, ans=0.125
2023-12-23 13:06:23,746 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.34 vs. limit=15.0
2023-12-23 13:06:27,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1149440.0, ans=0.1
2023-12-23 13:06:30,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1149440.0, ans=0.125
2023-12-23 13:06:30,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1149440.0, ans=0.2
2023-12-23 13:06:35,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1149440.0, ans=0.025
2023-12-23 13:06:38,680 INFO [train.py:886] (3/4) Epoch 37, batch 850, loss[loss=0.01217, audio_tagging_loss=0.01217, over 25000.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4883466.24 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 13:06:41,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1149506.6666666667, ans=0.07
2023-12-23 13:06:56,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1149573.3333333333, ans=0.125
2023-12-23 13:07:01,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1149640.0, ans=0.0
2023-12-23 13:07:29,568 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.137e+01 3.523e+01 3.659e+01 3.806e+01 4.931e+01, threshold=7.318e+01, percent-clipped=0.0
2023-12-23 13:07:29,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1149840.0, ans=0.025
2023-12-23 13:07:30,533 INFO [train.py:886] (3/4) Epoch 37, batch 900, loss[loss=0.01413, audio_tagging_loss=0.01413, over 25000.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4893077.50 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 13:07:30,872 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0
2023-12-23 13:07:32,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1149840.0, ans=0.0
2023-12-23 13:07:50,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1149973.3333333333, ans=0.2
2023-12-23 13:07:56,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1149973.3333333333, ans=0.0
2023-12-23 13:08:08,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1150040.0, ans=0.125
2023-12-23 13:08:13,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1150106.6666666667, ans=0.125
2023-12-23 13:08:14,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1150106.6666666667, ans=0.2
2023-12-23 13:08:15,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1150106.6666666667, ans=0.0
2023-12-23 13:08:16,014 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.94 vs. limit=22.5
2023-12-23 13:08:18,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1150106.6666666667, ans=0.2
2023-12-23 13:08:23,575 INFO [train.py:886] (3/4) Epoch 37, batch 950, loss[loss=0.01597, audio_tagging_loss=0.01597, over 24750.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4896301.48 frames. ], batch size: 99, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 13:08:23,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1150173.3333333333, ans=0.125
2023-12-23 13:09:14,549 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.245e+01 3.535e+01 3.650e+01 3.854e+01 4.325e+01, threshold=7.300e+01, percent-clipped=0.0
2023-12-23 13:09:15,522 INFO [train.py:886] (3/4) Epoch 37, batch 1000, loss[loss=0.01332, audio_tagging_loss=0.01332, over 25000.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4904814.29 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 13:09:17,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1150506.6666666667, ans=0.0
2023-12-23 13:09:17,938 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.28 vs. limit=15.0
2023-12-23 13:09:49,233 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.43 vs. limit=15.0
2023-12-23 13:10:07,195 INFO [train.py:886] (3/4) Epoch 37, batch 1050, loss[loss=0.009999, audio_tagging_loss=0.009999, over 25000.00 frames. ], tot_loss[loss=0.01184, audio_tagging_loss=0.01184, over 4915813.27 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 32.0
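Each Whitening line above compares a per-module whitening metric against a limit (e.g. metric=22.07 vs. limit=22.5). One way to read the metric: it is about 1.0 when a module's output channels are decorrelated with equal variance, and grows as the feature covariance becomes ill-conditioned. A hedged sketch of such a metric, computed from the eigenvalues of the per-group covariance; this is an illustration, not necessarily the exact formula in scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
        # x: (num_frames, num_channels); channels are split into groups
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        covar = x.transpose(1, 2) @ x / n          # (num_groups, cg, cg)
        eigs = torch.linalg.eigvalsh(covar)        # per-group eigenvalues
        # ratio of mean squared eigenvalue to squared mean eigenvalue:
        # exactly 1.0 for an identity covariance, larger otherwise
        return float((eigs ** 2).mean() / eigs.mean() ** 2)

    x = torch.randn(1000, 256)                     # roughly white features
    print(whitening_metric(x, num_groups=1))       # close to 1, far below limit=15.0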
2023-12-23 13:10:14,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=1150840.0, ans=10.0
2023-12-23 13:10:37,094 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.18 vs. limit=12.0
2023-12-23 13:10:43,628 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.14 vs. limit=15.0
2023-12-23 13:10:55,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1151106.6666666667, ans=0.125
2023-12-23 13:10:58,214 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.215e+01 3.461e+01 3.648e+01 3.824e+01 4.846e+01, threshold=7.297e+01, percent-clipped=0.0
2023-12-23 13:10:59,202 INFO [train.py:886] (3/4) Epoch 37, batch 1100, loss[loss=0.01106, audio_tagging_loss=0.01106, over 24750.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4929550.51 frames. ], batch size: 99, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 13:11:43,353 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.28 vs. limit=15.0
2023-12-23 13:11:50,166 INFO [train.py:886] (3/4) Epoch 37, batch 1150, loss[loss=0.01396, audio_tagging_loss=0.01396, over 24750.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4941013.79 frames. ], batch size: 99, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 13:11:53,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1151506.6666666667, ans=0.05
2023-12-23 13:12:15,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.96 vs. limit=6.0
2023-12-23 13:12:23,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1151706.6666666667, ans=0.125
2023-12-23 13:12:40,751 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.217e+01 3.496e+01 3.649e+01 3.812e+01 4.517e+01, threshold=7.299e+01, percent-clipped=0.0
2023-12-23 13:12:42,420 INFO [train.py:886] (3/4) Epoch 37, batch 1200, loss[loss=0.01105, audio_tagging_loss=0.01105, over 24750.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4947345.44 frames. ], batch size: 99, lr: 2.92e-03, grad_scale: 32.0
2023-12-23 13:12:51,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1151906.6666666667, ans=0.125
2023-12-23 13:12:54,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1151906.6666666667, ans=0.0
2023-12-23 13:13:34,074 INFO [train.py:886] (3/4) Epoch 37, batch 1250, loss[loss=0.01187, audio_tagging_loss=0.01187, over 24750.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4944959.40 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 32.0
2023-12-23 13:13:42,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=1152173.3333333333, ans=0.2
2023-12-23 13:13:53,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1152306.6666666667, ans=0.0
2023-12-23 13:13:54,042 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=15.0
2023-12-23 13:14:04,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1152373.3333333333, ans=0.125
2023-12-23 13:14:24,773 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.171e+01 3.559e+01 3.736e+01 3.887e+01 4.435e+01, threshold=7.472e+01, percent-clipped=0.0
2023-12-23 13:14:25,747 INFO [train.py:886] (3/4) Epoch 37, batch 1300, loss[loss=0.01255, audio_tagging_loss=0.01255, over 24750.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4941344.60 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 32.0
2023-12-23 13:15:01,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1152706.6666666667, ans=0.0
2023-12-23 13:15:13,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1152773.3333333333, ans=0.0
2023-12-23 13:15:18,419 INFO [train.py:886] (3/4) Epoch 37, batch 1350, loss[loss=0.01261, audio_tagging_loss=0.01261, over 25000.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4940917.95 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 32.0
2023-12-23 13:15:32,879 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0
2023-12-23 13:15:37,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1152906.6666666667, ans=0.125
2023-12-23 13:15:45,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1152973.3333333333, ans=0.125
2023-12-23 13:15:49,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1153040.0, ans=0.95
2023-12-23 13:16:01,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1153106.6666666667, ans=0.0
2023-12-23 13:16:07,274 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.00 vs. limit=12.0
2023-12-23 13:16:07,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1153106.6666666667, ans=0.0
2023-12-23 13:16:08,516 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.118e+01 3.523e+01 3.683e+01 3.854e+01 4.444e+01, threshold=7.366e+01, percent-clipped=0.0
2023-12-23 13:16:10,185 INFO [train.py:886] (3/4) Epoch 37, batch 1400, loss[loss=0.0131, audio_tagging_loss=0.0131, over 25000.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4944144.63 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 32.0
2023-12-23 13:16:12,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1153173.3333333333, ans=0.125
2023-12-23 13:16:18,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1153173.3333333333, ans=0.1
2023-12-23 13:16:19,142 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.80 vs. limit=15.0
2023-12-23 13:16:40,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1153373.3333333333, ans=0.0
2023-12-23 13:17:02,927 INFO [train.py:886] (3/4) Epoch 37, batch 1450, loss[loss=0.01168, audio_tagging_loss=0.01168, over 25000.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4948431.12 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 32.0
2023-12-23 13:17:07,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1153506.6666666667, ans=0.0
2023-12-23 13:17:15,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1153573.3333333333, ans=0.125
2023-12-23 13:17:30,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.51 vs. limit=15.0
2023-12-23 13:17:53,072 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.154e+01 3.474e+01 3.653e+01 3.847e+01 4.349e+01, threshold=7.306e+01, percent-clipped=0.0
2023-12-23 13:17:54,033 INFO [train.py:886] (3/4) Epoch 37, batch 1500, loss[loss=0.0107, audio_tagging_loss=0.0107, over 25000.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4956233.02 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 32.0
2023-12-23 13:18:08,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1153906.6666666667, ans=0.125
2023-12-23 13:18:17,041 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.26 vs. limit=15.0
2023-12-23 13:18:24,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1154040.0, ans=0.125
2023-12-23 13:18:30,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1154040.0, ans=0.1
2023-12-23 13:18:40,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1154106.6666666667, ans=0.0
2023-12-23 13:18:43,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1154106.6666666667, ans=0.0
2023-12-23 13:18:46,614 INFO [train.py:886] (3/4) Epoch 37, batch 1550, loss[loss=0.01105, audio_tagging_loss=0.01105, over 24750.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4954976.07 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 32.0
2023-12-23 13:18:50,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1154173.3333333333, ans=0.0
2023-12-23 13:18:50,994 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.64 vs. limit=15.0
2023-12-23 13:18:53,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1154173.3333333333, ans=0.125
2023-12-23 13:18:53,478 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.75 vs. limit=22.5
2023-12-23 13:18:59,291 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.03 vs. limit=15.0
2023-12-23 13:19:02,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1154240.0, ans=0.0
2023-12-23 13:19:07,390 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.53 vs. limit=22.5
2023-12-23 13:19:16,744 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0
2023-12-23 13:19:22,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1154373.3333333333, ans=0.125
2023-12-23 13:19:28,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1154440.0, ans=0.0
2023-12-23 13:19:36,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1154440.0, ans=0.05
2023-12-23 13:19:37,588 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.188e+01 3.596e+01 3.735e+01 3.939e+01 4.850e+01, threshold=7.470e+01, percent-clipped=0.0
2023-12-23 13:19:39,221 INFO [train.py:886] (3/4) Epoch 37, batch 1600, loss[loss=0.0128, audio_tagging_loss=0.0128, over 24750.00 frames. ], tot_loss[loss=0.01202, audio_tagging_loss=0.01202, over 4950489.04 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 32.0
2023-12-23 13:19:54,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1154573.3333333333, ans=0.125
2023-12-23 13:19:57,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1154640.0, ans=0.125
2023-12-23 13:20:08,517 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.42 vs. limit=15.0
2023-12-23 13:20:13,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1154706.6666666667, ans=0.0
2023-12-23 13:20:16,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1154706.6666666667, ans=0.1
2023-12-23 13:20:29,949 INFO [train.py:886] (3/4) Epoch 37, batch 1650, loss[loss=0.01409, audio_tagging_loss=0.01409, over 24750.00 frames. ], tot_loss[loss=0.01201, audio_tagging_loss=0.01201, over 4945096.30 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 32.0
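In the train.py:886 lines, loss[...] is the current batch and tot_loss[...] is a running frame-weighted average; its frame count climbs through the early batches of epoch 37 (about 1.12e6 at batch 50, 1.98e6 at batch 100) and then levels off near 4.95e6. That behaviour is consistent with exponential forgetting: with roughly 24,750-frame batches and a per-batch decay of 0.995, the effective frame count saturates at about 24750 / 0.005 = 4.95e6. A sketch of that bookkeeping; the decay constant is an assumption chosen to match the observed plateau, not a value read from the recipe:

    class RunningLoss:
        """Frame-weighted running average of the loss with exponential
        forgetting, producing 'tot_loss[...] over N frames' style output."""

        def __init__(self, decay=0.995):
            self.decay = decay
            self.loss_sum = 0.0   # decayed sum of loss * frames
            self.frames = 0.0     # decayed effective frame count

        def update(self, loss, num_frames):
            self.loss_sum = self.decay * self.loss_sum + loss * num_frames
            self.frames = self.decay * self.frames + num_frames
            return self.loss_sum / self.frames, self.frames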
2023-12-23 13:20:32,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1154840.0, ans=0.1
2023-12-23 13:20:39,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1154906.6666666667, ans=0.125
2023-12-23 13:20:51,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1154973.3333333333, ans=0.125
2023-12-23 13:20:51,377 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=15.0
2023-12-23 13:21:02,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.37 vs. limit=22.5
2023-12-23 13:21:12,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1155106.6666666667, ans=10.0
2023-12-23 13:21:20,725 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.000e+01 3.491e+01 3.659e+01 3.859e+01 4.664e+01, threshold=7.317e+01, percent-clipped=0.0
2023-12-23 13:21:21,685 INFO [train.py:886] (3/4) Epoch 37, batch 1700, loss[loss=0.01164, audio_tagging_loss=0.01164, over 24750.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4947682.65 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 32.0
2023-12-23 13:21:35,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1155240.0, ans=0.0
2023-12-23 13:21:45,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1155306.6666666667, ans=0.0
2023-12-23 13:21:45,765 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 13:22:09,340 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.06 vs. limit=15.0
2023-12-23 13:22:12,121 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.89 vs. limit=10.0
2023-12-23 13:22:12,588 INFO [train.py:886] (3/4) Epoch 37, batch 1750, loss[loss=0.01397, audio_tagging_loss=0.01397, over 24750.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4950088.18 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 32.0
2023-12-23 13:22:16,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1155506.6666666667, ans=0.125
2023-12-23 13:22:22,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1155573.3333333333, ans=0.125
2023-12-23 13:22:27,414 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0
2023-12-23 13:22:49,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1155706.6666666667, ans=0.125
2023-12-23 13:22:51,099 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.66 vs. limit=10.0
2023-12-23 13:23:00,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1155773.3333333333, ans=0.1
2023-12-23 13:23:03,626 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.050e+01 3.504e+01 3.681e+01 3.884e+01 4.385e+01, threshold=7.361e+01, percent-clipped=0.0
2023-12-23 13:23:04,633 INFO [train.py:886] (3/4) Epoch 37, batch 1800, loss[loss=0.0115, audio_tagging_loss=0.0115, over 25000.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4949158.40 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 32.0
2023-12-23 13:23:16,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1155906.6666666667, ans=0.95
2023-12-23 13:23:55,825 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.33 vs. limit=6.0
2023-12-23 13:23:56,219 INFO [train.py:886] (3/4) Epoch 37, batch 1850, loss[loss=0.01053, audio_tagging_loss=0.01053, over 24750.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4952617.93 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 32.0
2023-12-23 13:23:56,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1156173.3333333333, ans=0.1
2023-12-23 13:24:00,469 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.29 vs. limit=22.5
2023-12-23 13:24:02,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1156173.3333333333, ans=0.95
2023-12-23 13:24:05,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1156240.0, ans=0.0
2023-12-23 13:24:13,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1156240.0, ans=0.0
2023-12-23 13:24:31,633 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=12.0
2023-12-23 13:24:38,963 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 13:24:46,826 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.290e+01 3.589e+01 3.735e+01 3.882e+01 5.249e+01, threshold=7.471e+01, percent-clipped=0.0
2023-12-23 13:24:47,857 INFO [train.py:886] (3/4) Epoch 37, batch 1900, loss[loss=0.01234, audio_tagging_loss=0.01234, over 24750.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4946105.35 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 32.0
2023-12-23 13:25:03,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1156573.3333333333, ans=0.05
2023-12-23 13:25:11,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1156640.0, ans=0.125
2023-12-23 13:25:39,106 INFO [train.py:886] (3/4) Epoch 37, batch 1950, loss[loss=0.01011, audio_tagging_loss=0.01011, over 24087.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4940213.15 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 32.0
2023-12-23 13:25:55,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1156906.6666666667, ans=0.0
2023-12-23 13:26:28,621 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.224e+01 3.487e+01 3.716e+01 3.886e+01 4.600e+01, threshold=7.432e+01, percent-clipped=0.0
2023-12-23 13:26:30,335 INFO [train.py:886] (3/4) Epoch 37, batch 2000, loss[loss=0.0129, audio_tagging_loss=0.0129, over 25000.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4941930.29 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 64.0
2023-12-23 13:26:40,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1157240.0, ans=0.125
2023-12-23 13:26:48,715 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.75 vs. limit=12.0
2023-12-23 13:26:49,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1157306.6666666667, ans=0.125
2023-12-23 13:26:58,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1157306.6666666667, ans=0.1
2023-12-23 13:27:21,485 INFO [train.py:886] (3/4) Epoch 37, batch 2050, loss[loss=0.01072, audio_tagging_loss=0.01072, over 25000.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4941380.70 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 64.0
2023-12-23 13:27:43,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1157640.0, ans=0.125
2023-12-23 13:27:52,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1157706.6666666667, ans=0.125
2023-12-23 13:27:57,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1157706.6666666667, ans=0.2
2023-12-23 13:28:02,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1157773.3333333333, ans=0.1
2023-12-23 13:28:07,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1157773.3333333333, ans=0.125
2023-12-23 13:28:09,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1157773.3333333333, ans=0.125
2023-12-23 13:28:12,077 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.053e+01 3.452e+01 3.576e+01 3.817e+01 4.679e+01, threshold=7.151e+01, percent-clipped=0.0
2023-12-23 13:28:13,070 INFO [train.py:886] (3/4) Epoch 37, batch 2100, loss[loss=0.01139, audio_tagging_loss=0.01139, over 25000.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4942743.65 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 64.0
2023-12-23 13:28:33,922 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.17 vs. limit=15.0
2023-12-23 13:28:37,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1157973.3333333333, ans=0.125
2023-12-23 13:28:43,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1158040.0, ans=0.125
2023-12-23 13:28:44,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1158040.0, ans=0.125
2023-12-23 13:28:46,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1158040.0, ans=0.5
2023-12-23 13:29:03,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1158106.6666666667, ans=0.0
2023-12-23 13:29:04,801 INFO [train.py:886] (3/4) Epoch 37, batch 2150, loss[loss=0.01293, audio_tagging_loss=0.01293, over 25000.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4951284.71 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 64.0
2023-12-23 13:29:32,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1158306.6666666667, ans=0.5
2023-12-23 13:29:38,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.71 vs. limit=15.0
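The grad_scale field in the train.py:886 lines (32.0 earlier in the epoch, 64.0 from batch 2000 on) is the loss scale of fp16 mixed-precision training. A hedged sketch of where such a value comes from, using torch's stock GradScaler; the model, optimizer and loss below are placeholders, not the actual recipe. With the default growth settings the scale doubles after 2000 consecutive finite-gradient steps, which is consistent with the jump from 32.0 to 64.0 at batch 2000 above:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)  # growth_factor=2.0,
                                                         # growth_interval=2000 by default
    model = torch.nn.Linear(80, 527).cuda()              # placeholder model
    optimizer = torch.optim.AdamW(model.parameters())

    for step in range(10):                               # placeholder loop
        optimizer.zero_grad()
        x = torch.randn(8, 80, device="cuda")
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(x).pow(2).mean()                # placeholder loss
        scaler.scale(loss).backward()                    # backward on the scaled loss
        scaler.unscale_(optimizer)                       # true-scale grads before clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=73.0)
        scaler.step(optimizer)                           # skipped if grads are inf/nan
        scaler.update()                                  # grows or shrinks the scale
    print(scaler.get_scale())                            # the "grad_scale" of the log lines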
2023-12-23 13:29:52,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1158440.0, ans=0.0
2023-12-23 13:29:54,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1158440.0, ans=0.125
2023-12-23 13:29:55,696 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.073e+01 3.588e+01 3.736e+01 3.896e+01 4.550e+01, threshold=7.472e+01, percent-clipped=0.0
2023-12-23 13:29:56,690 INFO [train.py:886] (3/4) Epoch 37, batch 2200, loss[loss=0.0121, audio_tagging_loss=0.0121, over 24750.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4944188.08 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 64.0
2023-12-23 13:29:56,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1158506.6666666667, ans=0.2
2023-12-23 13:29:58,964 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.46 vs. limit=6.0
2023-12-23 13:30:49,045 INFO [train.py:886] (3/4) Epoch 37, batch 2250, loss[loss=0.01094, audio_tagging_loss=0.01094, over 25000.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4942487.58 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 64.0
2023-12-23 13:31:18,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1158973.3333333333, ans=0.125
2023-12-23 13:31:23,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1159040.0, ans=0.125
2023-12-23 13:31:38,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1159106.6666666667, ans=0.1
2023-12-23 13:31:38,834 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.103e+01 3.551e+01 3.685e+01 3.844e+01 4.557e+01, threshold=7.371e+01, percent-clipped=0.0
2023-12-23 13:31:40,554 INFO [train.py:886] (3/4) Epoch 37, batch 2300, loss[loss=0.01276, audio_tagging_loss=0.01276, over 25000.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4938665.42 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 64.0
2023-12-23 13:31:44,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1159173.3333333333, ans=0.05
2023-12-23 13:32:07,235 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.44 vs. limit=15.0
2023-12-23 13:32:31,780 INFO [train.py:886] (3/4) Epoch 37, batch 2350, loss[loss=0.01351, audio_tagging_loss=0.01351, over 25000.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4941670.94 frames. ], batch size: 100, lr: 2.91e-03, grad_scale: 64.0
2023-12-23 13:32:47,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=15.0
2023-12-23 13:33:11,372 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0
2023-12-23 13:33:22,665 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.090e+01 3.485e+01 3.642e+01 3.779e+01 4.463e+01, threshold=7.283e+01, percent-clipped=0.0
2023-12-23 13:33:22,892 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 13:33:23,703 INFO [train.py:886] (3/4) Epoch 37, batch 2400, loss[loss=0.0131, audio_tagging_loss=0.0131, over 24750.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4940763.05 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 64.0
2023-12-23 13:33:42,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1159973.3333333333, ans=0.1
2023-12-23 13:34:07,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1160106.6666666667, ans=0.2
2023-12-23 13:34:14,803 INFO [train.py:886] (3/4) Epoch 37, batch 2450, loss[loss=0.0136, audio_tagging_loss=0.0136, over 25000.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4943570.29 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0
2023-12-23 13:34:17,980 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0
2023-12-23 13:34:26,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0
2023-12-23 13:34:38,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0
2023-12-23 13:34:42,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1160306.6666666667, ans=0.125
2023-12-23 13:34:43,506 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=15.0
2023-12-23 13:34:55,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1160440.0, ans=0.5
2023-12-23 13:34:56,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1160440.0, ans=0.2
2023-12-23 13:35:06,193 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.251e+01 3.588e+01 3.750e+01 3.881e+01 5.588e+01, threshold=7.500e+01, percent-clipped=0.0
2023-12-23 13:35:07,154 INFO [train.py:886] (3/4) Epoch 37, batch 2500, loss[loss=0.01228, audio_tagging_loss=0.01228, over 24750.00 frames. ], tot_loss[loss=0.01186, audio_tagging_loss=0.01186, over 4947112.27 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0
2023-12-23 13:35:20,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.06 vs. limit=22.5
2023-12-23 13:35:27,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1160640.0, ans=0.2
2023-12-23 13:35:30,668 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.74 vs. limit=15.0
2023-12-23 13:35:32,567 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0
2023-12-23 13:35:35,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1160640.0, ans=0.125
2023-12-23 13:35:37,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1160706.6666666667, ans=0.1
2023-12-23 13:35:43,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1160706.6666666667, ans=0.1
2023-12-23 13:35:47,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1160773.3333333333, ans=0.1
2023-12-23 13:35:48,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=12.0
2023-12-23 13:35:57,195 INFO [train.py:886] (3/4) Epoch 37, batch 2550, loss[loss=0.01258, audio_tagging_loss=0.01258, over 22213.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4942823.63 frames. ], batch size: 107, lr: 2.90e-03, grad_scale: 64.0
2023-12-23 13:36:41,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1161106.6666666667, ans=0.125
2023-12-23 13:36:41,983 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0
2023-12-23 13:36:44,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5
2023-12-23 13:36:46,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1161106.6666666667, ans=0.125
2023-12-23 13:36:48,130 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.252e+01 3.558e+01 3.722e+01 3.971e+01 4.498e+01, threshold=7.443e+01, percent-clipped=0.0
2023-12-23 13:36:49,155 INFO [train.py:886] (3/4) Epoch 37, batch 2600, loss[loss=0.009994, audio_tagging_loss=0.009994, over 25000.00 frames. ], tot_loss[loss=0.01197, audio_tagging_loss=0.01197, over 4942444.56 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0
2023-12-23 13:37:15,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1161306.6666666667, ans=0.125
2023-12-23 13:37:20,443 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0
2023-12-23 13:37:21,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1161373.3333333333, ans=0.125
2023-12-23 13:37:31,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1161440.0, ans=0.125
2023-12-23 13:37:33,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1161440.0, ans=0.2
2023-12-23 13:37:37,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1161440.0, ans=0.125
2023-12-23 13:37:42,241 INFO [train.py:886] (3/4) Epoch 37, batch 2650, loss[loss=0.01362, audio_tagging_loss=0.01362, over 25000.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4945909.66 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0
2023-12-23 13:37:46,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1161506.6666666667, ans=0.125
2023-12-23 13:38:00,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1161640.0, ans=0.125
2023-12-23 13:38:31,341 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.155e+01 3.489e+01 3.610e+01 3.806e+01 4.521e+01, threshold=7.219e+01, percent-clipped=0.0
2023-12-23 13:38:32,316 INFO [train.py:886] (3/4) Epoch 37, batch 2700, loss[loss=0.01, audio_tagging_loss=0.01, over 21825.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4946764.35 frames. ], batch size: 107, lr: 2.90e-03, grad_scale: 64.0
2023-12-23 13:39:01,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=1161973.3333333333, ans=0.1
2023-12-23 13:39:12,282 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.56 vs. limit=22.5
2023-12-23 13:39:25,528 INFO [train.py:886] (3/4) Epoch 37, batch 2750, loss[loss=0.01192, audio_tagging_loss=0.01192, over 25000.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4952583.35 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0
2023-12-23 13:39:33,576 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.94 vs. limit=15.0
2023-12-23 13:39:36,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1162240.0, ans=0.1
2023-12-23 13:40:00,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1162373.3333333333, ans=0.125
2023-12-23 13:40:01,007 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.72 vs. limit=15.0
limit=15.0 2023-12-23 13:40:04,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1162373.3333333333, ans=0.2 2023-12-23 13:40:08,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1162440.0, ans=0.2 2023-12-23 13:40:15,312 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.145e+01 3.537e+01 3.670e+01 3.855e+01 4.195e+01, threshold=7.340e+01, percent-clipped=0.0 2023-12-23 13:40:16,313 INFO [train.py:886] (3/4) Epoch 37, batch 2800, loss[loss=0.01154, audio_tagging_loss=0.01154, over 24750.00 frames. ], tot_loss[loss=0.01192, audio_tagging_loss=0.01192, over 4956933.98 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:40:21,213 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.15 vs. limit=15.0 2023-12-23 13:40:26,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.22 vs. limit=6.0 2023-12-23 13:40:37,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1162640.0, ans=0.2 2023-12-23 13:40:38,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=1162640.0, ans=0.02 2023-12-23 13:40:43,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1162640.0, ans=0.125 2023-12-23 13:40:45,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1162640.0, ans=0.0 2023-12-23 13:40:55,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1162706.6666666667, ans=0.125 2023-12-23 13:41:08,989 INFO [train.py:886] (3/4) Epoch 37, batch 2850, loss[loss=0.01234, audio_tagging_loss=0.01234, over 24750.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4944741.78 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:41:18,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1162906.6666666667, ans=0.125 2023-12-23 13:41:21,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1162906.6666666667, ans=0.0 2023-12-23 13:41:32,940 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.33 vs. 
limit=15.0 2023-12-23 13:41:45,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1163040.0, ans=0.025 2023-12-23 13:41:58,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1163106.6666666667, ans=0.0 2023-12-23 13:41:59,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1163106.6666666667, ans=0.0 2023-12-23 13:42:00,272 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.241e+01 3.535e+01 3.716e+01 3.872e+01 4.379e+01, threshold=7.432e+01, percent-clipped=0.0 2023-12-23 13:42:01,219 INFO [train.py:886] (3/4) Epoch 37, batch 2900, loss[loss=0.009383, audio_tagging_loss=0.009383, over 25000.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4939846.25 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:42:02,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1163173.3333333333, ans=0.2 2023-12-23 13:42:03,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1163173.3333333333, ans=0.125 2023-12-23 13:42:06,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1163173.3333333333, ans=0.0 2023-12-23 13:42:16,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1163240.0, ans=0.0 2023-12-23 13:42:33,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1163373.3333333333, ans=0.125 2023-12-23 13:42:35,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1163373.3333333333, ans=0.0 2023-12-23 13:42:52,796 INFO [train.py:886] (3/4) Epoch 37, batch 2950, loss[loss=0.01212, audio_tagging_loss=0.01212, over 24750.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4945750.60 frames. 
], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:43:05,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1163573.3333333333, ans=0.125 2023-12-23 13:43:07,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1163573.3333333333, ans=0.0 2023-12-23 13:43:27,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1163706.6666666667, ans=0.125 2023-12-23 13:43:29,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1163706.6666666667, ans=0.125 2023-12-23 13:43:31,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=1163706.6666666667, ans=0.1 2023-12-23 13:43:35,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1163773.3333333333, ans=0.0 2023-12-23 13:43:41,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1163773.3333333333, ans=0.2 2023-12-23 13:43:43,105 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.214e+01 3.508e+01 3.692e+01 3.793e+01 4.892e+01, threshold=7.383e+01, percent-clipped=0.0 2023-12-23 13:43:44,835 INFO [train.py:886] (3/4) Epoch 37, batch 3000, loss[loss=0.01178, audio_tagging_loss=0.01178, over 25000.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4948304.56 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:43:44,852 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 13:44:05,919 INFO [train.py:917] (3/4) Epoch 37, validation: loss=0.03402, audio_tagging_loss=0.03402, over 3737520.00 frames. 2023-12-23 13:44:05,919 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 13:44:07,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1163840.0, ans=0.125 2023-12-23 13:44:08,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1163840.0, ans=0.04949747468305833 2023-12-23 13:44:12,082 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.68 vs. limit=22.5 2023-12-23 13:44:12,085 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.87 vs. limit=10.0 2023-12-23 13:44:13,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1163840.0, ans=0.0 2023-12-23 13:44:45,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=1164040.0, ans=0.1 2023-12-23 13:44:53,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1164106.6666666667, ans=0.1 2023-12-23 13:44:57,775 INFO [train.py:886] (3/4) Epoch 37, batch 3050, loss[loss=0.01192, audio_tagging_loss=0.01192, over 25000.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4952574.06 frames. 
], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:45:09,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1164240.0, ans=0.2 2023-12-23 13:45:27,908 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2023-12-23 13:45:34,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1164373.3333333333, ans=0.2 2023-12-23 13:45:48,485 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.196e+01 3.531e+01 3.694e+01 3.920e+01 4.438e+01, threshold=7.387e+01, percent-clipped=0.0 2023-12-23 13:45:49,438 INFO [train.py:886] (3/4) Epoch 37, batch 3100, loss[loss=0.0113, audio_tagging_loss=0.0113, over 25000.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4953985.05 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:46:11,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1164640.0, ans=0.125 2023-12-23 13:46:12,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1164640.0, ans=0.125 2023-12-23 13:46:17,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1164640.0, ans=0.0 2023-12-23 13:46:20,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.29 vs. limit=12.0 2023-12-23 13:46:33,380 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=12.0 2023-12-23 13:46:41,321 INFO [train.py:886] (3/4) Epoch 37, batch 3150, loss[loss=0.01179, audio_tagging_loss=0.01179, over 24943.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4950373.53 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:46:56,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1164906.6666666667, ans=0.125 2023-12-23 13:46:59,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1164906.6666666667, ans=0.125 2023-12-23 13:47:01,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1164973.3333333333, ans=0.2 2023-12-23 13:47:16,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1165040.0, ans=10.0 2023-12-23 13:47:32,503 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.188e+01 3.567e+01 3.724e+01 3.935e+01 4.508e+01, threshold=7.447e+01, percent-clipped=0.0 2023-12-23 13:47:33,502 INFO [train.py:886] (3/4) Epoch 37, batch 3200, loss[loss=0.009831, audio_tagging_loss=0.009831, over 24750.00 frames. ], tot_loss[loss=0.01192, audio_tagging_loss=0.01192, over 4942784.45 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:47:34,188 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.76 vs. 
limit=15.0 2023-12-23 13:47:42,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1165240.0, ans=0.125 2023-12-23 13:47:42,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1165240.0, ans=0.05 2023-12-23 13:47:52,236 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.28 vs. limit=22.5 2023-12-23 13:47:55,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1165306.6666666667, ans=0.0 2023-12-23 13:48:25,161 INFO [train.py:886] (3/4) Epoch 37, batch 3250, loss[loss=0.01039, audio_tagging_loss=0.01039, over 24750.00 frames. ], tot_loss[loss=0.01186, audio_tagging_loss=0.01186, over 4942672.29 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:48:27,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1165506.6666666667, ans=0.1 2023-12-23 13:48:33,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1165506.6666666667, ans=0.125 2023-12-23 13:48:37,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1165573.3333333333, ans=0.125 2023-12-23 13:48:54,388 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.57 vs. limit=15.0 2023-12-23 13:49:11,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0 2023-12-23 13:49:15,388 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.154e+01 3.509e+01 3.653e+01 3.824e+01 4.316e+01, threshold=7.305e+01, percent-clipped=0.0 2023-12-23 13:49:16,373 INFO [train.py:886] (3/4) Epoch 37, batch 3300, loss[loss=0.00987, audio_tagging_loss=0.00987, over 25000.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4949794.26 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:49:16,903 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2023-12-23 13:49:56,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1166040.0, ans=0.125 2023-12-23 13:50:00,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1166106.6666666667, ans=0.025 2023-12-23 13:50:07,851 INFO [train.py:886] (3/4) Epoch 37, batch 3350, loss[loss=0.01131, audio_tagging_loss=0.01131, over 25000.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4950617.90 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:50:14,826 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. 
limit=15.0 2023-12-23 13:50:22,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1166240.0, ans=0.1 2023-12-23 13:50:25,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1166240.0, ans=0.1 2023-12-23 13:50:36,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1166306.6666666667, ans=0.125 2023-12-23 13:50:47,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1166373.3333333333, ans=0.2 2023-12-23 13:50:58,582 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.110e+01 3.562e+01 3.721e+01 3.851e+01 4.497e+01, threshold=7.442e+01, percent-clipped=0.0 2023-12-23 13:51:00,227 INFO [train.py:886] (3/4) Epoch 37, batch 3400, loss[loss=0.01187, audio_tagging_loss=0.01187, over 24952.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4957755.84 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:51:09,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1166573.3333333333, ans=0.0 2023-12-23 13:51:20,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1166640.0, ans=0.2 2023-12-23 13:51:25,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1166640.0, ans=0.0 2023-12-23 13:51:31,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1166706.6666666667, ans=0.1 2023-12-23 13:51:38,843 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0 2023-12-23 13:51:50,555 INFO [train.py:886] (3/4) Epoch 37, batch 3450, loss[loss=0.01133, audio_tagging_loss=0.01133, over 24750.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4959817.67 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:51:58,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1166840.0, ans=0.125 2023-12-23 13:52:17,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1166973.3333333333, ans=0.125 2023-12-23 13:52:17,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1166973.3333333333, ans=0.0 2023-12-23 13:52:18,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1166973.3333333333, ans=10.0 2023-12-23 13:52:33,814 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=15.0 2023-12-23 13:52:41,673 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.255e+01 3.580e+01 3.707e+01 3.907e+01 4.394e+01, threshold=7.413e+01, percent-clipped=0.0 2023-12-23 13:52:42,664 INFO [train.py:886] (3/4) Epoch 37, batch 3500, loss[loss=0.009691, audio_tagging_loss=0.009691, over 24750.00 frames. 
], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4959076.49 frames. ], batch size: 99, lr: 2.90e-03, grad_scale: 64.0 2023-12-23 13:52:45,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1167173.3333333333, ans=0.125 2023-12-23 13:52:46,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0 2023-12-23 13:52:46,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1167173.3333333333, ans=0.0 2023-12-23 13:52:55,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1167240.0, ans=0.125 2023-12-23 13:53:01,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1167240.0, ans=0.125 2023-12-23 13:53:11,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1167306.6666666667, ans=0.125 2023-12-23 13:53:17,220 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.806e-02 2023-12-23 13:53:30,199 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.64 vs. limit=6.0 2023-12-23 13:53:35,207 INFO [train.py:886] (3/4) Epoch 37, batch 3550, loss[loss=0.01219, audio_tagging_loss=0.01219, over 21764.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4948675.26 frames. ], batch size: 107, lr: 2.90e-03, grad_scale: 32.0 2023-12-23 13:53:38,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1167506.6666666667, ans=0.1 2023-12-23 13:53:58,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1167640.0, ans=0.125 2023-12-23 13:54:14,613 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:54:19,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1167773.3333333333, ans=0.125 2023-12-23 13:54:26,590 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.024e+01 3.522e+01 3.659e+01 3.853e+01 4.426e+01, threshold=7.319e+01, percent-clipped=0.0 2023-12-23 13:54:26,614 INFO [train.py:886] (3/4) Epoch 37, batch 3600, loss[loss=0.01147, audio_tagging_loss=0.01147, over 25000.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4952869.62 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 32.0 2023-12-23 13:54:52,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1167973.3333333333, ans=0.125 2023-12-23 13:54:59,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1168040.0, ans=0.0 2023-12-23 13:55:00,008 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.75 vs. 
limit=22.5 2023-12-23 13:55:06,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1168040.0, ans=0.125 2023-12-23 13:55:11,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1168106.6666666667, ans=0.0 2023-12-23 13:55:19,999 INFO [train.py:886] (3/4) Epoch 37, batch 3650, loss[loss=0.01094, audio_tagging_loss=0.01094, over 24750.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4955087.43 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 13:55:23,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1168173.3333333333, ans=0.125 2023-12-23 13:55:33,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1168240.0, ans=0.125 2023-12-23 13:55:57,534 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.71 vs. limit=15.0 2023-12-23 13:55:58,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1168373.3333333333, ans=0.1 2023-12-23 13:55:58,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1168373.3333333333, ans=0.125 2023-12-23 13:56:11,231 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.231e+01 3.538e+01 3.706e+01 3.858e+01 4.224e+01, threshold=7.412e+01, percent-clipped=0.0 2023-12-23 13:56:11,255 INFO [train.py:886] (3/4) Epoch 37, batch 3700, loss[loss=0.01214, audio_tagging_loss=0.01214, over 24750.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4959648.13 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 13:56:13,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1168506.6666666667, ans=0.0 2023-12-23 13:56:22,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1168573.3333333333, ans=0.125 2023-12-23 13:56:44,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1168706.6666666667, ans=0.2 2023-12-23 13:56:58,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1168773.3333333333, ans=0.125 2023-12-23 13:57:02,251 INFO [train.py:886] (3/4) Epoch 37, batch 3750, loss[loss=0.01003, audio_tagging_loss=0.01003, over 24053.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4957162.75 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 13:57:20,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1168906.6666666667, ans=0.0 2023-12-23 13:57:34,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1169040.0, ans=0.0 2023-12-23 13:57:37,908 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.18 vs. 
limit=15.0 2023-12-23 13:57:40,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1169040.0, ans=0.0 2023-12-23 13:57:43,627 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=22.5 2023-12-23 13:57:44,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1169106.6666666667, ans=0.125 2023-12-23 13:57:45,210 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.35 vs. limit=15.0 2023-12-23 13:57:47,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.64 vs. limit=15.0 2023-12-23 13:57:50,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1169106.6666666667, ans=0.1 2023-12-23 13:57:53,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1169173.3333333333, ans=0.1 2023-12-23 13:57:54,661 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.286e+01 3.646e+01 3.784e+01 3.939e+01 4.535e+01, threshold=7.569e+01, percent-clipped=0.0 2023-12-23 13:57:54,696 INFO [train.py:886] (3/4) Epoch 37, batch 3800, loss[loss=0.01276, audio_tagging_loss=0.01276, over 25000.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4945771.03 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 13:58:02,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1169173.3333333333, ans=0.125 2023-12-23 13:58:05,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1169240.0, ans=0.07 2023-12-23 13:58:07,350 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0 2023-12-23 13:58:15,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1169306.6666666667, ans=0.125 2023-12-23 13:58:31,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1169373.3333333333, ans=0.1 2023-12-23 13:58:45,210 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:58:45,924 INFO [train.py:886] (3/4) Epoch 37, batch 3850, loss[loss=0.01276, audio_tagging_loss=0.01276, over 25000.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4945188.51 frames. 
], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 13:58:48,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1169506.6666666667, ans=0.1 2023-12-23 13:59:01,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1169573.3333333333, ans=0.125 2023-12-23 13:59:11,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1169640.0, ans=0.125 2023-12-23 13:59:19,296 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 13:59:21,437 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=15.0 2023-12-23 13:59:23,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1169706.6666666667, ans=10.0 2023-12-23 13:59:39,050 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.134e+01 3.574e+01 3.713e+01 3.863e+01 4.498e+01, threshold=7.426e+01, percent-clipped=0.0 2023-12-23 13:59:39,087 INFO [train.py:886] (3/4) Epoch 37, batch 3900, loss[loss=0.0133, audio_tagging_loss=0.0133, over 25000.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4941635.52 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 13:59:40,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1169840.0, ans=0.0 2023-12-23 13:59:41,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1169840.0, ans=0.0 2023-12-23 13:59:54,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1169906.6666666667, ans=0.125 2023-12-23 14:00:03,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1169973.3333333333, ans=0.0 2023-12-23 14:00:08,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1170040.0, ans=0.1 2023-12-23 14:00:12,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1170040.0, ans=0.125 2023-12-23 14:00:15,918 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.13 vs. limit=22.5 2023-12-23 14:00:17,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1170040.0, ans=0.125 2023-12-23 14:00:29,158 INFO [train.py:886] (3/4) Epoch 37, batch 3950, loss[loss=0.01178, audio_tagging_loss=0.01178, over 25000.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4944779.66 frames. 
], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:00:44,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1170240.0, ans=0.0 2023-12-23 14:00:45,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1170240.0, ans=0.125 2023-12-23 14:00:46,120 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0 2023-12-23 14:00:56,771 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.33 vs. limit=15.0 2023-12-23 14:01:02,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=1170373.3333333333, ans=0.2 2023-12-23 14:01:14,630 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0 2023-12-23 14:01:21,652 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.229e+01 3.552e+01 3.697e+01 3.830e+01 4.529e+01, threshold=7.395e+01, percent-clipped=0.0 2023-12-23 14:01:21,679 INFO [train.py:886] (3/4) Epoch 37, batch 4000, loss[loss=0.01105, audio_tagging_loss=0.01105, over 25000.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4942482.98 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:01:22,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1170506.6666666667, ans=0.1 2023-12-23 14:01:43,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1170640.0, ans=0.1 2023-12-23 14:01:50,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1170640.0, ans=0.1 2023-12-23 14:01:55,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1170706.6666666667, ans=0.0 2023-12-23 14:01:56,840 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.03 vs. limit=15.0 2023-12-23 14:02:00,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1170706.6666666667, ans=0.2 2023-12-23 14:02:08,052 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.10 vs. limit=15.0 2023-12-23 14:02:10,892 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.47 vs. limit=15.0 2023-12-23 14:02:14,051 INFO [train.py:886] (3/4) Epoch 37, batch 4050, loss[loss=0.01181, audio_tagging_loss=0.01181, over 24750.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4946491.53 frames. 
], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:02:17,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1170840.0, ans=0.1 2023-12-23 14:02:28,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0 2023-12-23 14:02:32,290 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.23 vs. limit=15.0 2023-12-23 14:02:44,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1171040.0, ans=0.1 2023-12-23 14:02:45,103 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0 2023-12-23 14:02:45,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1171040.0, ans=0.125 2023-12-23 14:02:57,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1171106.6666666667, ans=0.0 2023-12-23 14:03:06,580 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.265e+01 3.643e+01 3.788e+01 3.914e+01 4.371e+01, threshold=7.577e+01, percent-clipped=0.0 2023-12-23 14:03:06,604 INFO [train.py:886] (3/4) Epoch 37, batch 4100, loss[loss=0.01526, audio_tagging_loss=0.01526, over 24750.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4938597.36 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:03:39,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1171373.3333333333, ans=0.2 2023-12-23 14:03:44,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1171373.3333333333, ans=0.125 2023-12-23 14:03:59,157 INFO [train.py:886] (3/4) Epoch 37, batch 4150, loss[loss=0.01247, audio_tagging_loss=0.01247, over 25000.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4937086.80 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:04:18,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1171640.0, ans=0.125 2023-12-23 14:04:25,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1171640.0, ans=0.0 2023-12-23 14:04:50,303 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.188e+01 3.521e+01 3.688e+01 3.874e+01 5.030e+01, threshold=7.375e+01, percent-clipped=0.0 2023-12-23 14:04:50,327 INFO [train.py:886] (3/4) Epoch 37, batch 4200, loss[loss=0.0112, audio_tagging_loss=0.0112, over 24750.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4945222.48 frames. 
], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:04:52,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1171840.0, ans=0.125 2023-12-23 14:05:05,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1171906.6666666667, ans=0.0 2023-12-23 14:05:13,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1171973.3333333333, ans=0.125 2023-12-23 14:05:15,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1171973.3333333333, ans=0.125 2023-12-23 14:05:16,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1171973.3333333333, ans=0.0 2023-12-23 14:05:32,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1172106.6666666667, ans=0.2 2023-12-23 14:05:42,849 INFO [train.py:886] (3/4) Epoch 37, batch 4250, loss[loss=0.01348, audio_tagging_loss=0.01348, over 23037.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4941182.27 frames. ], batch size: 107, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:05:48,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1172173.3333333333, ans=0.0 2023-12-23 14:05:52,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1172240.0, ans=0.2 2023-12-23 14:05:53,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1172240.0, ans=0.125 2023-12-23 14:05:55,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1172240.0, ans=0.0 2023-12-23 14:06:04,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1172306.6666666667, ans=0.1 2023-12-23 14:06:06,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1172306.6666666667, ans=0.1 2023-12-23 14:06:25,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1172440.0, ans=0.125 2023-12-23 14:06:32,488 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:06:35,033 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.171e+01 3.544e+01 3.699e+01 3.834e+01 4.591e+01, threshold=7.397e+01, percent-clipped=0.0 2023-12-23 14:06:35,058 INFO [train.py:886] (3/4) Epoch 37, batch 4300, loss[loss=0.01007, audio_tagging_loss=0.01007, over 25000.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4947944.16 frames. 
], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:06:39,064 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:06:50,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1172573.3333333333, ans=10.0 2023-12-23 14:07:05,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1172706.6666666667, ans=0.1 2023-12-23 14:07:17,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1172773.3333333333, ans=0.125 2023-12-23 14:07:26,732 INFO [train.py:886] (3/4) Epoch 37, batch 4350, loss[loss=0.008859, audio_tagging_loss=0.008859, over 25000.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4948493.70 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:07:29,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1172840.0, ans=0.125 2023-12-23 14:07:36,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1172906.6666666667, ans=0.04949747468305833 2023-12-23 14:07:46,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1172973.3333333333, ans=0.1 2023-12-23 14:07:50,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1172973.3333333333, ans=0.125 2023-12-23 14:08:04,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1173040.0, ans=0.1 2023-12-23 14:08:13,065 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.21 vs. limit=15.0 2023-12-23 14:08:14,744 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.13 vs. limit=15.0 2023-12-23 14:08:18,190 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.237e+01 3.582e+01 3.726e+01 3.912e+01 4.794e+01, threshold=7.453e+01, percent-clipped=0.0 2023-12-23 14:08:18,214 INFO [train.py:886] (3/4) Epoch 37, batch 4400, loss[loss=0.01202, audio_tagging_loss=0.01202, over 24750.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4951178.79 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:08:40,563 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. 
limit=15.0 2023-12-23 14:08:41,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1173306.6666666667, ans=0.125 2023-12-23 14:08:46,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1173306.6666666667, ans=0.0 2023-12-23 14:08:49,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1173306.6666666667, ans=0.2 2023-12-23 14:09:07,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1173440.0, ans=0.125 2023-12-23 14:09:13,029 INFO [train.py:886] (3/4) Epoch 37, batch 4450, loss[loss=0.01036, audio_tagging_loss=0.01036, over 24035.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4950487.88 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:09:17,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1173506.6666666667, ans=0.125 2023-12-23 14:09:17,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1173506.6666666667, ans=0.125 2023-12-23 14:09:24,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1173573.3333333333, ans=0.0 2023-12-23 14:09:33,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1173640.0, ans=0.0 2023-12-23 14:09:34,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1173640.0, ans=0.2 2023-12-23 14:09:42,989 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.33 vs. limit=15.0 2023-12-23 14:10:02,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1173773.3333333333, ans=0.125 2023-12-23 14:10:05,055 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.150e+01 3.567e+01 3.732e+01 3.907e+01 4.248e+01, threshold=7.464e+01, percent-clipped=0.0 2023-12-23 14:10:05,080 INFO [train.py:886] (3/4) Epoch 37, batch 4500, loss[loss=0.0119, audio_tagging_loss=0.0119, over 25000.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4953261.72 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:10:16,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1173906.6666666667, ans=0.1 2023-12-23 14:10:16,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1173906.6666666667, ans=0.2 2023-12-23 14:10:21,948 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:10:34,517 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.12 vs. 
limit=15.0 2023-12-23 14:10:38,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1174040.0, ans=0.09899494936611666 2023-12-23 14:10:44,240 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:10:48,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1174106.6666666667, ans=0.0 2023-12-23 14:10:56,692 INFO [train.py:886] (3/4) Epoch 37, batch 4550, loss[loss=0.011, audio_tagging_loss=0.011, over 25000.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4951089.67 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:11:01,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1174173.3333333333, ans=0.05 2023-12-23 14:11:03,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1174173.3333333333, ans=0.125 2023-12-23 14:11:07,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1174240.0, ans=0.125 2023-12-23 14:11:09,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1174240.0, ans=0.125 2023-12-23 14:11:23,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1174306.6666666667, ans=0.04949747468305833 2023-12-23 14:11:28,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1174373.3333333333, ans=0.0 2023-12-23 14:11:34,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1174373.3333333333, ans=0.125 2023-12-23 14:11:35,256 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:11:42,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1174440.0, ans=0.1 2023-12-23 14:11:47,705 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.143e+01 3.524e+01 3.710e+01 3.904e+01 4.910e+01, threshold=7.420e+01, percent-clipped=0.0 2023-12-23 14:11:47,730 INFO [train.py:886] (3/4) Epoch 37, batch 4600, loss[loss=0.009926, audio_tagging_loss=0.009926, over 24058.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4956147.00 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:12:03,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1174573.3333333333, ans=0.125 2023-12-23 14:12:40,268 INFO [train.py:886] (3/4) Epoch 37, batch 4650, loss[loss=0.01172, audio_tagging_loss=0.01172, over 25000.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4959912.82 frames. 
], batch size: 100, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:12:59,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1174906.6666666667, ans=0.2 2023-12-23 14:13:02,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1174973.3333333333, ans=0.0 2023-12-23 14:13:05,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1174973.3333333333, ans=0.125 2023-12-23 14:13:12,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1175040.0, ans=0.2 2023-12-23 14:13:18,176 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.28 vs. limit=15.0 2023-12-23 14:13:20,782 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2023-12-23 14:13:24,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1175106.6666666667, ans=0.1 2023-12-23 14:13:31,794 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.187e+01 3.524e+01 3.714e+01 3.861e+01 4.972e+01, threshold=7.428e+01, percent-clipped=0.0 2023-12-23 14:13:31,818 INFO [train.py:886] (3/4) Epoch 37, batch 4700, loss[loss=0.01279, audio_tagging_loss=0.01279, over 24750.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4958653.81 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:13:32,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1175173.3333333333, ans=0.125 2023-12-23 14:13:39,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1175173.3333333333, ans=0.035 2023-12-23 14:13:39,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1175173.3333333333, ans=0.125 2023-12-23 14:13:43,843 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.80 vs. limit=15.0 2023-12-23 14:13:53,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1175306.6666666667, ans=0.0 2023-12-23 14:14:14,959 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.83 vs. limit=15.0 2023-12-23 14:14:18,217 INFO [train.py:886] (3/4) Epoch 37, batch 4750, loss[loss=0.01242, audio_tagging_loss=0.01242, over 24750.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4947856.85 frames. ], batch size: 99, lr: 2.89e-03, grad_scale: 32.0 2023-12-23 14:14:19,677 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.21 vs. 
limit=10.0 2023-12-23 14:14:28,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1175573.3333333333, ans=0.125 2023-12-23 14:14:52,805 INFO [train.py:886] (3/4) Epoch 38, batch 0, loss[loss=0.02586, audio_tagging_loss=0.02586, over 25000.00 frames. ], tot_loss[loss=0.02586, audio_tagging_loss=0.02586, over 25000.00 frames. ], batch size: 100, lr: 2.85e-03, grad_scale: 32.0 2023-12-23 14:14:52,806 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 14:15:14,004 INFO [train.py:917] (3/4) Epoch 38, validation: loss=0.03366, audio_tagging_loss=0.03366, over 3737520.00 frames. 2023-12-23 14:15:14,005 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 14:15:17,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1175613.3333333333, ans=0.125 2023-12-23 14:15:18,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1175613.3333333333, ans=0.125 2023-12-23 14:15:23,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1175680.0, ans=0.0 2023-12-23 14:15:30,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.73 vs. limit=15.0 2023-12-23 14:15:34,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1175746.6666666667, ans=0.2 2023-12-23 14:15:37,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1175746.6666666667, ans=0.0 2023-12-23 14:15:48,237 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.219e+01 3.729e+01 3.996e+01 5.182e+01 1.024e+02, threshold=7.991e+01, percent-clipped=5.0 2023-12-23 14:16:06,242 INFO [train.py:886] (3/4) Epoch 38, batch 50, loss[loss=0.01612, audio_tagging_loss=0.01612, over 25000.00 frames. ], tot_loss[loss=0.01875, audio_tagging_loss=0.01875, over 1117146.25 frames. 
], batch size: 100, lr: 2.85e-03, grad_scale: 32.0 2023-12-23 14:16:07,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1175946.6666666667, ans=0.125 2023-12-23 14:16:18,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1176013.3333333333, ans=0.125 2023-12-23 14:16:19,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1176013.3333333333, ans=0.1 2023-12-23 14:16:25,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1176080.0, ans=0.0 2023-12-23 14:16:39,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1176146.6666666667, ans=0.2 2023-12-23 14:16:45,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1176146.6666666667, ans=0.0 2023-12-23 14:16:48,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1176213.3333333333, ans=0.1 2023-12-23 14:16:57,335 INFO [train.py:886] (3/4) Epoch 38, batch 100, loss[loss=0.01516, audio_tagging_loss=0.01516, over 25000.00 frames. ], tot_loss[loss=0.0162, audio_tagging_loss=0.0162, over 1969220.90 frames. ], batch size: 100, lr: 2.85e-03, grad_scale: 32.0 2023-12-23 14:17:32,541 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.495e+01 3.959e+01 4.145e+01 4.396e+01 5.235e+01, threshold=8.289e+01, percent-clipped=0.0 2023-12-23 14:17:49,836 INFO [train.py:886] (3/4) Epoch 38, batch 150, loss[loss=0.01209, audio_tagging_loss=0.01209, over 25000.00 frames. ], tot_loss[loss=0.01476, audio_tagging_loss=0.01476, over 2633573.65 frames. ], batch size: 100, lr: 2.85e-03, grad_scale: 32.0 2023-12-23 14:17:58,889 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=12.0 2023-12-23 14:18:14,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1176746.6666666667, ans=0.125 2023-12-23 14:18:19,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1176746.6666666667, ans=0.1 2023-12-23 14:18:20,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1176813.3333333333, ans=0.125 2023-12-23 14:18:22,728 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.66 vs. limit=10.0 2023-12-23 14:18:27,177 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.70 vs. limit=22.5 2023-12-23 14:18:41,136 INFO [train.py:886] (3/4) Epoch 38, batch 200, loss[loss=0.01364, audio_tagging_loss=0.01364, over 24750.00 frames. ], tot_loss[loss=0.01391, audio_tagging_loss=0.01391, over 3152853.31 frames. ], batch size: 99, lr: 2.85e-03, grad_scale: 32.0 2023-12-23 14:19:01,071 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.25 vs. 
limit=15.0 2023-12-23 14:19:08,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1177080.0, ans=0.125 2023-12-23 14:19:16,765 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.167e+01 3.567e+01 3.766e+01 3.950e+01 4.355e+01, threshold=7.532e+01, percent-clipped=0.0 2023-12-23 14:19:25,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1177213.3333333333, ans=0.125 2023-12-23 14:19:32,664 INFO [train.py:886] (3/4) Epoch 38, batch 250, loss[loss=0.01433, audio_tagging_loss=0.01433, over 25000.00 frames. ], tot_loss[loss=0.01343, audio_tagging_loss=0.01343, over 3556012.56 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:19:54,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.02 vs. limit=6.0 2023-12-23 14:20:02,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1177413.3333333333, ans=0.0 2023-12-23 14:20:09,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1177480.0, ans=0.1 2023-12-23 14:20:14,426 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.51 vs. limit=15.0 2023-12-23 14:20:24,870 INFO [train.py:886] (3/4) Epoch 38, batch 300, loss[loss=0.01346, audio_tagging_loss=0.01346, over 24750.00 frames. ], tot_loss[loss=0.01318, audio_tagging_loss=0.01318, over 3863809.10 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:20:39,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1177680.0, ans=0.0 2023-12-23 14:20:59,860 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.270e+01 3.565e+01 3.753e+01 3.870e+01 4.486e+01, threshold=7.506e+01, percent-clipped=0.0 2023-12-23 14:21:00,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1177813.3333333333, ans=0.125 2023-12-23 14:21:06,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1177880.0, ans=0.2 2023-12-23 14:21:15,797 INFO [train.py:886] (3/4) Epoch 38, batch 350, loss[loss=0.01235, audio_tagging_loss=0.01235, over 25000.00 frames. ], tot_loss[loss=0.01293, audio_tagging_loss=0.01293, over 4099537.91 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:21:18,047 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=15.0 2023-12-23 14:21:49,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1178146.6666666667, ans=0.2 2023-12-23 14:21:55,519 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.32 vs. limit=15.0 2023-12-23 14:22:08,458 INFO [train.py:886] (3/4) Epoch 38, batch 400, loss[loss=0.01205, audio_tagging_loss=0.01205, over 25000.00 frames. ], tot_loss[loss=0.01257, audio_tagging_loss=0.01257, over 4290278.86 frames. 
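
The scaling.py:1022 Whitening lines report how far a layer's output covariance is from white: the metric is 1.0 for a covariance proportional to the identity and grows as energy concentrates in fewer directions, and a corrective term is only triggered when the metric exceeds the logged limit (hence "metric=3.25 vs. limit=15.0"). A sketch of one such metric, assuming it is trace(C^2) * D / trace(C)^2 over the channel covariance C, which is >= 1 by Cauchy-Schwarz; the logged num_groups variants would apply the same formula per group of channels:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """Anisotropy of the channel covariance of x: (frames, channels).

        1.0 when the covariance is proportional to the identity; larger
        when energy concentrates in fewer directions. Sketch only: the
        exact normalization in scaling.py may differ.
        """
        x = x - x.mean(dim=0)            # center each channel
        num_frames, num_channels = x.shape
        cov = (x.t() @ x) / num_frames   # (C, C) channel covariance
        tr_c = cov.diagonal().sum()      # sum of eigenvalues
        tr_c2 = (cov * cov).sum()        # sum of squared eigenvalues
        return tr_c2 * num_channels / (tr_c * tr_c)

    white = torch.randn(1000, 384)           # near-white activations
    print(whitening_metric(white))           # close to 1, under limit=15.0
    collapsed = white[:, :1].repeat(1, 384)  # rank-1, fully collapsed
    print(whitening_metric(collapsed))       # 384.0, far above any limit
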
], batch size: 100, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:22:10,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1178280.0, ans=0.125 2023-12-23 14:22:12,916 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.05 vs. limit=15.0 2023-12-23 14:22:13,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1178280.0, ans=0.0 2023-12-23 14:22:13,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1178280.0, ans=0.0 2023-12-23 14:22:14,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1178280.0, ans=0.1 2023-12-23 14:22:16,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1178280.0, ans=0.0 2023-12-23 14:22:21,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1178346.6666666667, ans=0.125 2023-12-23 14:22:25,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1178346.6666666667, ans=0.125 2023-12-23 14:22:33,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1178413.3333333333, ans=0.1 2023-12-23 14:22:43,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1178480.0, ans=0.1 2023-12-23 14:22:43,734 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.271e+01 3.551e+01 3.719e+01 3.910e+01 4.441e+01, threshold=7.437e+01, percent-clipped=0.0 2023-12-23 14:22:46,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1178480.0, ans=0.125 2023-12-23 14:23:01,091 INFO [train.py:886] (3/4) Epoch 38, batch 450, loss[loss=0.01166, audio_tagging_loss=0.01166, over 25000.00 frames. ], tot_loss[loss=0.01232, audio_tagging_loss=0.01232, over 4440428.22 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:23:12,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1178680.0, ans=0.05 2023-12-23 14:23:15,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1178680.0, ans=0.1 2023-12-23 14:23:18,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.35 vs. limit=15.0 2023-12-23 14:23:27,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1178746.6666666667, ans=0.1 2023-12-23 14:23:31,180 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.80 vs. 
limit=15.0 2023-12-23 14:23:42,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1178880.0, ans=0.0 2023-12-23 14:23:42,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1178880.0, ans=0.125 2023-12-23 14:23:52,214 INFO [train.py:886] (3/4) Epoch 38, batch 500, loss[loss=0.01281, audio_tagging_loss=0.01281, over 25000.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4555209.21 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:23:52,608 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.37 vs. limit=10.0 2023-12-23 14:23:55,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1178946.6666666667, ans=0.1 2023-12-23 14:24:15,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1179080.0, ans=0.125 2023-12-23 14:24:23,996 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0 2023-12-23 14:24:27,113 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.290e+01 3.549e+01 3.713e+01 3.859e+01 4.188e+01, threshold=7.427e+01, percent-clipped=0.0 2023-12-23 14:24:30,055 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:24:35,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1179213.3333333333, ans=0.1 2023-12-23 14:24:44,562 INFO [train.py:886] (3/4) Epoch 38, batch 550, loss[loss=0.01237, audio_tagging_loss=0.01237, over 25000.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4645869.95 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:25:00,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1179346.6666666667, ans=0.2 2023-12-23 14:25:03,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1179346.6666666667, ans=0.125 2023-12-23 14:25:18,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1179480.0, ans=0.0 2023-12-23 14:25:29,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1179546.6666666667, ans=0.1 2023-12-23 14:25:36,032 INFO [train.py:886] (3/4) Epoch 38, batch 600, loss[loss=0.0145, audio_tagging_loss=0.0145, over 24750.00 frames. ], tot_loss[loss=0.01204, audio_tagging_loss=0.01204, over 4709938.02 frames. 
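
The optim.py:484 WARNING lines summarize the distribution of recent gradient norms (min / 25% / median / 75% / max) next to the active clipping threshold, and throughout this log the threshold equals clipping_scale times the displayed median: in the warning just above, 2.0 * 3.713e+01 reproduces threshold=7.427e+01 to display precision, and percent-clipped=0.0 says no step in the window actually exceeded it. A minimal sketch of that rule (class name, history length and bookkeeping are hypothetical):

    from collections import deque
    import torch

    class QuartileClipper:
        """Sketch: clip grad norms against clipping_scale * running median."""
        def __init__(self, clipping_scale: float = 2.0, history: int = 128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=history)

        def clip_(self, params) -> float:
            grads = [p.grad for p in params if p.grad is not None]
            norm = torch.sqrt(sum((g ** 2).sum() for g in grads)).item()
            self.norms.append(norm)
            qs = torch.quantile(torch.tensor(list(self.norms)),
                                torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * qs[2].item()  # scale * median
            if norm > threshold:  # would count toward percent-clipped
                for g in grads:
                    g.mul_(threshold / norm)
            return threshold
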
], batch size: 99, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:25:45,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1179613.3333333333, ans=0.1 2023-12-23 14:25:49,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1179680.0, ans=22.5 2023-12-23 14:25:51,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1179680.0, ans=0.1 2023-12-23 14:25:52,264 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2023-12-23 14:26:00,470 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.87 vs. limit=6.0 2023-12-23 14:26:02,193 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.217e-02 2023-12-23 14:26:03,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1179746.6666666667, ans=0.0 2023-12-23 14:26:11,723 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.162e+01 3.607e+01 3.767e+01 3.907e+01 4.380e+01, threshold=7.534e+01, percent-clipped=0.0 2023-12-23 14:26:17,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1179880.0, ans=0.125 2023-12-23 14:26:28,541 INFO [train.py:886] (3/4) Epoch 38, batch 650, loss[loss=0.008827, audio_tagging_loss=0.008827, over 24750.00 frames. ], tot_loss[loss=0.01205, audio_tagging_loss=0.01205, over 4759337.17 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:26:36,498 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=22.5 2023-12-23 14:27:03,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1180146.6666666667, ans=0.1 2023-12-23 14:27:20,547 INFO [train.py:886] (3/4) Epoch 38, batch 700, loss[loss=0.01039, audio_tagging_loss=0.01039, over 24750.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4798773.48 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 32.0 2023-12-23 14:27:27,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1180280.0, ans=0.2 2023-12-23 14:27:41,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1180413.3333333333, ans=0.0 2023-12-23 14:27:44,392 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.15 vs. 
limit=15.0 2023-12-23 14:27:45,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1180413.3333333333, ans=0.0 2023-12-23 14:27:55,670 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.264e+01 3.550e+01 3.700e+01 3.852e+01 4.697e+01, threshold=7.400e+01, percent-clipped=0.0 2023-12-23 14:28:07,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1180546.6666666667, ans=0.07 2023-12-23 14:28:11,523 INFO [train.py:886] (3/4) Epoch 38, batch 750, loss[loss=0.01014, audio_tagging_loss=0.01014, over 24750.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4828987.28 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:28:16,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1180613.3333333333, ans=0.125 2023-12-23 14:28:25,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1180680.0, ans=0.125 2023-12-23 14:28:32,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1180746.6666666667, ans=0.1 2023-12-23 14:28:34,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1180746.6666666667, ans=0.0 2023-12-23 14:28:48,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1180813.3333333333, ans=0.125 2023-12-23 14:28:52,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1180813.3333333333, ans=0.0 2023-12-23 14:28:55,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1180880.0, ans=0.125 2023-12-23 14:29:00,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.39 vs. limit=10.0 2023-12-23 14:29:03,876 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.94 vs. limit=22.5 2023-12-23 14:29:04,157 INFO [train.py:886] (3/4) Epoch 38, batch 800, loss[loss=0.01126, audio_tagging_loss=0.01126, over 25000.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4855301.29 frames. 
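
The grad_scale at the end of each train.py:886 record is the fp16 loss-scaling factor, and it just doubled from 32.0 to 64.0 at batch 750: the signature of a dynamic loss scaler growing the scale after a long overflow-free stretch. A sketch of the same behaviour using PyTorch's stock GradScaler (the constants, and whether this run uses the stock class at all, are assumptions):

    import torch
    from torch.cuda.amp import GradScaler, autocast

    # growth_interval etc. are illustrative, not this run's settings
    scaler = GradScaler(init_scale=2.0, growth_factor=2.0,
                        backoff_factor=0.5, growth_interval=2000)

    def train_step(model, optimizer, criterion, batch, target):
        optimizer.zero_grad(set_to_none=True)
        with autocast():                  # fp16 forward pass
            loss = criterion(model(batch), target)
        scaler.scale(loss).backward()     # backward on the scaled loss
        scaler.step(optimizer)            # unscales; skips step on inf/nan
        scaler.update()                   # halve on overflow, else grow
        return scaler.get_scale()         # the grad_scale a log would show
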
], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:29:04,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1180946.6666666667, ans=0.0 2023-12-23 14:29:10,706 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:29:14,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1181013.3333333333, ans=0.0 2023-12-23 14:29:16,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1181013.3333333333, ans=0.125 2023-12-23 14:29:25,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1181080.0, ans=0.0 2023-12-23 14:29:32,260 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.92 vs. limit=15.0 2023-12-23 14:29:37,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1181146.6666666667, ans=0.1 2023-12-23 14:29:39,326 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.258e+01 3.509e+01 3.712e+01 3.867e+01 4.445e+01, threshold=7.423e+01, percent-clipped=0.0 2023-12-23 14:29:49,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1181213.3333333333, ans=0.125 2023-12-23 14:29:51,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1181213.3333333333, ans=0.0 2023-12-23 14:29:55,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1181280.0, ans=0.035 2023-12-23 14:29:55,988 INFO [train.py:886] (3/4) Epoch 38, batch 850, loss[loss=0.01195, audio_tagging_loss=0.01195, over 25000.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4876895.26 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:30:11,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1181346.6666666667, ans=0.09899494936611666 2023-12-23 14:30:17,276 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.97 vs. limit=15.0 2023-12-23 14:30:36,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1181480.0, ans=0.0 2023-12-23 14:30:47,860 INFO [train.py:886] (3/4) Epoch 38, batch 900, loss[loss=0.01136, audio_tagging_loss=0.01136, over 24750.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4900983.63 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:31:14,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1181746.6666666667, ans=0.125 2023-12-23 14:31:18,122 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.72 vs. 
limit=15.0 2023-12-23 14:31:22,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1181813.3333333333, ans=0.125 2023-12-23 14:31:23,124 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.323e+01 3.602e+01 3.689e+01 3.922e+01 4.328e+01, threshold=7.378e+01, percent-clipped=0.0 2023-12-23 14:31:41,094 INFO [train.py:886] (3/4) Epoch 38, batch 950, loss[loss=0.01205, audio_tagging_loss=0.01205, over 24750.00 frames. ], tot_loss[loss=0.01191, audio_tagging_loss=0.01191, over 4906092.41 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:31:57,270 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.17 vs. limit=10.0 2023-12-23 14:32:07,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.63 vs. limit=15.0 2023-12-23 14:32:11,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1182146.6666666667, ans=0.2 2023-12-23 14:32:32,462 INFO [train.py:886] (3/4) Epoch 38, batch 1000, loss[loss=0.01174, audio_tagging_loss=0.01174, over 24750.00 frames. ], tot_loss[loss=0.01186, audio_tagging_loss=0.01186, over 4914312.76 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:32:50,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1182346.6666666667, ans=0.07 2023-12-23 14:32:53,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1182413.3333333333, ans=0.125 2023-12-23 14:32:54,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1182413.3333333333, ans=0.125 2023-12-23 14:32:56,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1182413.3333333333, ans=0.2 2023-12-23 14:33:00,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1182413.3333333333, ans=0.125 2023-12-23 14:33:05,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1182480.0, ans=0.0 2023-12-23 14:33:06,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1182480.0, ans=0.125 2023-12-23 14:33:07,751 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.182e+01 3.581e+01 3.724e+01 3.913e+01 4.572e+01, threshold=7.448e+01, percent-clipped=0.0 2023-12-23 14:33:16,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1182546.6666666667, ans=0.2 2023-12-23 14:33:24,482 INFO [train.py:886] (3/4) Epoch 38, batch 1050, loss[loss=0.01289, audio_tagging_loss=0.01289, over 25000.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4923058.68 frames. 
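
Each train.py:886 record carries two figures: loss[...] for the current batch and tot_loss[...] aggregated over a frame count that grows toward roughly 5 million and is fractional (e.g. 4923058.68), which points to a frame-weighted average with exponential forgetting rather than a plain cumulative mean. A sketch of such a tracker; the decay constant 0.995 is an assumption, chosen because it makes 25000-frame batches plateau at 25000 / 0.005 = 5.0e6 frames, right where the logged counts level off:

    class RunningLoss:
        """Frame-weighted running loss with exponential forgetting.

        Sketch: old statistics are decayed each update, so the frame
        count drifts to a fractional steady state instead of growing
        without bound.
        """
        def __init__(self, decay: float = 0.995):  # assumed constant
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> float:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames
            return self.loss_sum / self.frames  # the tot_loss to report

    tracker = RunningLoss()
    for _ in range(2000):
        tracker.update(batch_loss=0.012, batch_frames=25000.0)
    print(tracker.frames)  # ~5.0e6, like the frame counts logged above
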
], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:33:45,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=1182746.6666666667, ans=0.5 2023-12-23 14:33:57,617 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.13 vs. limit=15.0 2023-12-23 14:34:04,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1182880.0, ans=0.125 2023-12-23 14:34:16,318 INFO [train.py:886] (3/4) Epoch 38, batch 1100, loss[loss=0.01126, audio_tagging_loss=0.01126, over 25000.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4935641.25 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:34:39,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1183080.0, ans=0.0 2023-12-23 14:34:52,072 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.109e+01 3.541e+01 3.703e+01 3.876e+01 4.507e+01, threshold=7.407e+01, percent-clipped=0.0 2023-12-23 14:34:59,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1183213.3333333333, ans=0.0 2023-12-23 14:34:59,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2023-12-23 14:35:00,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1183213.3333333333, ans=0.0 2023-12-23 14:35:07,327 INFO [train.py:886] (3/4) Epoch 38, batch 1150, loss[loss=0.01171, audio_tagging_loss=0.01171, over 25000.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4933876.09 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:36:00,169 INFO [train.py:886] (3/4) Epoch 38, batch 1200, loss[loss=0.01381, audio_tagging_loss=0.01381, over 25000.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4940384.21 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:36:05,388 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.48 vs. limit=10.0 2023-12-23 14:36:15,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1183680.0, ans=0.1 2023-12-23 14:36:32,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1183813.3333333333, ans=0.125 2023-12-23 14:36:35,471 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.439e+01 3.644e+01 3.844e+01 4.002e+01 4.470e+01, threshold=7.688e+01, percent-clipped=0.0 2023-12-23 14:36:37,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0 2023-12-23 14:36:38,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1183813.3333333333, ans=0.125 2023-12-23 14:36:39,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.98 vs. 
limit=15.0 2023-12-23 14:36:45,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1183880.0, ans=0.0 2023-12-23 14:36:51,366 INFO [train.py:886] (3/4) Epoch 38, batch 1250, loss[loss=0.01029, audio_tagging_loss=0.01029, over 25000.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4938299.16 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:37:01,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1183946.6666666667, ans=0.1 2023-12-23 14:37:01,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1183946.6666666667, ans=0.0 2023-12-23 14:37:21,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1184146.6666666667, ans=0.125 2023-12-23 14:37:32,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1184146.6666666667, ans=0.125 2023-12-23 14:37:41,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1184213.3333333333, ans=0.1 2023-12-23 14:37:43,629 INFO [train.py:886] (3/4) Epoch 38, batch 1300, loss[loss=0.01205, audio_tagging_loss=0.01205, over 24750.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4939027.59 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:37:55,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.78 vs. limit=15.0 2023-12-23 14:38:04,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1184413.3333333333, ans=0.1 2023-12-23 14:38:16,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0 2023-12-23 14:38:18,639 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.243e+01 3.655e+01 3.794e+01 4.002e+01 4.408e+01, threshold=7.587e+01, percent-clipped=0.0 2023-12-23 14:38:20,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1184480.0, ans=0.125 2023-12-23 14:38:35,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1184613.3333333333, ans=0.1 2023-12-23 14:38:35,821 INFO [train.py:886] (3/4) Epoch 38, batch 1350, loss[loss=0.009072, audio_tagging_loss=0.009072, over 25000.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4942637.76 frames. 
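
The learning rate in these records decays smoothly within the epoch (2.85e-03 at the top of epoch 38, 2.84e-03 a few hundred batches later, 2.83e-03 by around batch 1500) with no scheduler event logged, consistent with an Eden-style schedule in which lr is the product of a batch-dependent and an epoch-dependent power law. A sketch with this run's constants (base_lr=0.045, lr_batches=7500, lr_epochs=3.5); the total-batch estimate below is approximate:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        """Eden-style schedule: two -0.25 power laws, one per axis."""
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # ~37 completed epochs of roughly 4.75k batches each:
    print(eden_lr(0.045, batch=176_000, epoch=37))  # ~2.85e-03, as logged
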
], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:38:59,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1184746.6666666667, ans=0.1 2023-12-23 14:39:20,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=1184880.0, ans=0.05 2023-12-23 14:39:24,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1184880.0, ans=0.07 2023-12-23 14:39:26,138 INFO [train.py:886] (3/4) Epoch 38, batch 1400, loss[loss=0.01038, audio_tagging_loss=0.01038, over 24750.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4949284.77 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:39:35,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.31 vs. limit=15.0 2023-12-23 14:39:47,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1185080.0, ans=0.0 2023-12-23 14:39:48,506 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.63 vs. limit=22.5 2023-12-23 14:39:54,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1185080.0, ans=0.1 2023-12-23 14:40:01,442 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.205e+01 3.485e+01 3.664e+01 3.812e+01 4.482e+01, threshold=7.328e+01, percent-clipped=0.0 2023-12-23 14:40:11,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1185213.3333333333, ans=0.0 2023-12-23 14:40:13,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1185213.3333333333, ans=0.1 2023-12-23 14:40:18,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1185280.0, ans=0.1 2023-12-23 14:40:19,458 INFO [train.py:886] (3/4) Epoch 38, batch 1450, loss[loss=0.009785, audio_tagging_loss=0.009785, over 25000.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4958420.29 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 64.0 2023-12-23 14:40:44,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1185413.3333333333, ans=0.025 2023-12-23 14:40:51,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1185480.0, ans=0.125 2023-12-23 14:40:52,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1185480.0, ans=0.05 2023-12-23 14:40:58,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1185480.0, ans=0.2 2023-12-23 14:40:59,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1185546.6666666667, ans=0.125 2023-12-23 14:41:03,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. 
limit=15.0 2023-12-23 14:41:10,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1185613.3333333333, ans=0.125 2023-12-23 14:41:11,595 INFO [train.py:886] (3/4) Epoch 38, batch 1500, loss[loss=0.01311, audio_tagging_loss=0.01311, over 25000.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4955391.92 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:41:11,994 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.35 vs. limit=15.0 2023-12-23 14:41:26,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1185680.0, ans=0.0 2023-12-23 14:41:30,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1185746.6666666667, ans=0.1 2023-12-23 14:41:33,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1185746.6666666667, ans=0.5 2023-12-23 14:41:47,043 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.260e+01 3.559e+01 3.727e+01 3.866e+01 4.257e+01, threshold=7.454e+01, percent-clipped=0.0 2023-12-23 14:41:59,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1185880.0, ans=0.125 2023-12-23 14:41:59,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1185880.0, ans=0.125 2023-12-23 14:42:03,332 INFO [train.py:886] (3/4) Epoch 38, batch 1550, loss[loss=0.01052, audio_tagging_loss=0.01052, over 24750.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4958197.61 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:42:06,749 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.56 vs. limit=10.0 2023-12-23 14:42:09,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1185946.6666666667, ans=0.125 2023-12-23 14:42:19,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.47 vs. limit=12.0 2023-12-23 14:42:19,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1186013.3333333333, ans=0.2 2023-12-23 14:42:20,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1186013.3333333333, ans=0.07 2023-12-23 14:42:55,876 INFO [train.py:886] (3/4) Epoch 38, batch 1600, loss[loss=0.01044, audio_tagging_loss=0.01044, over 25000.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4945873.38 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:43:30,928 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.275e+01 3.656e+01 3.741e+01 3.949e+01 4.800e+01, threshold=7.482e+01, percent-clipped=0.0 2023-12-23 14:43:42,704 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.51 vs. 
limit=6.0 2023-12-23 14:43:44,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1186546.6666666667, ans=0.125 2023-12-23 14:43:47,038 INFO [train.py:886] (3/4) Epoch 38, batch 1650, loss[loss=0.0135, audio_tagging_loss=0.0135, over 25000.00 frames. ], tot_loss[loss=0.01186, audio_tagging_loss=0.01186, over 4945156.55 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:43:57,395 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.82 vs. limit=15.0 2023-12-23 14:43:59,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1186680.0, ans=0.0 2023-12-23 14:44:00,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1186680.0, ans=0.09899494936611666 2023-12-23 14:44:17,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1186746.6666666667, ans=22.5 2023-12-23 14:44:40,021 INFO [train.py:886] (3/4) Epoch 38, batch 1700, loss[loss=0.01235, audio_tagging_loss=0.01235, over 24750.00 frames. ], tot_loss[loss=0.01182, audio_tagging_loss=0.01182, over 4947141.62 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:44:46,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1186946.6666666667, ans=0.0 2023-12-23 14:45:07,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1187080.0, ans=0.125 2023-12-23 14:45:13,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1187146.6666666667, ans=0.0 2023-12-23 14:45:15,255 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.164e+01 3.542e+01 3.699e+01 3.849e+01 4.948e+01, threshold=7.398e+01, percent-clipped=0.0 2023-12-23 14:45:23,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1187213.3333333333, ans=0.125 2023-12-23 14:45:25,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1187213.3333333333, ans=0.125 2023-12-23 14:45:26,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1187213.3333333333, ans=0.0 2023-12-23 14:45:32,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1187280.0, ans=0.125 2023-12-23 14:45:32,636 INFO [train.py:886] (3/4) Epoch 38, batch 1750, loss[loss=0.01224, audio_tagging_loss=0.01224, over 24750.00 frames. ], tot_loss[loss=0.01184, audio_tagging_loss=0.01184, over 4945481.14 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:45:32,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1187280.0, ans=0.025 2023-12-23 14:45:33,219 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.71 vs. 
limit=22.5 2023-12-23 14:45:55,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1187413.3333333333, ans=0.0 2023-12-23 14:46:17,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1187546.6666666667, ans=0.125 2023-12-23 14:46:18,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1187546.6666666667, ans=0.125 2023-12-23 14:46:23,373 INFO [train.py:886] (3/4) Epoch 38, batch 1800, loss[loss=0.01378, audio_tagging_loss=0.01378, over 25000.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4949266.55 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:46:27,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. limit=6.0 2023-12-23 14:46:30,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1187613.3333333333, ans=0.0 2023-12-23 14:46:41,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1187680.0, ans=0.0 2023-12-23 14:46:53,276 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.45 vs. limit=22.5 2023-12-23 14:46:58,332 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.277e+01 3.639e+01 3.766e+01 3.892e+01 4.518e+01, threshold=7.532e+01, percent-clipped=0.0 2023-12-23 14:47:00,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1187813.3333333333, ans=0.1 2023-12-23 14:47:05,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.12 vs. limit=15.0 2023-12-23 14:47:06,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1187880.0, ans=0.1 2023-12-23 14:47:15,567 INFO [train.py:886] (3/4) Epoch 38, batch 1850, loss[loss=0.01023, audio_tagging_loss=0.01023, over 24750.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4949520.99 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:47:29,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.12 vs. limit=22.5 2023-12-23 14:47:31,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1188013.3333333333, ans=0.125 2023-12-23 14:47:40,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.19 vs. 
limit=15.0 2023-12-23 14:47:42,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1188080.0, ans=0.0 2023-12-23 14:47:55,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1188213.3333333333, ans=0.025 2023-12-23 14:48:00,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1188213.3333333333, ans=0.125 2023-12-23 14:48:05,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1188213.3333333333, ans=0.125 2023-12-23 14:48:07,159 INFO [train.py:886] (3/4) Epoch 38, batch 1900, loss[loss=0.01196, audio_tagging_loss=0.01196, over 24750.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4945893.61 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:48:07,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1188280.0, ans=0.0 2023-12-23 14:48:14,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.36 vs. limit=22.5 2023-12-23 14:48:43,883 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.314e+01 3.569e+01 3.756e+01 3.902e+01 4.536e+01, threshold=7.513e+01, percent-clipped=0.0 2023-12-23 14:48:49,322 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.65 vs. limit=15.0 2023-12-23 14:48:59,026 INFO [train.py:886] (3/4) Epoch 38, batch 1950, loss[loss=0.01303, audio_tagging_loss=0.01303, over 24750.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4947373.86 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:49:16,212 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.356e-02 2023-12-23 14:49:18,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1188680.0, ans=0.125 2023-12-23 14:49:20,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1188746.6666666667, ans=0.125 2023-12-23 14:49:51,564 INFO [train.py:886] (3/4) Epoch 38, batch 2000, loss[loss=0.00997, audio_tagging_loss=0.00997, over 21916.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4950493.88 frames. ], batch size: 107, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:49:57,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1188946.6666666667, ans=0.125 2023-12-23 14:50:06,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1189013.3333333333, ans=0.125 2023-12-23 14:50:10,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1189080.0, ans=0.125 2023-12-23 14:50:26,864 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.948e+01 3.554e+01 3.702e+01 3.907e+01 4.356e+01, threshold=7.404e+01, percent-clipped=0.0 2023-12-23 14:50:39,862 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.14 vs. 
limit=15.0 2023-12-23 14:50:43,015 INFO [train.py:886] (3/4) Epoch 38, batch 2050, loss[loss=0.01127, audio_tagging_loss=0.01127, over 25000.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4951609.57 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:50:51,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1189280.0, ans=0.1 2023-12-23 14:50:53,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1189280.0, ans=0.0 2023-12-23 14:50:57,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1189346.6666666667, ans=0.125 2023-12-23 14:51:13,039 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.49 vs. limit=15.0 2023-12-23 14:51:33,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1189546.6666666667, ans=0.015 2023-12-23 14:51:33,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1189546.6666666667, ans=0.125 2023-12-23 14:51:35,118 INFO [train.py:886] (3/4) Epoch 38, batch 2100, loss[loss=0.01366, audio_tagging_loss=0.01366, over 25000.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4953242.64 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:51:47,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=1189680.0, ans=12.0 2023-12-23 14:52:09,628 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.222e+01 3.562e+01 3.709e+01 3.873e+01 4.397e+01, threshold=7.419e+01, percent-clipped=0.0 2023-12-23 14:52:25,599 INFO [train.py:886] (3/4) Epoch 38, batch 2150, loss[loss=0.01248, audio_tagging_loss=0.01248, over 25000.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4948721.01 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:52:29,547 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.51 vs. 
limit=15.0 2023-12-23 14:52:38,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1190013.3333333333, ans=0.125 2023-12-23 14:53:06,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1190213.3333333333, ans=0.125 2023-12-23 14:53:13,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1190213.3333333333, ans=0.125 2023-12-23 14:53:13,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1190213.3333333333, ans=0.0 2023-12-23 14:53:14,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1190213.3333333333, ans=0.125 2023-12-23 14:53:16,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1190280.0, ans=0.1 2023-12-23 14:53:17,051 INFO [train.py:886] (3/4) Epoch 38, batch 2200, loss[loss=0.01343, audio_tagging_loss=0.01343, over 25000.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4950300.09 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:53:37,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1190346.6666666667, ans=10.0 2023-12-23 14:53:44,751 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.45 vs. limit=10.0 2023-12-23 14:53:51,992 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.264e+01 3.641e+01 3.770e+01 3.908e+01 4.334e+01, threshold=7.540e+01, percent-clipped=0.0 2023-12-23 14:54:09,914 INFO [train.py:886] (3/4) Epoch 38, batch 2250, loss[loss=0.01066, audio_tagging_loss=0.01066, over 25000.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4951510.50 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:54:14,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1190613.3333333333, ans=0.1 2023-12-23 14:54:35,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1190746.6666666667, ans=0.125 2023-12-23 14:54:47,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1190813.3333333333, ans=0.125 2023-12-23 14:54:51,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1190880.0, ans=0.025 2023-12-23 14:54:54,100 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 14:55:00,347 INFO [train.py:886] (3/4) Epoch 38, batch 2300, loss[loss=0.009002, audio_tagging_loss=0.009002, over 22041.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4948477.60 frames. 
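
The per-batch frame counts are exact multiples of 250: 100 cuts give 25000.00 frames and 99 cuts give 24750.00. That is what 10-second AudioSet clips yield at 100 fbank frames per second after 4x subsampling, assuming the logged counts are post-subsampling; batches like the 107-cut / 22041-frame one above are the bucketing sampler packing more, shorter cuts under the same duration budget. The arithmetic:

    # Assumed bookkeeping, consistent with every frame count in this log:
    clip_seconds = 10.0   # AudioSet clips are ~10 s
    fbank_rate = 100.0    # fbank frames per second
    subsampling = 4       # encoder input subsampling
    frames_per_clip = clip_seconds * fbank_rate / subsampling  # 250.0

    for cuts in (100, 99):
        print(cuts, cuts * frames_per_clip)  # 100 -> 25000.0, 99 -> 24750.0
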
], batch size: 107, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:55:07,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1190946.6666666667, ans=0.125 2023-12-23 14:55:12,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1191013.3333333333, ans=0.125 2023-12-23 14:55:14,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1191013.3333333333, ans=0.125 2023-12-23 14:55:20,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1191080.0, ans=0.0 2023-12-23 14:55:21,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1191080.0, ans=0.95 2023-12-23 14:55:27,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1191080.0, ans=0.0 2023-12-23 14:55:35,741 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.145e+01 3.555e+01 3.723e+01 3.863e+01 4.649e+01, threshold=7.446e+01, percent-clipped=0.0 2023-12-23 14:55:43,330 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0 2023-12-23 14:55:49,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1191213.3333333333, ans=0.0 2023-12-23 14:55:52,326 INFO [train.py:886] (3/4) Epoch 38, batch 2350, loss[loss=0.01204, audio_tagging_loss=0.01204, over 25000.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4947981.64 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0 2023-12-23 14:55:52,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1191280.0, ans=0.125 2023-12-23 14:55:53,854 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2023-12-23 14:55:57,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=1191280.0, ans=15.0 2023-12-23 14:56:01,343 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.08 vs. 
2023-12-23 14:56:04,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1191346.6666666667, ans=0.1
2023-12-23 14:56:11,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1191346.6666666667, ans=0.1
2023-12-23 14:56:14,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1191413.3333333333, ans=0.125
2023-12-23 14:56:15,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1191413.3333333333, ans=0.1
2023-12-23 14:56:15,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1191413.3333333333, ans=0.2
2023-12-23 14:56:24,382 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.48 vs. limit=12.0
2023-12-23 14:56:26,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1191480.0, ans=0.0
2023-12-23 14:56:31,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1191480.0, ans=0.0
2023-12-23 14:56:44,876 INFO [train.py:886] (3/4) Epoch 38, batch 2400, loss[loss=0.01251, audio_tagging_loss=0.01251, over 25000.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4956580.03 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 14:56:48,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1191613.3333333333, ans=0.125
2023-12-23 14:56:50,770 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.23 vs. limit=12.0
2023-12-23 14:57:03,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.29 vs. limit=15.0
2023-12-23 14:57:03,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.01 vs. limit=15.0
2023-12-23 14:57:20,633 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.189e+01 3.485e+01 3.668e+01 3.843e+01 4.329e+01, threshold=7.336e+01, percent-clipped=0.0
2023-12-23 14:57:36,115 INFO [train.py:886] (3/4) Epoch 38, batch 2450, loss[loss=0.01412, audio_tagging_loss=0.01412, over 25000.00 frames. ], tot_loss[loss=0.0118, audio_tagging_loss=0.0118, over 4960788.30 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0
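The Whitening lines compare a per-module statistic ("metric") against a limit; values over the limit indicate activation covariances that are far from white, which the model then counteracts with a small corrective gradient. A rough sketch of one way such a metric can be computed (it equals 1.0 when the per-group feature covariance is a multiple of the identity); this is an illustration, not the exact scaling.py formula:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # x: (num_frames, num_channels); channels split into groups as logged
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    cov = torch.einsum("ngc,ngd->gcd", x, x) / num_frames
    c = cov.shape[-1]
    # trace(cov @ cov) * C / trace(cov)^2 == 1.0 iff cov is a multiple of I
    num = (cov * cov).sum(dim=(1, 2)) * c
    den = torch.diagonal(cov, dim1=1, dim2=2).sum(dim=1) ** 2
    return (num / den).mean()

x = torch.randn(1000, 384)
print(whitening_metric(x))  # close to 1.0 for white Gaussian features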
2023-12-23 14:57:39,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1191946.6666666667, ans=0.125
2023-12-23 14:57:48,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1192013.3333333333, ans=0.5
2023-12-23 14:58:10,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1192146.6666666667, ans=0.125
2023-12-23 14:58:11,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1192146.6666666667, ans=0.125
2023-12-23 14:58:29,279 INFO [train.py:886] (3/4) Epoch 38, batch 2500, loss[loss=0.0126, audio_tagging_loss=0.0126, over 24750.00 frames. ], tot_loss[loss=0.01186, audio_tagging_loss=0.01186, over 4956100.95 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 14:58:31,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1192280.0, ans=0.125
2023-12-23 14:58:48,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1192413.3333333333, ans=0.1
2023-12-23 14:58:54,616 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.13 vs. limit=12.0
2023-12-23 14:59:04,276 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.151e+01 3.642e+01 3.821e+01 3.975e+01 4.588e+01, threshold=7.642e+01, percent-clipped=0.0
2023-12-23 14:59:06,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.72 vs. limit=15.0
2023-12-23 14:59:09,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1192546.6666666667, ans=0.125
2023-12-23 14:59:10,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1192546.6666666667, ans=0.1
2023-12-23 14:59:11,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1192546.6666666667, ans=0.125
2023-12-23 14:59:20,245 INFO [train.py:886] (3/4) Epoch 38, batch 2550, loss[loss=0.0101, audio_tagging_loss=0.0101, over 24750.00 frames. ], tot_loss[loss=0.01194, audio_tagging_loss=0.01194, over 4950504.88 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 14:59:33,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1192680.0, ans=0.05
2023-12-23 15:00:12,990 INFO [train.py:886] (3/4) Epoch 38, batch 2600, loss[loss=0.01163, audio_tagging_loss=0.01163, over 25000.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4951726.85 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 15:00:13,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1192946.6666666667, ans=0.125
2023-12-23 15:00:24,607 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.32 vs. limit=15.0
2023-12-23 15:00:28,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1193013.3333333333, ans=0.0
2023-12-23 15:00:36,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1193080.0, ans=0.0
2023-12-23 15:00:40,846 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.32 vs. limit=15.0
2023-12-23 15:00:47,982 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.256e+01 3.573e+01 3.736e+01 3.903e+01 5.523e+01, threshold=7.472e+01, percent-clipped=0.0
2023-12-23 15:00:51,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1193146.6666666667, ans=0.07
2023-12-23 15:00:58,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1193213.3333333333, ans=0.2
2023-12-23 15:01:05,288 INFO [train.py:886] (3/4) Epoch 38, batch 2650, loss[loss=0.01049, audio_tagging_loss=0.01049, over 24750.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4949678.24 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 15:01:48,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1193546.6666666667, ans=0.0
2023-12-23 15:01:54,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1193546.6666666667, ans=0.125
2023-12-23 15:01:54,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1193546.6666666667, ans=0.025
2023-12-23 15:01:56,559 INFO [train.py:886] (3/4) Epoch 38, batch 2700, loss[loss=0.01116, audio_tagging_loss=0.01116, over 24750.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4952290.74 frames. ], batch size: 99, lr: 2.83e-03, grad_scale: 64.0
2023-12-23 15:01:56,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1193613.3333333333, ans=0.125
2023-12-23 15:02:15,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1193680.0, ans=0.0
2023-12-23 15:02:21,041 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.63 vs. limit=15.0
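The optim.py WARNING lines report quartiles (min, 25%, median, 75%, max) of recent gradient norms along with a clipping threshold and the fraction of batches clipped; in the entries above the threshold is consistently 2.0 times the logged median, matching Clipping_scale=2.0. A schematic sketch of that bookkeeping, written against assumed names, not the actual icefall optimizer code:

import torch

class GradNormClipper:
    # Sketch: track recent grad norms, clip against clipping_scale * median.
    def __init__(self, clipping_scale=2.0, window=128):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms = []
        self.num_clipped = 0
        self.num_batches = 0

    def __call__(self, parameters):
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms = (self.norms + [norm])[-self.window:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # 2.0 * median
        self.num_batches += 1
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)
        print(f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
              f"{q.tolist()}, threshold={threshold:.3e}, "
              f"percent-clipped={100.0 * self.num_clipped / self.num_batches}")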
2023-12-23 15:02:29,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1193813.3333333333, ans=0.04949747468305833
2023-12-23 15:02:30,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1193813.3333333333, ans=0.125
2023-12-23 15:02:32,429 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.252e+01 3.575e+01 3.684e+01 3.818e+01 4.506e+01, threshold=7.367e+01, percent-clipped=0.0
2023-12-23 15:02:42,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1193880.0, ans=0.1
2023-12-23 15:02:46,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1193880.0, ans=0.125
2023-12-23 15:02:49,100 INFO [train.py:886] (3/4) Epoch 38, batch 2750, loss[loss=0.01106, audio_tagging_loss=0.01106, over 24750.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4958891.30 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:02:55,943 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.27 vs. limit=10.0
2023-12-23 15:03:04,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1194013.3333333333, ans=0.0
2023-12-23 15:03:15,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1194080.0, ans=0.125
2023-12-23 15:03:15,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1194080.0, ans=0.04949747468305833
2023-12-23 15:03:39,921 INFO [train.py:886] (3/4) Epoch 38, batch 2800, loss[loss=0.01223, audio_tagging_loss=0.01223, over 24750.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4958421.60 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:03:42,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1194280.0, ans=0.125
2023-12-23 15:03:48,933 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.66 vs. limit=22.5
2023-12-23 15:03:54,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.13 vs. limit=15.0
2023-12-23 15:04:05,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1194413.3333333333, ans=0.125
2023-12-23 15:04:15,987 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.366e+01 3.677e+01 3.839e+01 3.970e+01 4.490e+01, threshold=7.678e+01, percent-clipped=0.0
2023-12-23 15:04:20,112 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 15:04:20,399 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.61 vs. limit=22.5
2023-12-23 15:04:21,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1194546.6666666667, ans=0.125
2023-12-23 15:04:23,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1194546.6666666667, ans=0.125
2023-12-23 15:04:26,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1194546.6666666667, ans=0.0
2023-12-23 15:04:31,047 INFO [train.py:886] (3/4) Epoch 38, batch 2850, loss[loss=0.01135, audio_tagging_loss=0.01135, over 24750.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4951076.44 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:04:39,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1194613.3333333333, ans=0.125
2023-12-23 15:05:00,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1194746.6666666667, ans=0.125
2023-12-23 15:05:05,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1194813.3333333333, ans=0.125
2023-12-23 15:05:05,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1194813.3333333333, ans=0.0
2023-12-23 15:05:17,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1194880.0, ans=0.0
2023-12-23 15:05:20,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1194880.0, ans=0.2
2023-12-23 15:05:23,367 INFO [train.py:886] (3/4) Epoch 38, batch 2900, loss[loss=0.01168, audio_tagging_loss=0.01168, over 25000.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4945144.21 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:05:58,809 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.116e+01 3.532e+01 3.702e+01 3.923e+01 4.475e+01, threshold=7.405e+01, percent-clipped=0.0
2023-12-23 15:06:02,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1195213.3333333333, ans=0.125
2023-12-23 15:06:02,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1195213.3333333333, ans=0.0
2023-12-23 15:06:12,283 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0
2023-12-23 15:06:14,378 INFO [train.py:886] (3/4) Epoch 38, batch 2950, loss[loss=0.01137, audio_tagging_loss=0.01137, over 25000.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4950329.59 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:06:22,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1195280.0, ans=0.0
2023-12-23 15:06:22,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1195280.0, ans=0.0
2023-12-23 15:06:40,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1195413.3333333333, ans=0.125
2023-12-23 15:06:41,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1195413.3333333333, ans=0.125
2023-12-23 15:06:42,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1195413.3333333333, ans=0.0
2023-12-23 15:07:02,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=1195546.6666666667, ans=0.2
2023-12-23 15:07:04,603 INFO [train.py:886] (3/4) Epoch 38, batch 3000, loss[loss=0.01037, audio_tagging_loss=0.01037, over 25000.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4950884.46 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:07:04,603 INFO [train.py:909] (3/4) Computing validation loss
2023-12-23 15:07:25,410 INFO [train.py:917] (3/4) Epoch 38, validation: loss=0.03488, audio_tagging_loss=0.03488, over 3737520.00 frames.
2023-12-23 15:07:25,410 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-23 15:07:35,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1195680.0, ans=0.125
2023-12-23 15:07:59,888 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=15.0
2023-12-23 15:08:01,317 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.200e+01 3.533e+01 3.703e+01 3.872e+01 4.647e+01, threshold=7.406e+01, percent-clipped=0.0
2023-12-23 15:08:02,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1195813.3333333333, ans=0.125
2023-12-23 15:08:06,546 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.32 vs. limit=22.5
2023-12-23 15:08:07,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1195880.0, ans=0.125
2023-12-23 15:08:16,281 INFO [train.py:886] (3/4) Epoch 38, batch 3050, loss[loss=0.01203, audio_tagging_loss=0.01203, over 25000.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4953234.85 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
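At batch 3000 above the stream switches to "Computing validation loss", then logs the validation loss and the peak device memory. A minimal sketch of such a periodic validation step, assuming hypothetical model and dataloader objects; the memory line maps onto the standard torch.cuda.max_memory_allocated call:

import torch

def maybe_validate(model, valid_loader, device, batch_idx, valid_interval=3000):
    # The interval of 3000 is inferred from the batch numbers at which
    # "Computing validation loss" appears; model/loader are placeholders.
    if batch_idx % valid_interval != 0:
        return
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for feats, labels, num_frames in valid_loader:
            loss = model(feats.to(device), labels.to(device))  # summed loss
            tot_loss += loss.item()
            tot_frames += num_frames
    print(f"validation: loss={tot_loss / tot_frames:.4g}, "
          f"over {tot_frames:.2f} frames.")
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")
    model.train()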
2023-12-23 15:08:20,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1195946.6666666667, ans=0.0
2023-12-23 15:08:26,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1196013.3333333333, ans=0.2
2023-12-23 15:08:31,340 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 15:08:31,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1196013.3333333333, ans=0.0
2023-12-23 15:08:34,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1196013.3333333333, ans=0.125
2023-12-23 15:08:34,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1196013.3333333333, ans=0.0
2023-12-23 15:08:36,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1196080.0, ans=0.125
2023-12-23 15:08:47,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1196146.6666666667, ans=0.125
2023-12-23 15:08:53,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1196146.6666666667, ans=0.0
2023-12-23 15:09:05,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1196213.3333333333, ans=0.1
2023-12-23 15:09:08,026 INFO [train.py:886] (3/4) Epoch 38, batch 3100, loss[loss=0.01203, audio_tagging_loss=0.01203, over 24750.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4956631.40 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:09:13,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1196280.0, ans=0.0
2023-12-23 15:09:38,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1196480.0, ans=0.0
2023-12-23 15:09:42,713 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.19 vs. limit=22.5
2023-12-23 15:09:43,190 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.051e+01 3.615e+01 3.756e+01 3.923e+01 4.234e+01, threshold=7.513e+01, percent-clipped=0.0
2023-12-23 15:09:58,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1196613.3333333333, ans=0.0
2023-12-23 15:10:00,168 INFO [train.py:886] (3/4) Epoch 38, batch 3150, loss[loss=0.01255, audio_tagging_loss=0.01255, over 24750.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4943340.39 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:10:01,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1196613.3333333333, ans=0.0
2023-12-23 15:10:01,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1196613.3333333333, ans=0.1
2023-12-23 15:10:43,107 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.75 vs. limit=15.0
2023-12-23 15:10:44,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1196880.0, ans=0.125
2023-12-23 15:10:49,007 INFO [train.py:886] (3/4) Epoch 38, batch 3200, loss[loss=0.01243, audio_tagging_loss=0.01243, over 25000.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4937918.57 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:10:51,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1196946.6666666667, ans=0.125
2023-12-23 15:10:53,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1196946.6666666667, ans=0.0
2023-12-23 15:10:57,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1196946.6666666667, ans=0.2
2023-12-23 15:11:14,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1197080.0, ans=0.0
2023-12-23 15:11:23,710 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.251e+01 3.566e+01 3.766e+01 3.939e+01 4.413e+01, threshold=7.532e+01, percent-clipped=0.0
2023-12-23 15:11:40,197 INFO [train.py:886] (3/4) Epoch 38, batch 3250, loss[loss=0.01005, audio_tagging_loss=0.01005, over 24750.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4940390.75 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:11:54,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1197346.6666666667, ans=0.2
2023-12-23 15:11:59,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1197413.3333333333, ans=0.125
2023-12-23 15:12:05,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1197413.3333333333, ans=0.0
2023-12-23 15:12:05,754 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0
2023-12-23 15:12:07,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1197413.3333333333, ans=0.0
2023-12-23 15:12:30,370 INFO [train.py:886] (3/4) Epoch 38, batch 3300, loss[loss=0.01111, audio_tagging_loss=0.01111, over 25000.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4946382.67 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:12:34,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1197613.3333333333, ans=0.0
2023-12-23 15:12:36,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.42 vs. limit=22.5
2023-12-23 15:13:07,158 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.087e+01 3.537e+01 3.698e+01 3.881e+01 4.434e+01, threshold=7.396e+01, percent-clipped=0.0
2023-12-23 15:13:08,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1197813.3333333333, ans=0.0
2023-12-23 15:13:13,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1197880.0, ans=0.07
2023-12-23 15:13:21,885 INFO [train.py:886] (3/4) Epoch 38, batch 3350, loss[loss=0.01144, audio_tagging_loss=0.01144, over 25000.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4950501.26 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:13:31,555 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 15:13:31,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1198013.3333333333, ans=0.125
2023-12-23 15:13:35,188 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-12-23 15:13:42,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1198080.0, ans=0.125
2023-12-23 15:13:54,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1198146.6666666667, ans=0.09899494936611666
2023-12-23 15:14:07,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1198213.3333333333, ans=0.125
2023-12-23 15:14:12,754 INFO [train.py:886] (3/4) Epoch 38, batch 3400, loss[loss=0.01281, audio_tagging_loss=0.01281, over 25000.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4949805.89 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
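Each train.py:886 line reports the current batch's loss over its frames next to tot_loss, a running statistic that the frame counts show is kept over roughly the last five million frames. A sketch of how such a frame-weighted moving average might be maintained; the class and window size are assumptions for illustration:

from collections import deque

class RunningLoss:
    # Sketch: frame-weighted running loss over a sliding window of batches
    # (the log's tot_loss is reported over ~4.95e6 frames).
    def __init__(self, max_frames=5.0e6):
        self.max_frames = max_frames
        self.batches = deque()  # (loss_sum, num_frames) per batch
        self.loss_sum = 0.0
        self.num_frames = 0.0

    def update(self, loss, frames):
        self.batches.append((loss * frames, frames))
        self.loss_sum += loss * frames
        self.num_frames += frames
        while self.num_frames > self.max_frames:
            old_loss, old_frames = self.batches.popleft()
            self.loss_sum -= old_loss
            self.num_frames -= old_frames

    @property
    def value(self):
        return self.loss_sum / max(self.num_frames, 1.0)

tot = RunningLoss()
tot.update(0.01281, 25000.0)
print(f"tot_loss[loss={tot.value:.4g}, over {tot.num_frames:.2f} frames.]")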
2023-12-23 15:14:17,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1198280.0, ans=0.125
2023-12-23 15:14:31,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1198413.3333333333, ans=0.1
2023-12-23 15:14:36,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1198413.3333333333, ans=0.0
2023-12-23 15:14:45,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1198480.0, ans=0.125
2023-12-23 15:14:48,196 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.386e+01 3.674e+01 3.823e+01 3.991e+01 4.333e+01, threshold=7.645e+01, percent-clipped=0.0
2023-12-23 15:14:48,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1198480.0, ans=0.1
2023-12-23 15:14:53,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1198546.6666666667, ans=0.0
2023-12-23 15:14:57,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=12.0
2023-12-23 15:14:59,223 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.26 vs. limit=6.0
2023-12-23 15:15:02,402 INFO [train.py:886] (3/4) Epoch 38, batch 3450, loss[loss=0.0151, audio_tagging_loss=0.0151, over 24750.00 frames. ], tot_loss[loss=0.01184, audio_tagging_loss=0.01184, over 4948484.04 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:15:10,405 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0
2023-12-23 15:15:13,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1198680.0, ans=0.0
2023-12-23 15:15:32,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1198813.3333333333, ans=0.2
2023-12-23 15:15:54,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1198946.6666666667, ans=0.125
2023-12-23 15:15:54,830 INFO [train.py:886] (3/4) Epoch 38, batch 3500, loss[loss=0.008461, audio_tagging_loss=0.008461, over 24750.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4944378.23 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:15:57,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1198946.6666666667, ans=0.04949747468305833
2023-12-23 15:16:04,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1199013.3333333333, ans=0.1
2023-12-23 15:16:08,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1199013.3333333333, ans=0.07
2023-12-23 15:16:30,477 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.60 vs. limit=15.0
2023-12-23 15:16:30,765 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.139e+01 3.551e+01 3.681e+01 3.822e+01 4.366e+01, threshold=7.362e+01, percent-clipped=0.0
2023-12-23 15:16:32,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1199146.6666666667, ans=0.0
2023-12-23 15:16:44,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1199280.0, ans=0.125
2023-12-23 15:16:45,620 INFO [train.py:886] (3/4) Epoch 38, batch 3550, loss[loss=0.01179, audio_tagging_loss=0.01179, over 25000.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4941387.28 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 64.0
2023-12-23 15:16:51,743 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.36 vs. limit=22.5
2023-12-23 15:17:02,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1199346.6666666667, ans=0.2
2023-12-23 15:17:03,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1199346.6666666667, ans=0.0
2023-12-23 15:17:07,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1199413.3333333333, ans=0.07
2023-12-23 15:17:21,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1199480.0, ans=0.0
2023-12-23 15:17:21,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1199480.0, ans=0.2
2023-12-23 15:17:34,134 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0
2023-12-23 15:17:37,478 INFO [train.py:886] (3/4) Epoch 38, batch 3600, loss[loss=0.01205, audio_tagging_loss=0.01205, over 25000.00 frames. ], tot_loss[loss=0.01159, audio_tagging_loss=0.01159, over 4946718.93 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 32.0
2023-12-23 15:17:39,921 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.31 vs. limit=15.0
2023-12-23 15:17:44,662 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=15.0
2023-12-23 15:17:46,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1199680.0, ans=0.125
2023-12-23 15:17:54,449 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.92 vs. limit=22.5
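Note that grad_scale drops from 64.0 to 32.0 at batch 3600 above: with fp16 training, the loss scale is halved whenever the scaler detects inf/nan gradients and skips that optimizer step. A minimal sketch using the standard torch.cuda.amp API; the model and optimizer are placeholders:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=64.0)  # matches grad_scale: 64.0

def train_step(model, optimizer, feats, labels):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(feats, labels)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped if the gradients overflowed
    scaler.update()          # halves the scale on overflow, e.g. 64.0 -> 32.0
    return loss.detach(), scaler.get_scale()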
2023-12-23 15:18:06,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1199746.6666666667, ans=0.125
2023-12-23 15:18:08,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1199813.3333333333, ans=0.125
2023-12-23 15:18:14,460 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.264e+01 3.519e+01 3.755e+01 3.929e+01 4.573e+01, threshold=7.510e+01, percent-clipped=0.0
2023-12-23 15:18:20,436 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.92 vs. limit=15.0
2023-12-23 15:18:23,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1199880.0, ans=0.125
2023-12-23 15:18:29,738 INFO [train.py:886] (3/4) Epoch 38, batch 3650, loss[loss=0.01236, audio_tagging_loss=0.01236, over 25000.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4949246.01 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 32.0
2023-12-23 15:18:41,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1200013.3333333333, ans=0.125
2023-12-23 15:18:50,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1200080.0, ans=0.1
2023-12-23 15:19:06,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1200146.6666666667, ans=0.0
2023-12-23 15:19:06,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1200146.6666666667, ans=0.1
2023-12-23 15:19:17,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1200213.3333333333, ans=0.09899494936611666
2023-12-23 15:19:17,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1200213.3333333333, ans=0.2
2023-12-23 15:19:22,997 INFO [train.py:886] (3/4) Epoch 38, batch 3700, loss[loss=0.01194, audio_tagging_loss=0.01194, over 25000.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4945278.10 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 32.0
2023-12-23 15:19:36,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1200346.6666666667, ans=0.125
2023-12-23 15:20:00,557 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.223e+01 3.534e+01 3.731e+01 3.943e+01 4.301e+01, threshold=7.462e+01, percent-clipped=0.0
2023-12-23 15:20:15,138 INFO [train.py:886] (3/4) Epoch 38, batch 3750, loss[loss=0.01474, audio_tagging_loss=0.01474, over 24750.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4941306.77 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 32.0
2023-12-23 15:20:20,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1200613.3333333333, ans=0.125
2023-12-23 15:20:22,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1200613.3333333333, ans=0.0
2023-12-23 15:20:27,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1200680.0, ans=0.0
2023-12-23 15:20:39,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1200746.6666666667, ans=0.125
2023-12-23 15:20:41,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1200746.6666666667, ans=0.125
2023-12-23 15:20:52,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1200813.3333333333, ans=0.125
2023-12-23 15:20:52,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1200813.3333333333, ans=0.1
2023-12-23 15:21:04,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0
2023-12-23 15:21:06,191 INFO [train.py:886] (3/4) Epoch 38, batch 3800, loss[loss=0.01203, audio_tagging_loss=0.01203, over 25000.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4939617.75 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 32.0
2023-12-23 15:21:10,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1200946.6666666667, ans=0.125
2023-12-23 15:21:10,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1200946.6666666667, ans=0.125
2023-12-23 15:21:18,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1201013.3333333333, ans=0.07
2023-12-23 15:21:43,258 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.264e+01 3.611e+01 3.778e+01 3.951e+01 5.109e+01, threshold=7.557e+01, percent-clipped=0.0
2023-12-23 15:21:57,323 INFO [train.py:886] (3/4) Epoch 38, batch 3850, loss[loss=0.009867, audio_tagging_loss=0.009867, over 24750.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4933252.11 frames. ], batch size: 99, lr: 2.82e-03, grad_scale: 32.0
2023-12-23 15:22:00,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.65 vs. limit=22.5
2023-12-23 15:22:16,367 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.42 vs. limit=5.0
2023-12-23 15:22:26,507 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0
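Many of the ScheduledFloat entries are *_skip_rate values (conv_skip_rate, attention_skip_rate, bypass.skip_rate), typically annealed to 0.0 or small values such as ~0.0495 this late in training: during training the corresponding submodule is stochastically skipped with that probability. A sketch of that layer-skipping pattern; the wrapper class and module names are illustrative, not the zipformer code itself:

import torch
import torch.nn as nn

class SkippableModule(nn.Module):
    # Wraps a submodule; skips it with probability skip_rate in training.
    def __init__(self, module: nn.Module, skip_rate: float = 0.05):
        super().__init__()
        self.module = module
        self.skip_rate = skip_rate  # in icefall this value is scheduled

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()) < self.skip_rate:
            return x                   # bypass: module contributes nothing
        return x + self.module(x)      # residual connection around the module

layer = SkippableModule(nn.Linear(256, 256), skip_rate=0.0495)
out = layer(torch.randn(10, 256))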
2023-12-23 15:22:32,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1201480.0, ans=0.0
2023-12-23 15:22:33,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1201480.0, ans=0.0
2023-12-23 15:22:36,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0
2023-12-23 15:22:40,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1201546.6666666667, ans=0.0
2023-12-23 15:22:43,897 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.52 vs. limit=12.0
2023-12-23 15:22:49,999 INFO [train.py:886] (3/4) Epoch 38, batch 3900, loss[loss=0.01167, audio_tagging_loss=0.01167, over 25000.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4931156.55 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 32.0
2023-12-23 15:22:58,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1201613.3333333333, ans=0.125
2023-12-23 15:23:26,946 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.160e+01 3.573e+01 3.725e+01 3.871e+01 4.594e+01, threshold=7.451e+01, percent-clipped=0.0
2023-12-23 15:23:39,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1201880.0, ans=0.0
2023-12-23 15:23:41,548 INFO [train.py:886] (3/4) Epoch 38, batch 3950, loss[loss=0.01112, audio_tagging_loss=0.01112, over 25000.00 frames. ], tot_loss[loss=0.01159, audio_tagging_loss=0.01159, over 4936381.64 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 32.0
2023-12-23 15:23:53,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1202013.3333333333, ans=0.0
2023-12-23 15:23:57,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1202013.3333333333, ans=0.125
2023-12-23 15:24:01,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1202080.0, ans=0.0
2023-12-23 15:24:08,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1202080.0, ans=0.125
2023-12-23 15:24:08,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1202080.0, ans=0.125
2023-12-23 15:24:18,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1202146.6666666667, ans=0.125
2023-12-23 15:24:21,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1202146.6666666667, ans=0.0
2023-12-23 15:24:28,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1202213.3333333333, ans=0.0
2023-12-23 15:24:31,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1202213.3333333333, ans=0.0
2023-12-23 15:24:33,224 INFO [train.py:886] (3/4) Epoch 38, batch 4000, loss[loss=0.01106, audio_tagging_loss=0.01106, over 25000.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4944520.17 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:24:33,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1202280.0, ans=0.125
2023-12-23 15:24:44,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1202346.6666666667, ans=0.07
2023-12-23 15:25:02,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1202413.3333333333, ans=0.2
2023-12-23 15:25:05,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1202480.0, ans=0.1
2023-12-23 15:25:08,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1202480.0, ans=0.125
2023-12-23 15:25:10,934 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.209e+01 3.581e+01 3.725e+01 3.897e+01 4.371e+01, threshold=7.451e+01, percent-clipped=0.0
2023-12-23 15:25:14,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1202546.6666666667, ans=0.0
2023-12-23 15:25:26,262 INFO [train.py:886] (3/4) Epoch 38, batch 4050, loss[loss=0.01264, audio_tagging_loss=0.01264, over 24750.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4941644.89 frames. ], batch size: 99, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:25:35,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1202680.0, ans=0.125
2023-12-23 15:26:03,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1202813.3333333333, ans=0.125
2023-12-23 15:26:04,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1202813.3333333333, ans=0.0
2023-12-23 15:26:16,825 INFO [train.py:886] (3/4) Epoch 38, batch 4100, loss[loss=0.01219, audio_tagging_loss=0.01219, over 25000.00 frames. ], tot_loss[loss=0.01179, audio_tagging_loss=0.01179, over 4941014.13 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:26:40,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1203080.0, ans=0.125
2023-12-23 15:26:40,826 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.347e-02
2023-12-23 15:26:54,194 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.258e+01 3.608e+01 3.845e+01 3.998e+01 4.535e+01, threshold=7.691e+01, percent-clipped=0.0
2023-12-23 15:27:09,032 INFO [train.py:886] (3/4) Epoch 38, batch 4150, loss[loss=0.01061, audio_tagging_loss=0.01061, over 24750.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4943477.03 frames. ], batch size: 99, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:27:12,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1203280.0, ans=0.1
2023-12-23 15:27:26,992 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0
2023-12-23 15:27:35,072 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.95 vs. limit=12.0
2023-12-23 15:28:01,529 INFO [train.py:886] (3/4) Epoch 38, batch 4200, loss[loss=0.01018, audio_tagging_loss=0.01018, over 25000.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4947084.15 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:28:03,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0
2023-12-23 15:28:22,753 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.91 vs. limit=15.0
2023-12-23 15:28:36,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1203813.3333333333, ans=0.0
2023-12-23 15:28:39,421 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.229e+01 3.592e+01 3.742e+01 3.876e+01 4.221e+01, threshold=7.484e+01, percent-clipped=0.0
2023-12-23 15:28:42,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1203880.0, ans=0.125
2023-12-23 15:28:52,644 INFO [train.py:886] (3/4) Epoch 38, batch 4250, loss[loss=0.01278, audio_tagging_loss=0.01278, over 25000.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4944100.32 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:29:03,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1204013.3333333333, ans=0.0
2023-12-23 15:29:12,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1204013.3333333333, ans=0.125
2023-12-23 15:29:15,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1204080.0, ans=0.125
2023-12-23 15:29:17,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1204080.0, ans=0.0
2023-12-23 15:29:38,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1204213.3333333333, ans=0.05
2023-12-23 15:29:38,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1204213.3333333333, ans=0.125
2023-12-23 15:29:45,924 INFO [train.py:886] (3/4) Epoch 38, batch 4300, loss[loss=0.01272, audio_tagging_loss=0.01272, over 25000.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4952399.34 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:30:05,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1204413.3333333333, ans=0.07
2023-12-23 15:30:09,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1204413.3333333333, ans=0.2
2023-12-23 15:30:17,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1204480.0, ans=0.1
2023-12-23 15:30:22,707 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.297e+01 3.587e+01 3.741e+01 3.975e+01 4.489e+01, threshold=7.482e+01, percent-clipped=0.0
2023-12-23 15:30:27,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1204546.6666666667, ans=0.2
2023-12-23 15:30:28,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1204546.6666666667, ans=0.1
2023-12-23 15:30:34,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1204546.6666666667, ans=0.2
2023-12-23 15:30:35,953 INFO [train.py:886] (3/4) Epoch 38, batch 4350, loss[loss=0.01227, audio_tagging_loss=0.01227, over 24750.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4958192.73 frames. ], batch size: 99, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:31:14,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1204813.3333333333, ans=0.0
2023-12-23 15:31:28,704 INFO [train.py:886] (3/4) Epoch 38, batch 4400, loss[loss=0.01078, audio_tagging_loss=0.01078, over 24750.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4958004.45 frames. ], batch size: 99, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:31:29,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1204946.6666666667, ans=0.2
2023-12-23 15:31:50,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1205080.0, ans=0.025
2023-12-23 15:31:54,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1205080.0, ans=0.125
2023-12-23 15:32:01,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1205146.6666666667, ans=0.07
2023-12-23 15:32:05,843 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.267e+01 3.555e+01 3.761e+01 3.984e+01 4.470e+01, threshold=7.522e+01, percent-clipped=0.0
2023-12-23 15:32:20,173 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0
2023-12-23 15:32:21,245 INFO [train.py:886] (3/4) Epoch 38, batch 4450, loss[loss=0.01225, audio_tagging_loss=0.01225, over 24750.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4953835.54 frames. ], batch size: 99, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:32:21,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1205280.0, ans=0.025
2023-12-23 15:32:34,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1205346.6666666667, ans=0.0
2023-12-23 15:32:41,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1205413.3333333333, ans=0.09899494936611666
2023-12-23 15:32:41,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1205413.3333333333, ans=0.0
2023-12-23 15:32:41,894 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=12.0
2023-12-23 15:32:47,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1205413.3333333333, ans=0.125
2023-12-23 15:33:11,828 INFO [train.py:886] (3/4) Epoch 38, batch 4500, loss[loss=0.01066, audio_tagging_loss=0.01066, over 25000.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4953511.51 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:33:12,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1205613.3333333333, ans=10.0
2023-12-23 15:33:14,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1205613.3333333333, ans=0.0
2023-12-23 15:33:22,897 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.19 vs. limit=22.5
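The balancer entries above carry constraint parameters such as min_positive=0.025 and max_abs=10.0: an activation balancer monitors per-channel statistics (fraction of positive values, mean absolute value) and nudges channels that leave the allowed range back inside it via a small gradient term, while acting as the identity in the forward pass. A toy diagnostic in that spirit, checking the same statistics; this is not the actual balancer backward logic:

import torch

def balancer_violations(x: torch.Tensor,
                        min_positive: float = 0.025,
                        max_abs: float = 10.0) -> dict:
    # x: (num_frames, num_channels); per-channel statistics as a balancer
    # would constrain them (names and thresholds taken from the log).
    frac_positive = (x > 0).float().mean(dim=0)
    mean_abs = x.abs().mean(dim=0)
    return {
        "too_few_positive": (frac_positive < min_positive).sum().item(),
        "too_large_abs": (mean_abs > max_abs).sum().item(),
    }

print(balancer_violations(torch.randn(1000, 384)))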
2023-12-23 15:33:31,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1205680.0, ans=0.2
2023-12-23 15:33:34,060 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.62 vs. limit=10.0
2023-12-23 15:33:37,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1205746.6666666667, ans=0.1
2023-12-23 15:33:42,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1205813.3333333333, ans=0.2
2023-12-23 15:33:43,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1205813.3333333333, ans=0.125
2023-12-23 15:33:49,547 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.168e+01 3.607e+01 3.784e+01 3.969e+01 5.476e+01, threshold=7.568e+01, percent-clipped=0.0
2023-12-23 15:34:03,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1205880.0, ans=10.0
2023-12-23 15:34:04,991 INFO [train.py:886] (3/4) Epoch 38, batch 4550, loss[loss=0.01089, audio_tagging_loss=0.01089, over 24750.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4950772.53 frames. ], batch size: 99, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:34:12,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1205946.6666666667, ans=0.2
2023-12-23 15:34:13,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1206013.3333333333, ans=0.125
2023-12-23 15:34:18,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1206013.3333333333, ans=10.0
2023-12-23 15:34:20,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1206013.3333333333, ans=0.015
2023-12-23 15:34:40,228 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0
2023-12-23 15:34:55,728 INFO [train.py:886] (3/4) Epoch 38, batch 4600, loss[loss=0.01139, audio_tagging_loss=0.01139, over 25000.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4959313.98 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:34:55,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1206280.0, ans=0.125
2023-12-23 15:35:18,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1206413.3333333333, ans=0.125
2023-12-23 15:35:22,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1206413.3333333333, ans=0.125
2023-12-23 15:35:32,881 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.218e+01 3.570e+01 3.726e+01 3.915e+01 4.554e+01, threshold=7.452e+01, percent-clipped=0.0
2023-12-23 15:35:41,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1206546.6666666667, ans=0.0
2023-12-23 15:35:43,647 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.65 vs. limit=15.0
2023-12-23 15:35:45,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1206613.3333333333, ans=0.0
2023-12-23 15:35:46,159 INFO [train.py:886] (3/4) Epoch 38, batch 4650, loss[loss=0.01213, audio_tagging_loss=0.01213, over 25000.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4959797.81 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:35:54,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1206613.3333333333, ans=0.0
2023-12-23 15:35:56,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1206680.0, ans=0.5
2023-12-23 15:35:56,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0
2023-12-23 15:35:58,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1206680.0, ans=0.125
2023-12-23 15:35:58,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1206680.0, ans=0.1
2023-12-23 15:36:00,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1206680.0, ans=0.1
2023-12-23 15:36:07,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1206746.6666666667, ans=0.1
2023-12-23 15:36:35,539 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.31 vs. limit=15.0
2023-12-23 15:36:36,155 INFO [train.py:886] (3/4) Epoch 38, batch 4700, loss[loss=0.01202, audio_tagging_loss=0.01202, over 24750.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4957858.23 frames. ], batch size: 99, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:36:43,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1206946.6666666667, ans=22.5
2023-12-23 15:36:52,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1207013.3333333333, ans=0.1
2023-12-23 15:37:00,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1207080.0, ans=0.2
2023-12-23 15:37:09,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1207146.6666666667, ans=0.0
2023-12-23 15:37:10,714 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.408e+01 3.692e+01 3.810e+01 4.007e+01 4.373e+01, threshold=7.619e+01, percent-clipped=0.0
2023-12-23 15:37:10,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1207146.6666666667, ans=0.2
2023-12-23 15:37:12,126 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.11 vs. limit=10.0
2023-12-23 15:37:23,441 INFO [train.py:886] (3/4) Epoch 38, batch 4750, loss[loss=0.0112, audio_tagging_loss=0.0112, over 25000.00 frames. ], tot_loss[loss=0.01183, audio_tagging_loss=0.01183, over 4951904.90 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 32.0
2023-12-23 15:37:27,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1207280.0, ans=0.125
2023-12-23 15:37:32,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1207346.6666666667, ans=0.125
2023-12-23 15:37:57,295 INFO [train.py:886] (3/4) Epoch 39, batch 0, loss[loss=0.026, audio_tagging_loss=0.026, over 25000.00 frames. ], tot_loss[loss=0.026, audio_tagging_loss=0.026, over 25000.00 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0
2023-12-23 15:37:57,296 INFO [train.py:909] (3/4) Computing validation loss
2023-12-23 15:38:16,927 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4356, 0.2518, 3.3686, 3.4527], device='cuda:3')
2023-12-23 15:38:17,954 INFO [train.py:917] (3/4) Epoch 39, validation: loss=0.03421, audio_tagging_loss=0.03421, over 3737520.00 frames.
2023-12-23 15:38:17,955 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-23 15:38:23,894 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=15.0
2023-12-23 15:38:30,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1207453.3333333333, ans=0.125
2023-12-23 15:38:51,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1207586.6666666667, ans=0.125
2023-12-23 15:38:56,101 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.50 vs. limit=6.0
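The zipformer.py line above dumps attn_weights_entropy during validation: one value per attention head (this layer has 4 heads). Near-uniform attention over roughly 50 keys gives entropy around log(50) ~ 3.9, which matches three of the heads, while the 0.2518 head has almost collapsed onto a single key. A minimal version of that computation, with shapes assumed:

    import torch

    def attn_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, query_len, key_len); each row sums to 1.
        # Entropy over keys, averaged over queries -> one value per head.
        return -(attn * (attn + 1e-20).log()).sum(dim=-1).mean(dim=-1)

    attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
    print(attn_entropy(attn))  # a 4-element tensor, like the logged one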
2023-12-23 15:38:58,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1207653.3333333333, ans=0.0
2023-12-23 15:39:10,795 INFO [train.py:886] (3/4) Epoch 39, batch 50, loss[loss=0.01603, audio_tagging_loss=0.01603, over 25000.00 frames. ], tot_loss[loss=0.01849, audio_tagging_loss=0.01849, over 1121651.30 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0
2023-12-23 15:39:20,710 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.09 vs. limit=22.5
2023-12-23 15:39:30,443 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.430e+01 4.003e+01 4.538e+01 5.178e+01 1.091e+02, threshold=9.075e+01, percent-clipped=8.0
2023-12-23 15:39:43,332 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0
2023-12-23 15:39:57,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1207986.6666666667, ans=0.07
2023-12-23 15:40:01,767 INFO [train.py:886] (3/4) Epoch 39, batch 100, loss[loss=0.01466, audio_tagging_loss=0.01466, over 25000.00 frames. ], tot_loss[loss=0.0161, audio_tagging_loss=0.0161, over 1969423.51 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0
2023-12-23 15:40:05,913 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.18 vs. limit=15.0
2023-12-23 15:40:09,264 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.35 vs. limit=22.5
2023-12-23 15:40:16,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1208120.0, ans=0.2
2023-12-23 15:40:28,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1208186.6666666667, ans=0.0
2023-12-23 15:40:53,366 INFO [train.py:886] (3/4) Epoch 39, batch 150, loss[loss=0.01329, audio_tagging_loss=0.01329, over 25000.00 frames. ], tot_loss[loss=0.01458, audio_tagging_loss=0.01458, over 2632819.49 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0
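The optim.py warnings report the min/25%/50%/75%/max of recent gradient norms, and in every entry in this span the logged threshold is exactly twice the median, matching Clipping_scale=2.0 (e.g. 2 x 4.538e+01 ~ 9.075e+01 above). The percent-clipped=8.0 entry just after the epoch boundary, where the loss also spikes, is the only point here where gradients actually exceeded the threshold. A toy version of this diagnostic; the history window size is an assumption, and the real optim.py logic differs in detail:

    import torch

    def clip_with_diagnostics(params, history, window=128, clipping_scale=2.0):
        # Total gradient norm across all parameters.
        norms = [p.grad.detach().norm() for p in params if p.grad is not None]
        total = torch.linalg.vector_norm(torch.stack(norms))
        history.append(total.item())
        del history[:-window]  # keep only the most recent window
        q = torch.quantile(torch.tensor(history),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2].item()  # 2x median, as in the log
        if total.item() > threshold:
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / total)
        return q.tolist(), threshold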
2023-12-23 15:40:54,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1208386.6666666667, ans=0.125
2023-12-23 15:40:56,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1208386.6666666667, ans=0.0
2023-12-23 15:41:14,325 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.383e+01 3.761e+01 3.990e+01 4.223e+01 5.067e+01, threshold=7.980e+01, percent-clipped=0.0
2023-12-23 15:41:15,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1208520.0, ans=0.125
2023-12-23 15:41:23,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1208586.6666666667, ans=0.0
2023-12-23 15:41:27,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1208586.6666666667, ans=0.95
2023-12-23 15:41:45,786 INFO [train.py:886] (3/4) Epoch 39, batch 200, loss[loss=0.0121, audio_tagging_loss=0.0121, over 24907.00 frames. ], tot_loss[loss=0.0138, audio_tagging_loss=0.0138, over 3155143.51 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0
2023-12-23 15:42:00,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1208786.6666666667, ans=0.0
2023-12-23 15:42:04,993 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.594e-02
2023-12-23 15:42:28,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1208986.6666666667, ans=0.5
2023-12-23 15:42:36,419 INFO [train.py:886] (3/4) Epoch 39, batch 250, loss[loss=0.01289, audio_tagging_loss=0.01289, over 25000.00 frames. ], tot_loss[loss=0.01317, audio_tagging_loss=0.01317, over 3553229.91 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0
2023-12-23 15:42:43,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1209053.3333333333, ans=0.125
2023-12-23 15:42:52,495 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.89 vs. limit=15.0
2023-12-23 15:42:52,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1209120.0, ans=0.1
2023-12-23 15:42:57,483 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.324e+01 3.633e+01 3.790e+01 3.994e+01 4.386e+01, threshold=7.580e+01, percent-clipped=0.0
2023-12-23 15:42:58,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1209186.6666666667, ans=0.125
2023-12-23 15:42:59,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1209186.6666666667, ans=0.0
2023-12-23 15:43:04,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1209186.6666666667, ans=0.125
2023-12-23 15:43:23,550 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 15:43:28,116 INFO [train.py:886] (3/4) Epoch 39, batch 300, loss[loss=0.01207, audio_tagging_loss=0.01207, over 24750.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 3863226.67 frames. ], batch size: 99, lr: 2.77e-03, grad_scale: 32.0
2023-12-23 15:43:29,750 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.50 vs. limit=15.0
2023-12-23 15:43:31,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1209386.6666666667, ans=0.1
2023-12-23 15:43:59,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1209586.6666666667, ans=0.0
2023-12-23 15:44:00,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.88 vs. limit=22.5
2023-12-23 15:44:00,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1209586.6666666667, ans=0.125
2023-12-23 15:44:01,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1209586.6666666667, ans=0.125
2023-12-23 15:44:19,126 INFO [train.py:886] (3/4) Epoch 39, batch 350, loss[loss=0.01233, audio_tagging_loss=0.01233, over 24750.00 frames. ], tot_loss[loss=0.01252, audio_tagging_loss=0.01252, over 4092351.79 frames. ], batch size: 99, lr: 2.77e-03, grad_scale: 32.0
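The Whitening lines compare a per-module statistic of the activations against a limit, with values often sitting just under it (e.g. 14.50 vs. limit=15.0 above), suggesting a penalty that only engages past the limit. As an illustrative proxy only, not the exact scaling.py metric: the ratio mean(eig^2)/mean(eig)^2 of the activation covariance eigenvalues is 1.0 for perfectly white features and grows as variance concentrates in a few directions:

    import torch

    def whiteness_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

    feats = torch.randn(1000, 384) @ torch.randn(384, 384)  # correlated
    print(whiteness_metric(feats))  # > 1.0, compared against a limit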
2023-12-23 15:44:23,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1209720.0, ans=0.025
2023-12-23 15:44:27,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1209720.0, ans=0.2
2023-12-23 15:44:28,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1209786.6666666667, ans=0.0
2023-12-23 15:44:39,343 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.294e+01 3.641e+01 3.787e+01 3.947e+01 4.798e+01, threshold=7.575e+01, percent-clipped=0.0
2023-12-23 15:44:46,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1209853.3333333333, ans=0.0
2023-12-23 15:44:57,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1209920.0, ans=0.125
2023-12-23 15:44:57,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1209920.0, ans=0.125
2023-12-23 15:45:09,623 INFO [train.py:886] (3/4) Epoch 39, batch 400, loss[loss=0.01371, audio_tagging_loss=0.01371, over 24750.00 frames. ], tot_loss[loss=0.01231, audio_tagging_loss=0.01231, over 4277670.89 frames. ], batch size: 99, lr: 2.77e-03, grad_scale: 32.0
2023-12-23 15:45:13,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1210053.3333333333, ans=0.2
2023-12-23 15:45:15,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1210053.3333333333, ans=0.5
2023-12-23 15:45:25,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1210120.0, ans=0.125
2023-12-23 15:45:31,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1210186.6666666667, ans=0.1
2023-12-23 15:45:42,628 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.41 vs. limit=15.0
2023-12-23 15:45:46,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1210253.3333333333, ans=0.125
2023-12-23 15:45:53,742 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.83 vs. limit=15.0
2023-12-23 15:45:58,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1210320.0, ans=0.0
2023-12-23 15:46:00,629 INFO [train.py:886] (3/4) Epoch 39, batch 450, loss[loss=0.0117, audio_tagging_loss=0.0117, over 25000.00 frames. ], tot_loss[loss=0.01206, audio_tagging_loss=0.01206, over 4430762.86 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0
2023-12-23 15:46:08,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1210386.6666666667, ans=0.125
2023-12-23 15:46:09,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1210453.3333333333, ans=0.1
2023-12-23 15:46:20,293 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.320e+01 3.614e+01 3.726e+01 3.946e+01 4.381e+01, threshold=7.452e+01, percent-clipped=0.0
2023-12-23 15:46:31,367 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.40 vs. limit=15.0
2023-12-23 15:46:31,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1210586.6666666667, ans=0.0
2023-12-23 15:46:42,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1210653.3333333333, ans=10.0
2023-12-23 15:46:51,866 INFO [train.py:886] (3/4) Epoch 39, batch 500, loss[loss=0.01512, audio_tagging_loss=0.01512, over 25000.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4549391.12 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0
2023-12-23 15:47:09,161 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.86 vs. limit=15.0
2023-12-23 15:47:15,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1210853.3333333333, ans=0.0
2023-12-23 15:47:15,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1210853.3333333333, ans=0.125
2023-12-23 15:47:19,205 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 15:47:31,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1210986.6666666667, ans=0.125
2023-12-23 15:47:43,609 INFO [train.py:886] (3/4) Epoch 39, batch 550, loss[loss=0.011, audio_tagging_loss=0.011, over 24750.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4639348.10 frames. ], batch size: 99, lr: 2.77e-03, grad_scale: 32.0
2023-12-23 15:47:43,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1211053.3333333333, ans=0.125
2023-12-23 15:47:47,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1211053.3333333333, ans=0.125
2023-12-23 15:47:55,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.10 vs. limit=10.0
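The lr column decays smoothly (2.81e-03 in epoch 38, then 2.77e-03 and 2.76e-03 in epoch 39), consistent with an Eden-style schedule given the run's configured base_lr=0.045, lr_batches=7500 and lr_epochs=3.5. A sketch of that rule; the exact step and epoch accounting inside icefall's scheduler may differ, and the step count below is only a rough fit to the logged values:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # Two power-law decay factors, one in steps and one in epochs.
        return (base_lr
                * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

    # Assuming ~4.6k optimizer steps per epoch by epoch 39:
    print(eden_lr(0.045, batch=178_000, epoch=39))  # ~2.76e-03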
2023-12-23 15:48:03,897 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.249e+01 3.629e+01 3.792e+01 3.921e+01 4.570e+01, threshold=7.585e+01, percent-clipped=0.0
2023-12-23 15:48:12,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1211186.6666666667, ans=0.125
2023-12-23 15:48:15,047 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 15:48:18,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1211253.3333333333, ans=0.2
2023-12-23 15:48:18,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1211253.3333333333, ans=0.125
2023-12-23 15:48:35,003 INFO [train.py:886] (3/4) Epoch 39, batch 600, loss[loss=0.01335, audio_tagging_loss=0.01335, over 24750.00 frames. ], tot_loss[loss=0.01198, audio_tagging_loss=0.01198, over 4704497.90 frames. ], batch size: 99, lr: 2.77e-03, grad_scale: 32.0
2023-12-23 15:48:43,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1211386.6666666667, ans=0.1
2023-12-23 15:48:55,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1211520.0, ans=0.125
2023-12-23 15:48:56,992 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.97 vs. limit=22.5
2023-12-23 15:49:13,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0
2023-12-23 15:49:18,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1211653.3333333333, ans=0.1
2023-12-23 15:49:25,994 INFO [train.py:886] (3/4) Epoch 39, batch 650, loss[loss=0.01259, audio_tagging_loss=0.01259, over 22219.00 frames. ], tot_loss[loss=0.012, audio_tagging_loss=0.012, over 4754790.20 frames. ], batch size: 107, lr: 2.77e-03, grad_scale: 32.0
2023-12-23 15:49:37,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1211786.6666666667, ans=0.0
2023-12-23 15:49:43,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1211786.6666666667, ans=0.125
2023-12-23 15:49:47,857 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.262e+01 3.662e+01 3.873e+01 3.983e+01 4.612e+01, threshold=7.746e+01, percent-clipped=0.0
2023-12-23 15:49:48,328 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.31 vs. limit=15.0
2023-12-23 15:49:49,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1211853.3333333333, ans=0.1
2023-12-23 15:49:54,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1211853.3333333333, ans=0.125
2023-12-23 15:49:54,976 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0
2023-12-23 15:49:57,851 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.78 vs. limit=6.0
2023-12-23 15:50:08,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1211986.6666666667, ans=0.2
2023-12-23 15:50:12,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1211986.6666666667, ans=0.125
2023-12-23 15:50:19,291 INFO [train.py:886] (3/4) Epoch 39, batch 700, loss[loss=0.01084, audio_tagging_loss=0.01084, over 24750.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4795137.57 frames. ], batch size: 99, lr: 2.77e-03, grad_scale: 32.0
2023-12-23 15:50:28,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1212120.0, ans=0.125
2023-12-23 15:50:35,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1212120.0, ans=0.125
2023-12-23 15:50:55,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1212253.3333333333, ans=0.125
2023-12-23 15:51:01,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1212320.0, ans=0.1
2023-12-23 15:51:10,650 INFO [train.py:886] (3/4) Epoch 39, batch 750, loss[loss=0.01107, audio_tagging_loss=0.01107, over 25000.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4828538.17 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0
2023-12-23 15:51:31,121 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.158e+01 3.606e+01 3.797e+01 3.983e+01 4.805e+01, threshold=7.593e+01, percent-clipped=0.0
2023-12-23 15:51:32,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1212520.0, ans=0.1
2023-12-23 15:52:02,568 INFO [train.py:886] (3/4) Epoch 39, batch 800, loss[loss=0.01151, audio_tagging_loss=0.01151, over 25000.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4854354.54 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 32.0
2023-12-23 15:52:07,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1212720.0, ans=0.2
2023-12-23 15:52:10,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1212720.0, ans=0.125
2023-12-23 15:52:24,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1212853.3333333333, ans=0.125
2023-12-23 15:52:40,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1212920.0, ans=0.0
2023-12-23 15:52:45,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1212986.6666666667, ans=0.2
2023-12-23 15:52:53,717 INFO [train.py:886] (3/4) Epoch 39, batch 850, loss[loss=0.01287, audio_tagging_loss=0.01287, over 25000.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4881728.60 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 64.0
2023-12-23 15:53:01,198 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=15.0
2023-12-23 15:53:05,100 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.30 vs. limit=6.0
2023-12-23 15:53:14,221 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.254e+01 3.592e+01 3.779e+01 3.968e+01 4.434e+01, threshold=7.558e+01, percent-clipped=0.0
2023-12-23 15:53:35,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1213320.0, ans=0.2
2023-12-23 15:53:37,163 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 15:53:37,514 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.11 vs. limit=15.0
2023-12-23 15:53:45,639 INFO [train.py:886] (3/4) Epoch 39, batch 900, loss[loss=0.01222, audio_tagging_loss=0.01222, over 24750.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4898883.10 frames. ], batch size: 99, lr: 2.77e-03, grad_scale: 64.0
2023-12-23 15:53:50,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1213386.6666666667, ans=0.125
2023-12-23 15:53:56,997 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. limit=6.0
2023-12-23 15:54:07,732 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.51 vs. limit=10.0
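Note that grad_scale jumps from 32.0 (batch 800) to 64.0 (batch 850): with use_fp16 enabled, dynamic loss scaling doubles the scale after a run of overflow-free steps and would halve it on overflow. A toy version of that update rule; the growth interval below is an assumption, and icefall drives its own scaler from optim.py rather than this class:

    class ToyGradScaler:
        """Double the loss scale after `growth_interval` clean steps;
        halve it whenever an inf/nan gradient is found."""
        def __init__(self, scale=32.0, growth_interval=1000):
            self.scale = scale
            self.good = 0
            self.interval = growth_interval
        def update(self, found_inf: bool):
            if found_inf:
                self.scale /= 2.0
                self.good = 0
            else:
                self.good += 1
                if self.good % self.interval == 0:
                    self.scale *= 2.0

    s = ToyGradScaler()
    for _ in range(2000):
        s.update(found_inf=False)
    print(s.scale)  # 32.0 -> 128.0 after 2000 clean steps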
2023-12-23 15:54:13,211 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-12-23 15:54:17,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1213586.6666666667, ans=0.0
2023-12-23 15:54:33,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1213653.3333333333, ans=0.125
2023-12-23 15:54:38,405 INFO [train.py:886] (3/4) Epoch 39, batch 950, loss[loss=0.01382, audio_tagging_loss=0.01382, over 24750.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4904333.12 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0
2023-12-23 15:54:46,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1213720.0, ans=0.125
2023-12-23 15:54:47,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1213786.6666666667, ans=0.125
2023-12-23 15:54:52,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1213786.6666666667, ans=0.1
2023-12-23 15:54:55,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1213786.6666666667, ans=0.125
2023-12-23 15:54:57,964 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.310e+01 3.605e+01 3.794e+01 3.970e+01 4.761e+01, threshold=7.588e+01, percent-clipped=0.0
2023-12-23 15:55:09,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1213920.0, ans=0.05
2023-12-23 15:55:16,107 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0
2023-12-23 15:55:24,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1213986.6666666667, ans=0.125
2023-12-23 15:55:29,694 INFO [train.py:886] (3/4) Epoch 39, batch 1000, loss[loss=0.0115, audio_tagging_loss=0.0115, over 24750.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4907859.41 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0
2023-12-23 15:55:31,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.58 vs. limit=15.0
2023-12-23 15:55:54,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1214186.6666666667, ans=0.125
2023-12-23 15:56:11,269 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.41 vs. limit=10.0
2023-12-23 15:56:20,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1214386.6666666667, ans=0.125
2023-12-23 15:56:21,329 INFO [train.py:886] (3/4) Epoch 39, batch 1050, loss[loss=0.01232, audio_tagging_loss=0.01232, over 25000.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4915382.72 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0
2023-12-23 15:56:23,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1214386.6666666667, ans=0.1
2023-12-23 15:56:29,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1214386.6666666667, ans=0.125
2023-12-23 15:56:32,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0
2023-12-23 15:56:42,530 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.325e+01 3.673e+01 3.795e+01 3.964e+01 4.762e+01, threshold=7.590e+01, percent-clipped=0.0
2023-12-23 15:56:47,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1214520.0, ans=0.0
2023-12-23 15:56:56,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1214586.6666666667, ans=0.125
2023-12-23 15:56:59,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1214586.6666666667, ans=0.2
2023-12-23 15:57:04,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1214653.3333333333, ans=0.125
2023-12-23 15:57:13,364 INFO [train.py:886] (3/4) Epoch 39, batch 1100, loss[loss=0.01372, audio_tagging_loss=0.01372, over 25000.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4926763.28 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0
2023-12-23 15:57:13,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1214720.0, ans=0.125
2023-12-23 15:57:30,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1214786.6666666667, ans=0.1
2023-12-23 15:57:33,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1214853.3333333333, ans=0.125
2023-12-23 15:57:44,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1214920.0, ans=0.0
2023-12-23 15:57:46,239 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.64 vs. limit=22.5
2023-12-23 15:57:52,007 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.13 vs. limit=6.0
2023-12-23 15:57:58,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1214986.6666666667, ans=0.07
2023-12-23 15:58:03,989 INFO [train.py:886] (3/4) Epoch 39, batch 1150, loss[loss=0.009429, audio_tagging_loss=0.009429, over 25000.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4933142.80 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0
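The tot_loss[...] figures behave like a frame-weighted running average with exponential forgetting: at batch 0 it equals the batch loss over ~25k frames, grows through ~1.1M and ~2.0M frames in the first 100 batches, then saturates near 4.9M frames. That saturation is what a per-batch decay of roughly 0.995 predicts (25000 / (1 - 0.995) = 5M); the decay constant is inferred from the log, not read from train.py:

    def update_tot_loss(tot, batch_loss, batch_frames, decay=0.995):
        # tot is (decayed loss-sum, decayed frame count).
        loss_sum, frames = tot
        return (loss_sum * decay + batch_loss * batch_frames,
                frames * decay + batch_frames)

    tot = (0.0, 0.0)
    for _ in range(2000):
        tot = update_tot_loss(tot, 0.0117, 25000.0)
    print(tot[0] / tot[1], "over", tot[1], "frames")  # ~0.0117 over ~5.0e6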
2023-12-23 15:58:04,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1215053.3333333333, ans=0.125
2023-12-23 15:58:10,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1215053.3333333333, ans=0.1
2023-12-23 15:58:12,634 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.42 vs. limit=15.0
2023-12-23 15:58:25,641 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.044e+01 3.585e+01 3.703e+01 3.926e+01 4.731e+01, threshold=7.406e+01, percent-clipped=0.0
2023-12-23 15:58:35,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1215253.3333333333, ans=0.1
2023-12-23 15:58:40,923 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.13 vs. limit=22.5
2023-12-23 15:58:56,851 INFO [train.py:886] (3/4) Epoch 39, batch 1200, loss[loss=0.01156, audio_tagging_loss=0.01156, over 25000.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4938598.08 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0
2023-12-23 15:59:24,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1215520.0, ans=0.2
2023-12-23 15:59:47,988 INFO [train.py:886] (3/4) Epoch 39, batch 1250, loss[loss=0.01277, audio_tagging_loss=0.01277, over 24750.00 frames. ], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4944225.69 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0
2023-12-23 15:59:48,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1215720.0, ans=0.2
2023-12-23 15:59:53,478 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.61 vs. limit=10.0
2023-12-23 15:59:55,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1215720.0, ans=0.125
2023-12-23 15:59:56,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1215720.0, ans=0.125
2023-12-23 15:59:58,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1215786.6666666667, ans=0.125
2023-12-23 16:00:08,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1215853.3333333333, ans=0.2
2023-12-23 16:00:08,925 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.341e+01 3.597e+01 3.795e+01 3.980e+01 4.718e+01, threshold=7.591e+01, percent-clipped=0.0
2023-12-23 16:00:10,207 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.612e-02
2023-12-23 16:00:11,368 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0
2023-12-23 16:00:21,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1215920.0, ans=0.125
2023-12-23 16:00:29,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1215986.6666666667, ans=0.125
2023-12-23 16:00:31,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1215986.6666666667, ans=0.2
2023-12-23 16:00:38,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1215986.6666666667, ans=0.0
2023-12-23 16:00:40,163 INFO [train.py:886] (3/4) Epoch 39, batch 1300, loss[loss=0.01096, audio_tagging_loss=0.01096, over 24750.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 4944610.09 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0
2023-12-23 16:00:50,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1216120.0, ans=0.125
2023-12-23 16:00:52,029 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.02 vs. limit=12.0
2023-12-23 16:00:57,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1216120.0, ans=0.125
2023-12-23 16:00:59,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1216120.0, ans=0.125
2023-12-23 16:01:21,175 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.26 vs. limit=15.0
2023-12-23 16:01:29,158 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0
2023-12-23 16:01:32,443 INFO [train.py:886] (3/4) Epoch 39, batch 1350, loss[loss=0.009988, audio_tagging_loss=0.009988, over 25000.00 frames. ], tot_loss[loss=0.01174, audio_tagging_loss=0.01174, over 4946694.03 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0
2023-12-23 16:01:38,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1216386.6666666667, ans=0.125
2023-12-23 16:01:42,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1216453.3333333333, ans=0.2
2023-12-23 16:01:52,793 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.226e+01 3.613e+01 3.759e+01 3.931e+01 4.440e+01, threshold=7.518e+01, percent-clipped=0.0
2023-12-23 16:01:53,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1216520.0, ans=0.125
2023-12-23 16:02:03,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1216586.6666666667, ans=0.0
2023-12-23 16:02:14,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.03 vs. limit=15.0
2023-12-23 16:02:15,970 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 16:02:24,080 INFO [train.py:886] (3/4) Epoch 39, batch 1400, loss[loss=0.01256, audio_tagging_loss=0.01256, over 25000.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4952398.38 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0
2023-12-23 16:02:41,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1216786.6666666667, ans=0.125
2023-12-23 16:02:56,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1216920.0, ans=0.125
2023-12-23 16:02:57,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1216920.0, ans=0.125
2023-12-23 16:03:00,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1216920.0, ans=0.1
2023-12-23 16:03:16,294 INFO [train.py:886] (3/4) Epoch 39, batch 1450, loss[loss=0.01269, audio_tagging_loss=0.01269, over 25000.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4958691.53 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0
2023-12-23 16:03:24,373 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.15 vs. limit=15.0
2023-12-23 16:03:34,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1217120.0, ans=0.0
2023-12-23 16:03:36,561 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.177e+01 3.585e+01 3.717e+01 3.896e+01 4.835e+01, threshold=7.434e+01, percent-clipped=0.0
2023-12-23 16:03:47,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1217253.3333333333, ans=0.05
2023-12-23 16:03:54,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1217253.3333333333, ans=0.2
2023-12-23 16:04:06,454 INFO [train.py:886] (3/4) Epoch 39, batch 1500, loss[loss=0.0122, audio_tagging_loss=0.0122, over 25000.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4960063.60 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0
2023-12-23 16:04:21,770 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0
2023-12-23 16:04:28,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.83 vs. limit=15.0
2023-12-23 16:04:33,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1217520.0, ans=0.125
2023-12-23 16:04:49,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1217653.3333333333, ans=0.0
2023-12-23 16:04:54,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1217653.3333333333, ans=0.1
2023-12-23 16:04:57,992 INFO [train.py:886] (3/4) Epoch 39, batch 1550, loss[loss=0.01499, audio_tagging_loss=0.01499, over 24959.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4951225.61 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0
2023-12-23 16:05:14,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1217786.6666666667, ans=0.07
2023-12-23 16:05:18,866 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.273e+01 3.671e+01 3.823e+01 4.043e+01 4.664e+01, threshold=7.647e+01, percent-clipped=0.0
2023-12-23 16:05:22,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1217853.3333333333, ans=0.125
2023-12-23 16:05:43,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1217986.6666666667, ans=0.125
2023-12-23 16:05:47,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1217986.6666666667, ans=0.0
2023-12-23 16:05:49,419 INFO [train.py:886] (3/4) Epoch 39, batch 1600, loss[loss=0.01194, audio_tagging_loss=0.01194, over 24750.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4943491.03 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0
2023-12-23 16:06:03,954 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0
2023-12-23 16:06:06,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1218120.0, ans=0.0
2023-12-23 16:06:09,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.77 vs. limit=22.5
2023-12-23 16:06:27,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1218253.3333333333, ans=0.5
2023-12-23 16:06:31,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1218320.0, ans=0.125
2023-12-23 16:06:40,795 INFO [train.py:886] (3/4) Epoch 39, batch 1650, loss[loss=0.008361, audio_tagging_loss=0.008361, over 24750.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4944101.28 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0
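Many of the schedules above end in bypass.scale_min or bypass_mid.scale_min with ans=0.2, suggesting that each layer's output is blended with its input through a learned per-channel scale whose lower clamp is itself scheduled. A schematic version; the clamping details are assumptions, not a copy of the Zipformer code:

    import torch

    def bypass(x, y, scale, scale_min=0.2):
        # x: layer input, y: layer output, scale: learned per-channel blend.
        s = scale.clamp(min=scale_min, max=1.0)
        return x + s * (y - x)

    x, y = torch.randn(10, 256), torch.randn(10, 256)
    scale = torch.full((256,), 0.1)  # below scale_min, so clamped to 0.2
    out = bypass(x, y, scale)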
2023-12-23 16:06:41,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1218386.6666666667, ans=0.0
2023-12-23 16:06:46,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1218386.6666666667, ans=0.04949747468305833
2023-12-23 16:06:51,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1218453.3333333333, ans=0.0
2023-12-23 16:06:53,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1218453.3333333333, ans=0.125
2023-12-23 16:06:54,902 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 16:07:00,999 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.195e+01 3.628e+01 3.774e+01 3.923e+01 5.343e+01, threshold=7.548e+01, percent-clipped=0.0
2023-12-23 16:07:04,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1218520.0, ans=0.125
2023-12-23 16:07:25,014 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.47 vs. limit=15.0
2023-12-23 16:07:31,236 INFO [train.py:886] (3/4) Epoch 39, batch 1700, loss[loss=0.01211, audio_tagging_loss=0.01211, over 24750.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4950067.49 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0
2023-12-23 16:07:42,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1218786.6666666667, ans=0.0
2023-12-23 16:08:01,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1218920.0, ans=0.125
2023-12-23 16:08:22,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1219053.3333333333, ans=0.125
2023-12-23 16:08:23,647 INFO [train.py:886] (3/4) Epoch 39, batch 1750, loss[loss=0.01155, audio_tagging_loss=0.01155, over 24750.00 frames. ], tot_loss[loss=0.01153, audio_tagging_loss=0.01153, over 4947821.75 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0
2023-12-23 16:08:40,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1219120.0, ans=0.125
2023-12-23 16:08:43,369 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.314e+01 3.573e+01 3.705e+01 3.928e+01 4.407e+01, threshold=7.410e+01, percent-clipped=0.0
2023-12-23 16:08:47,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1219186.6666666667, ans=0.1
2023-12-23 16:09:01,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1219253.3333333333, ans=0.125
2023-12-23 16:09:05,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1219320.0, ans=0.125
2023-12-23 16:09:10,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1219320.0, ans=0.0
2023-12-23 16:09:13,952 INFO [train.py:886] (3/4) Epoch 39, batch 1800, loss[loss=0.01259, audio_tagging_loss=0.01259, over 25000.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4951094.57 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0
2023-12-23 16:09:34,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1219520.0, ans=0.1
2023-12-23 16:09:35,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1219520.0, ans=0.07
2023-12-23 16:09:41,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1219520.0, ans=0.125
2023-12-23 16:09:54,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1219653.3333333333, ans=0.125
2023-12-23 16:10:05,618 INFO [train.py:886] (3/4) Epoch 39, batch 1850, loss[loss=0.01323, audio_tagging_loss=0.01323, over 24750.00 frames. ], tot_loss[loss=0.01159, audio_tagging_loss=0.01159, over 4947335.88 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0
2023-12-23 16:10:25,962 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.296e+01 3.679e+01 3.832e+01 4.036e+01 5.101e+01, threshold=7.663e+01, percent-clipped=0.0
2023-12-23 16:10:41,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1219920.0, ans=0.125
2023-12-23 16:10:43,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1219920.0, ans=0.0
2023-12-23 16:10:46,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.34 vs. limit=15.0
2023-12-23 16:10:57,216 INFO [train.py:886] (3/4) Epoch 39, batch 1900, loss[loss=0.01155, audio_tagging_loss=0.01155, over 24750.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4946541.66 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0
2023-12-23 16:11:06,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1220120.0, ans=0.125
2023-12-23 16:11:09,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1220120.0, ans=0.125
2023-12-23 16:11:37,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1220320.0, ans=0.1
2023-12-23 16:11:47,543 INFO [train.py:886] (3/4) Epoch 39, batch 1950, loss[loss=0.009531, audio_tagging_loss=0.009531, over 25000.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4942615.71 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0
2023-12-23 16:12:07,897 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.088e+01 3.588e+01 3.763e+01 3.930e+01 4.449e+01, threshold=7.526e+01, percent-clipped=0.0
2023-12-23 16:12:12,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1220520.0, ans=0.0
2023-12-23 16:12:24,381 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0
2023-12-23 16:12:33,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=1220653.3333333333, ans=15.0
2023-12-23 16:12:35,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1220653.3333333333, ans=0.2
2023-12-23 16:12:36,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1220653.3333333333, ans=0.2
2023-12-23 16:12:36,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1220653.3333333333, ans=0.2
2023-12-23 16:12:38,737 INFO [train.py:886] (3/4) Epoch 39, batch 2000, loss[loss=0.01231, audio_tagging_loss=0.01231, over 25000.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4943814.87 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0
2023-12-23 16:12:44,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1220720.0, ans=0.2
2023-12-23 16:12:51,628 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. limit=6.0
2023-12-23 16:12:53,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1220786.6666666667, ans=0.125
2023-12-23 16:13:29,346 INFO [train.py:886] (3/4) Epoch 39, batch 2050, loss[loss=0.01438, audio_tagging_loss=0.01438, over 24750.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4946207.39 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0
], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:13:51,332 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.148e+01 3.589e+01 3.729e+01 3.908e+01 4.611e+01, threshold=7.458e+01, percent-clipped=0.0 2023-12-23 16:13:53,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1221186.6666666667, ans=0.125 2023-12-23 16:14:18,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1221320.0, ans=0.125 2023-12-23 16:14:21,721 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.94 vs. limit=15.0 2023-12-23 16:14:23,173 INFO [train.py:886] (3/4) Epoch 39, batch 2100, loss[loss=0.01069, audio_tagging_loss=0.01069, over 22068.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4945101.87 frames. ], batch size: 107, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:14:41,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1221453.3333333333, ans=0.125 2023-12-23 16:15:07,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1221653.3333333333, ans=0.0 2023-12-23 16:15:14,107 INFO [train.py:886] (3/4) Epoch 39, batch 2150, loss[loss=0.0122, audio_tagging_loss=0.0122, over 24945.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4954532.32 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:15:19,249 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.22 vs. limit=15.0 2023-12-23 16:15:19,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1221720.0, ans=0.0 2023-12-23 16:15:24,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1221786.6666666667, ans=0.125 2023-12-23 16:15:26,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1221786.6666666667, ans=0.05 2023-12-23 16:15:33,842 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.319e+01 3.644e+01 3.753e+01 3.947e+01 5.073e+01, threshold=7.506e+01, percent-clipped=0.0 2023-12-23 16:16:04,607 INFO [train.py:886] (3/4) Epoch 39, batch 2200, loss[loss=0.01503, audio_tagging_loss=0.01503, over 24750.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4945768.32 frames. 
], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:16:32,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1222186.6666666667, ans=0.125 2023-12-23 16:16:37,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1222253.3333333333, ans=0.0 2023-12-23 16:16:39,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1222253.3333333333, ans=0.0 2023-12-23 16:16:56,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1222386.6666666667, ans=0.125 2023-12-23 16:16:57,383 INFO [train.py:886] (3/4) Epoch 39, batch 2250, loss[loss=0.01208, audio_tagging_loss=0.01208, over 24750.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4946230.06 frames. ], batch size: 99, lr: 2.76e-03, grad_scale: 64.0 2023-12-23 16:17:17,743 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.255e+01 3.644e+01 3.804e+01 3.965e+01 5.338e+01, threshold=7.608e+01, percent-clipped=0.0 2023-12-23 16:17:30,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1222586.6666666667, ans=0.1 2023-12-23 16:17:38,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1222653.3333333333, ans=0.0 2023-12-23 16:17:39,646 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:17:40,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1222653.3333333333, ans=0.0 2023-12-23 16:17:43,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1222653.3333333333, ans=0.125 2023-12-23 16:17:43,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1222653.3333333333, ans=0.125 2023-12-23 16:17:47,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1222653.3333333333, ans=0.0 2023-12-23 16:17:49,023 INFO [train.py:886] (3/4) Epoch 39, batch 2300, loss[loss=0.01154, audio_tagging_loss=0.01154, over 25000.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4949629.74 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:18:15,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1222853.3333333333, ans=0.2 2023-12-23 16:18:21,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1222920.0, ans=0.125 2023-12-23 16:18:25,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1222920.0, ans=0.125 2023-12-23 16:18:41,200 INFO [train.py:886] (3/4) Epoch 39, batch 2350, loss[loss=0.008957, audio_tagging_loss=0.008957, over 25000.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4948921.27 frames. 
], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:18:43,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1223053.3333333333, ans=0.0 2023-12-23 16:19:02,463 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.209e+01 3.576e+01 3.750e+01 3.912e+01 4.537e+01, threshold=7.499e+01, percent-clipped=0.0 2023-12-23 16:19:11,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1223186.6666666667, ans=0.125 2023-12-23 16:19:16,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1223253.3333333333, ans=0.2 2023-12-23 16:19:32,837 INFO [train.py:886] (3/4) Epoch 39, batch 2400, loss[loss=0.01062, audio_tagging_loss=0.01062, over 24750.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4954165.16 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:19:33,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1223386.6666666667, ans=0.125 2023-12-23 16:19:43,454 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=15.0 2023-12-23 16:19:52,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1223520.0, ans=0.2 2023-12-23 16:19:56,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1223520.0, ans=0.125 2023-12-23 16:19:57,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1223520.0, ans=0.2 2023-12-23 16:19:57,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1223520.0, ans=0.2 2023-12-23 16:20:06,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1223586.6666666667, ans=0.125 2023-12-23 16:20:07,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1223586.6666666667, ans=0.125 2023-12-23 16:20:11,434 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.30 vs. limit=15.0 2023-12-23 16:20:12,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1223586.6666666667, ans=0.0 2023-12-23 16:20:24,863 INFO [train.py:886] (3/4) Epoch 39, batch 2450, loss[loss=0.01346, audio_tagging_loss=0.01346, over 25000.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4957741.97 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:20:45,903 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.280e+01 3.674e+01 3.800e+01 3.952e+01 4.172e+01, threshold=7.601e+01, percent-clipped=0.0 2023-12-23 16:20:51,840 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.29 vs. 
limit=15.0 2023-12-23 16:20:56,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1223920.0, ans=0.0 2023-12-23 16:21:07,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1223986.6666666667, ans=0.125 2023-12-23 16:21:13,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1223986.6666666667, ans=0.125 2023-12-23 16:21:17,301 INFO [train.py:886] (3/4) Epoch 39, batch 2500, loss[loss=0.0117, audio_tagging_loss=0.0117, over 25000.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4955762.57 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:21:18,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1224053.3333333333, ans=0.125 2023-12-23 16:21:41,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1224186.6666666667, ans=0.125 2023-12-23 16:21:44,331 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.75 vs. limit=15.0 2023-12-23 16:22:06,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1224320.0, ans=0.0 2023-12-23 16:22:09,651 INFO [train.py:886] (3/4) Epoch 39, batch 2550, loss[loss=0.01058, audio_tagging_loss=0.01058, over 24750.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4948060.03 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:22:21,284 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=15.0 2023-12-23 16:22:29,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1224520.0, ans=0.0 2023-12-23 16:22:30,076 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.292e+01 3.640e+01 3.880e+01 4.066e+01 5.144e+01, threshold=7.760e+01, percent-clipped=0.0 2023-12-23 16:22:37,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1224520.0, ans=0.035 2023-12-23 16:22:41,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1224586.6666666667, ans=0.125 2023-12-23 16:23:00,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1224720.0, ans=0.125 2023-12-23 16:23:01,523 INFO [train.py:886] (3/4) Epoch 39, batch 2600, loss[loss=0.01187, audio_tagging_loss=0.01187, over 24750.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4946437.97 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:23:11,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1224786.6666666667, ans=0.125 2023-12-23 16:23:34,379 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.88 vs. 
limit=22.5 2023-12-23 16:23:54,152 INFO [train.py:886] (3/4) Epoch 39, batch 2650, loss[loss=0.01234, audio_tagging_loss=0.01234, over 24750.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4951370.11 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:23:58,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1225053.3333333333, ans=0.125 2023-12-23 16:24:04,851 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:24:10,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1225120.0, ans=0.05 2023-12-23 16:24:14,557 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.356e+01 3.623e+01 3.755e+01 3.962e+01 4.665e+01, threshold=7.509e+01, percent-clipped=0.0 2023-12-23 16:24:46,261 INFO [train.py:886] (3/4) Epoch 39, batch 2700, loss[loss=0.01108, audio_tagging_loss=0.01108, over 24750.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4953628.14 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:24:54,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1225386.6666666667, ans=0.2 2023-12-23 16:24:57,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1225453.3333333333, ans=0.1 2023-12-23 16:24:57,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1225453.3333333333, ans=0.1 2023-12-23 16:24:59,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1225453.3333333333, ans=0.1 2023-12-23 16:25:19,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1225586.6666666667, ans=0.0 2023-12-23 16:25:24,278 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0 2023-12-23 16:25:36,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1225653.3333333333, ans=0.125 2023-12-23 16:25:36,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1225653.3333333333, ans=0.0 2023-12-23 16:25:37,942 INFO [train.py:886] (3/4) Epoch 39, batch 2750, loss[loss=0.01038, audio_tagging_loss=0.01038, over 24750.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4956407.94 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:25:42,437 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. 
limit=6.0 2023-12-23 16:25:42,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1225720.0, ans=0.0 2023-12-23 16:25:50,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1225786.6666666667, ans=0.2 2023-12-23 16:25:57,684 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.17 vs. limit=22.5 2023-12-23 16:25:58,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1225853.3333333333, ans=0.05 2023-12-23 16:25:59,157 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.274e+01 3.579e+01 3.766e+01 3.936e+01 4.564e+01, threshold=7.531e+01, percent-clipped=0.0 2023-12-23 16:26:00,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1225853.3333333333, ans=0.125 2023-12-23 16:26:03,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1225853.3333333333, ans=0.125 2023-12-23 16:26:05,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1225853.3333333333, ans=0.2 2023-12-23 16:26:25,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.21 vs. limit=22.5 2023-12-23 16:26:30,267 INFO [train.py:886] (3/4) Epoch 39, batch 2800, loss[loss=0.00998, audio_tagging_loss=0.00998, over 24750.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4954976.77 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:26:36,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1226053.3333333333, ans=0.07 2023-12-23 16:26:46,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1226120.0, ans=0.0 2023-12-23 16:26:51,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1226186.6666666667, ans=0.04949747468305833 2023-12-23 16:26:55,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1226186.6666666667, ans=0.2 2023-12-23 16:27:08,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1226253.3333333333, ans=0.1 2023-12-23 16:27:19,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1226320.0, ans=0.125 2023-12-23 16:27:20,918 INFO [train.py:886] (3/4) Epoch 39, batch 2850, loss[loss=0.01222, audio_tagging_loss=0.01222, over 24750.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4953481.50 frames. 
], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:27:22,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1226386.6666666667, ans=0.125 2023-12-23 16:27:38,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1226453.3333333333, ans=0.125 2023-12-23 16:27:43,714 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.319e+01 3.655e+01 3.774e+01 3.936e+01 6.681e+01, threshold=7.549e+01, percent-clipped=0.0 2023-12-23 16:28:03,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1226653.3333333333, ans=0.0 2023-12-23 16:28:10,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1226653.3333333333, ans=0.125 2023-12-23 16:28:12,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1226653.3333333333, ans=0.2 2023-12-23 16:28:16,149 INFO [train.py:886] (3/4) Epoch 39, batch 2900, loss[loss=0.01142, audio_tagging_loss=0.01142, over 24750.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4945392.69 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:28:22,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1226720.0, ans=0.0 2023-12-23 16:28:24,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1226786.6666666667, ans=0.2 2023-12-23 16:28:27,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1226786.6666666667, ans=10.0 2023-12-23 16:28:36,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1226853.3333333333, ans=0.0 2023-12-23 16:28:43,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1226853.3333333333, ans=0.125 2023-12-23 16:28:45,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1226853.3333333333, ans=0.2 2023-12-23 16:28:50,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1226920.0, ans=10.0 2023-12-23 16:29:08,284 INFO [train.py:886] (3/4) Epoch 39, batch 2950, loss[loss=0.01301, audio_tagging_loss=0.01301, over 24750.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4949380.80 frames. 
], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:29:23,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1227120.0, ans=0.125 2023-12-23 16:29:28,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1227186.6666666667, ans=0.0 2023-12-23 16:29:28,876 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.270e+01 3.645e+01 3.777e+01 3.932e+01 4.663e+01, threshold=7.553e+01, percent-clipped=0.0 2023-12-23 16:29:44,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1227253.3333333333, ans=0.125 2023-12-23 16:29:51,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1227320.0, ans=0.0 2023-12-23 16:29:53,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1227320.0, ans=0.125 2023-12-23 16:29:53,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1227320.0, ans=0.125 2023-12-23 16:29:54,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1227320.0, ans=0.1 2023-12-23 16:29:58,849 INFO [train.py:886] (3/4) Epoch 39, batch 3000, loss[loss=0.008757, audio_tagging_loss=0.008757, over 21947.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4944844.34 frames. ], batch size: 107, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:29:58,850 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 16:30:19,976 INFO [train.py:917] (3/4) Epoch 39, validation: loss=0.03462, audio_tagging_loss=0.03462, over 3737520.00 frames. 2023-12-23 16:30:19,976 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 16:30:31,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1227453.3333333333, ans=0.1 2023-12-23 16:30:32,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.70 vs. limit=15.0 2023-12-23 16:30:44,344 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.85 vs. limit=15.0 2023-12-23 16:31:02,815 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:31:10,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1227720.0, ans=0.1 2023-12-23 16:31:10,892 INFO [train.py:886] (3/4) Epoch 39, batch 3050, loss[loss=0.01146, audio_tagging_loss=0.01146, over 25000.00 frames. ], tot_loss[loss=0.01146, audio_tagging_loss=0.01146, over 4950919.94 frames. 
], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:31:18,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1227720.0, ans=0.125 2023-12-23 16:31:26,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1227786.6666666667, ans=0.1 2023-12-23 16:31:27,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1227786.6666666667, ans=0.2 2023-12-23 16:31:32,790 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.327e+01 3.613e+01 3.797e+01 3.917e+01 4.495e+01, threshold=7.595e+01, percent-clipped=0.0 2023-12-23 16:31:51,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1227986.6666666667, ans=0.125 2023-12-23 16:31:52,521 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.18 vs. limit=15.0 2023-12-23 16:31:57,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1227986.6666666667, ans=0.2 2023-12-23 16:31:59,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1227986.6666666667, ans=0.0 2023-12-23 16:32:00,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1227986.6666666667, ans=0.1 2023-12-23 16:32:03,069 INFO [train.py:886] (3/4) Epoch 39, batch 3100, loss[loss=0.01251, audio_tagging_loss=0.01251, over 24750.00 frames. ], tot_loss[loss=0.01153, audio_tagging_loss=0.01153, over 4951272.35 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:32:04,535 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.18 vs. limit=22.5 2023-12-23 16:32:14,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1228120.0, ans=0.125 2023-12-23 16:32:19,480 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.31 vs. limit=12.0 2023-12-23 16:32:22,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1228120.0, ans=0.0 2023-12-23 16:32:37,827 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=15.0 2023-12-23 16:32:55,426 INFO [train.py:886] (3/4) Epoch 39, batch 3150, loss[loss=0.01179, audio_tagging_loss=0.01179, over 24750.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4946599.86 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:33:16,707 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.313e+01 3.715e+01 3.835e+01 3.978e+01 4.506e+01, threshold=7.670e+01, percent-clipped=0.0 2023-12-23 16:33:37,477 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.05 vs. 
limit=22.5 2023-12-23 16:33:44,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.42 vs. limit=15.0 2023-12-23 16:33:46,404 INFO [train.py:886] (3/4) Epoch 39, batch 3200, loss[loss=0.01452, audio_tagging_loss=0.01452, over 22366.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4943737.36 frames. ], batch size: 107, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:33:52,835 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:34:01,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.44 vs. limit=15.0 2023-12-23 16:34:06,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1228786.6666666667, ans=0.1 2023-12-23 16:34:16,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1228920.0, ans=0.125 2023-12-23 16:34:21,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1228920.0, ans=0.125 2023-12-23 16:34:35,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0 2023-12-23 16:34:39,437 INFO [train.py:886] (3/4) Epoch 39, batch 3250, loss[loss=0.01153, audio_tagging_loss=0.01153, over 25000.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4947667.22 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:34:59,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1229186.6666666667, ans=0.125 2023-12-23 16:35:00,577 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.139e+01 3.582e+01 3.732e+01 3.928e+01 4.508e+01, threshold=7.464e+01, percent-clipped=0.0 2023-12-23 16:35:07,399 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.71 vs. limit=15.0 2023-12-23 16:35:14,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1229253.3333333333, ans=0.0 2023-12-23 16:35:29,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1229320.0, ans=0.1 2023-12-23 16:35:31,254 INFO [train.py:886] (3/4) Epoch 39, batch 3300, loss[loss=0.01162, audio_tagging_loss=0.01162, over 24750.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4942974.51 frames. 
], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:35:35,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1229386.6666666667, ans=0.125 2023-12-23 16:35:46,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1229453.3333333333, ans=0.025 2023-12-23 16:35:47,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1229453.3333333333, ans=0.125 2023-12-23 16:35:55,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1229520.0, ans=0.07 2023-12-23 16:36:04,157 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.34 vs. limit=15.0 2023-12-23 16:36:17,842 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:36:20,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1229653.3333333333, ans=0.2 2023-12-23 16:36:22,423 INFO [train.py:886] (3/4) Epoch 39, batch 3350, loss[loss=0.01138, audio_tagging_loss=0.01138, over 25000.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4949243.08 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:36:23,835 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0 2023-12-23 16:36:31,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1229786.6666666667, ans=0.1 2023-12-23 16:36:45,134 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.391e+01 3.649e+01 3.789e+01 3.930e+01 4.813e+01, threshold=7.578e+01, percent-clipped=0.0 2023-12-23 16:36:46,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1229853.3333333333, ans=0.125 2023-12-23 16:36:47,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1229853.3333333333, ans=0.125 2023-12-23 16:37:13,996 INFO [train.py:886] (3/4) Epoch 39, batch 3400, loss[loss=0.01266, audio_tagging_loss=0.01266, over 25000.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4955123.50 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:37:16,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1230053.3333333333, ans=10.0 2023-12-23 16:37:21,556 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.46 vs. limit=15.0 2023-12-23 16:37:25,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=15.0 2023-12-23 16:37:30,233 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.56 vs. 
limit=10.0 2023-12-23 16:37:35,616 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.46 vs. limit=15.0 2023-12-23 16:37:45,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1230253.3333333333, ans=0.1 2023-12-23 16:37:49,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1230253.3333333333, ans=0.0 2023-12-23 16:38:02,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1230320.0, ans=0.125 2023-12-23 16:38:06,185 INFO [train.py:886] (3/4) Epoch 39, batch 3450, loss[loss=0.01161, audio_tagging_loss=0.01161, over 24750.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4946890.40 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:38:28,015 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.315e+01 3.698e+01 3.845e+01 3.983e+01 4.520e+01, threshold=7.691e+01, percent-clipped=0.0 2023-12-23 16:38:39,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=1230586.6666666667, ans=0.02 2023-12-23 16:38:41,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1230586.6666666667, ans=0.125 2023-12-23 16:38:58,261 INFO [train.py:886] (3/4) Epoch 39, batch 3500, loss[loss=0.01037, audio_tagging_loss=0.01037, over 24750.00 frames. ], tot_loss[loss=0.01164, audio_tagging_loss=0.01164, over 4938338.79 frames. ], batch size: 99, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:38:59,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1230720.0, ans=0.125 2023-12-23 16:39:04,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1230720.0, ans=0.125 2023-12-23 16:39:05,518 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.06 vs. limit=15.0 2023-12-23 16:39:12,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1230786.6666666667, ans=0.125 2023-12-23 16:39:14,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1230786.6666666667, ans=0.07 2023-12-23 16:39:23,725 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0 2023-12-23 16:39:49,920 INFO [train.py:886] (3/4) Epoch 39, batch 3550, loss[loss=0.01037, audio_tagging_loss=0.01037, over 25000.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4937474.36 frames. 
], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:39:51,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1231053.3333333333, ans=0.125 2023-12-23 16:40:05,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1231120.0, ans=0.0 2023-12-23 16:40:11,392 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.150e+01 3.585e+01 3.771e+01 3.949e+01 4.246e+01, threshold=7.542e+01, percent-clipped=0.0 2023-12-23 16:40:19,933 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.46 vs. limit=15.0 2023-12-23 16:40:35,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1231320.0, ans=0.0 2023-12-23 16:40:40,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1231386.6666666667, ans=0.2 2023-12-23 16:40:40,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1231386.6666666667, ans=0.125 2023-12-23 16:40:41,476 INFO [train.py:886] (3/4) Epoch 39, batch 3600, loss[loss=0.0126, audio_tagging_loss=0.0126, over 25000.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4943180.74 frames. ], batch size: 100, lr: 2.75e-03, grad_scale: 64.0 2023-12-23 16:40:56,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1231453.3333333333, ans=0.0 2023-12-23 16:41:01,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1231520.0, ans=0.1 2023-12-23 16:41:13,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1231586.6666666667, ans=0.1 2023-12-23 16:41:34,302 INFO [train.py:886] (3/4) Epoch 39, batch 3650, loss[loss=0.01173, audio_tagging_loss=0.01173, over 25000.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4947200.60 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 64.0 2023-12-23 16:41:54,928 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.94 vs. limit=22.5 2023-12-23 16:41:56,219 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.259e+01 3.628e+01 3.795e+01 4.011e+01 5.130e+01, threshold=7.590e+01, percent-clipped=0.0 2023-12-23 16:42:01,518 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.08 vs. limit=15.0 2023-12-23 16:42:07,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1231920.0, ans=0.2 2023-12-23 16:42:08,198 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=22.5 2023-12-23 16:42:13,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1231920.0, ans=0.125 2023-12-23 16:42:26,726 INFO [train.py:886] (3/4) Epoch 39, batch 3700, loss[loss=0.01094, audio_tagging_loss=0.01094, over 25000.00 frames. 
], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4955444.84 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 64.0 2023-12-23 16:42:38,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1232120.0, ans=0.125 2023-12-23 16:42:39,104 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.03 vs. limit=22.5 2023-12-23 16:42:44,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1232120.0, ans=0.125 2023-12-23 16:42:50,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1232186.6666666667, ans=0.95 2023-12-23 16:43:11,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1232320.0, ans=0.2 2023-12-23 16:43:17,085 INFO [train.py:886] (3/4) Epoch 39, batch 3750, loss[loss=0.01124, audio_tagging_loss=0.01124, over 24750.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4955899.40 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 64.0 2023-12-23 16:43:31,755 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0 2023-12-23 16:43:39,547 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.207e+01 3.636e+01 3.779e+01 3.931e+01 4.643e+01, threshold=7.558e+01, percent-clipped=0.0 2023-12-23 16:44:10,049 INFO [train.py:886] (3/4) Epoch 39, batch 3800, loss[loss=0.01206, audio_tagging_loss=0.01206, over 24750.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4951008.54 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 64.0 2023-12-23 16:44:14,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1232720.0, ans=0.125 2023-12-23 16:44:17,023 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.15 vs. limit=22.5 2023-12-23 16:44:21,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1232786.6666666667, ans=0.125 2023-12-23 16:44:27,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1232786.6666666667, ans=0.1 2023-12-23 16:44:28,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1232786.6666666667, ans=0.125 2023-12-23 16:44:28,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=1232786.6666666667, ans=15.0 2023-12-23 16:44:29,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1232853.3333333333, ans=0.1 2023-12-23 16:44:53,097 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.92 vs. 
limit=15.0 2023-12-23 16:44:58,305 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.71 vs. limit=22.5 2023-12-23 16:45:01,448 INFO [train.py:886] (3/4) Epoch 39, batch 3850, loss[loss=0.01357, audio_tagging_loss=0.01357, over 25000.00 frames. ], tot_loss[loss=0.01159, audio_tagging_loss=0.01159, over 4952751.20 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:45:02,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1233053.3333333333, ans=0.95 2023-12-23 16:45:09,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1233053.3333333333, ans=0.0 2023-12-23 16:45:19,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1233120.0, ans=0.05 2023-12-23 16:45:21,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1233186.6666666667, ans=0.0 2023-12-23 16:45:23,678 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.220e+01 3.590e+01 3.789e+01 3.957e+01 4.562e+01, threshold=7.578e+01, percent-clipped=0.0 2023-12-23 16:45:26,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1233186.6666666667, ans=0.0 2023-12-23 16:45:31,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1233253.3333333333, ans=0.125 2023-12-23 16:45:48,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1233320.0, ans=0.1 2023-12-23 16:45:53,260 INFO [train.py:886] (3/4) Epoch 39, batch 3900, loss[loss=0.01107, audio_tagging_loss=0.01107, over 25000.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4954166.91 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:45:54,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1233386.6666666667, ans=0.1 2023-12-23 16:46:12,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1233520.0, ans=0.0 2023-12-23 16:46:43,915 INFO [train.py:886] (3/4) Epoch 39, batch 3950, loss[loss=0.01438, audio_tagging_loss=0.01438, over 25000.00 frames. ], tot_loss[loss=0.01153, audio_tagging_loss=0.01153, over 4953624.47 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:46:54,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1233786.6666666667, ans=0.2 2023-12-23 16:47:05,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1233853.3333333333, ans=0.125 2023-12-23 16:47:05,591 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.227e+01 3.602e+01 3.745e+01 4.012e+01 4.573e+01, threshold=7.490e+01, percent-clipped=0.0 2023-12-23 16:47:11,176 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.34 vs. 
limit=12.0 2023-12-23 16:47:34,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1234053.3333333333, ans=0.0 2023-12-23 16:47:34,894 INFO [train.py:886] (3/4) Epoch 39, batch 4000, loss[loss=0.0123, audio_tagging_loss=0.0123, over 25000.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4957252.59 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:47:43,642 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.64 vs. limit=15.0 2023-12-23 16:47:48,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1234120.0, ans=0.1 2023-12-23 16:47:56,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1234186.6666666667, ans=0.125 2023-12-23 16:48:06,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1234253.3333333333, ans=0.0 2023-12-23 16:48:19,119 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.52 vs. limit=22.5 2023-12-23 16:48:27,940 INFO [train.py:886] (3/4) Epoch 39, batch 4050, loss[loss=0.01051, audio_tagging_loss=0.01051, over 24750.00 frames. ], tot_loss[loss=0.01159, audio_tagging_loss=0.01159, over 4952938.74 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:48:50,245 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.337e+01 3.639e+01 3.792e+01 4.036e+01 4.478e+01, threshold=7.585e+01, percent-clipped=0.0 2023-12-23 16:48:57,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1234586.6666666667, ans=0.125 2023-12-23 16:49:00,868 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. limit=6.0 2023-12-23 16:49:06,559 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.17 vs. limit=15.0 2023-12-23 16:49:13,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1234653.3333333333, ans=0.125 2023-12-23 16:49:13,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1234653.3333333333, ans=0.125 2023-12-23 16:49:17,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1234720.0, ans=0.125 2023-12-23 16:49:18,333 INFO [train.py:886] (3/4) Epoch 39, batch 4100, loss[loss=0.01067, audio_tagging_loss=0.01067, over 25000.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4942334.17 frames. 
], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:49:29,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1234786.6666666667, ans=0.05 2023-12-23 16:49:29,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1234786.6666666667, ans=0.2 2023-12-23 16:49:39,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1234853.3333333333, ans=0.125 2023-12-23 16:49:41,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1234853.3333333333, ans=0.0 2023-12-23 16:50:03,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1234986.6666666667, ans=0.0 2023-12-23 16:50:05,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1234986.6666666667, ans=0.0 2023-12-23 16:50:07,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1234986.6666666667, ans=0.0 2023-12-23 16:50:10,278 INFO [train.py:886] (3/4) Epoch 39, batch 4150, loss[loss=0.00756, audio_tagging_loss=0.00756, over 25000.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4941843.99 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:50:20,740 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. limit=6.0 2023-12-23 16:50:32,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1235186.6666666667, ans=0.125 2023-12-23 16:50:33,716 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.167e+01 3.682e+01 3.809e+01 3.972e+01 4.566e+01, threshold=7.618e+01, percent-clipped=0.0 2023-12-23 16:50:37,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1235186.6666666667, ans=0.125 2023-12-23 16:50:38,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1235186.6666666667, ans=0.125 2023-12-23 16:51:01,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1235386.6666666667, ans=0.125 2023-12-23 16:51:02,390 INFO [train.py:886] (3/4) Epoch 39, batch 4200, loss[loss=0.01417, audio_tagging_loss=0.01417, over 25000.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4946856.70 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:51:08,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1235386.6666666667, ans=0.125 2023-12-23 16:51:08,248 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.15 vs. 
limit=15.0 2023-12-23 16:51:09,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1235386.6666666667, ans=0.125 2023-12-23 16:51:19,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1235453.3333333333, ans=0.0 2023-12-23 16:51:41,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1235586.6666666667, ans=0.125 2023-12-23 16:51:49,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1235653.3333333333, ans=0.125 2023-12-23 16:51:54,128 INFO [train.py:886] (3/4) Epoch 39, batch 4250, loss[loss=0.009595, audio_tagging_loss=0.009595, over 24750.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4952393.75 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:52:03,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1235786.6666666667, ans=0.0 2023-12-23 16:52:13,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1235786.6666666667, ans=0.0 2023-12-23 16:52:13,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1235786.6666666667, ans=0.0 2023-12-23 16:52:14,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1235853.3333333333, ans=0.125 2023-12-23 16:52:17,121 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.199e+01 3.604e+01 3.815e+01 3.941e+01 4.499e+01, threshold=7.630e+01, percent-clipped=0.0 2023-12-23 16:52:28,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1235920.0, ans=0.1 2023-12-23 16:52:32,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1235920.0, ans=0.125 2023-12-23 16:52:46,811 INFO [train.py:886] (3/4) Epoch 39, batch 4300, loss[loss=0.01157, audio_tagging_loss=0.01157, over 25000.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4956940.44 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:53:00,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1236120.0, ans=0.125 2023-12-23 16:53:09,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1236186.6666666667, ans=0.0 2023-12-23 16:53:10,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1236186.6666666667, ans=0.0 2023-12-23 16:53:18,586 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.24 vs. 
limit=6.0 2023-12-23 16:53:22,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1236253.3333333333, ans=0.07 2023-12-23 16:53:34,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1236320.0, ans=0.125 2023-12-23 16:53:37,802 INFO [train.py:886] (3/4) Epoch 39, batch 4350, loss[loss=0.01351, audio_tagging_loss=0.01351, over 24750.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4955305.73 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:53:42,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1236386.6666666667, ans=0.0 2023-12-23 16:54:00,710 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.228e+01 3.613e+01 3.861e+01 4.059e+01 4.961e+01, threshold=7.722e+01, percent-clipped=0.0 2023-12-23 16:54:12,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1236586.6666666667, ans=0.05 2023-12-23 16:54:14,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1236586.6666666667, ans=10.0 2023-12-23 16:54:29,096 INFO [train.py:886] (3/4) Epoch 39, batch 4400, loss[loss=0.01128, audio_tagging_loss=0.01128, over 24750.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4952654.94 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:54:47,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1236786.6666666667, ans=0.125 2023-12-23 16:54:50,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=1236853.3333333333, ans=0.1 2023-12-23 16:54:55,494 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.77 vs. limit=15.0 2023-12-23 16:55:13,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1236986.6666666667, ans=0.125 2023-12-23 16:55:19,333 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.25 vs. limit=15.0 2023-12-23 16:55:20,779 INFO [train.py:886] (3/4) Epoch 39, batch 4450, loss[loss=0.01068, audio_tagging_loss=0.01068, over 24750.00 frames. ], tot_loss[loss=0.01166, audio_tagging_loss=0.01166, over 4939341.39 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:55:35,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1237120.0, ans=0.0 2023-12-23 16:55:44,357 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.339e+01 3.658e+01 3.824e+01 3.990e+01 4.644e+01, threshold=7.648e+01, percent-clipped=0.0 2023-12-23 16:55:46,668 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=12.0 2023-12-23 16:55:58,518 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.52 vs. 
limit=15.0 2023-12-23 16:56:00,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1237253.3333333333, ans=0.0 2023-12-23 16:56:13,236 INFO [train.py:886] (3/4) Epoch 39, batch 4500, loss[loss=0.01154, audio_tagging_loss=0.01154, over 25000.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4942354.26 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:56:25,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1237453.3333333333, ans=0.125 2023-12-23 16:56:38,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1237520.0, ans=0.0 2023-12-23 16:56:55,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1237653.3333333333, ans=0.1 2023-12-23 16:57:05,492 INFO [train.py:886] (3/4) Epoch 39, batch 4550, loss[loss=0.01032, audio_tagging_loss=0.01032, over 25000.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4950556.86 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:57:05,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1237720.0, ans=0.125 2023-12-23 16:57:14,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1237720.0, ans=0.125 2023-12-23 16:57:22,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1237786.6666666667, ans=0.2 2023-12-23 16:57:27,607 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.279e+01 3.606e+01 3.763e+01 4.003e+01 4.650e+01, threshold=7.525e+01, percent-clipped=0.0 2023-12-23 16:57:56,798 INFO [train.py:886] (3/4) Epoch 39, batch 4600, loss[loss=0.01265, audio_tagging_loss=0.01265, over 25000.00 frames. ], tot_loss[loss=0.01153, audio_tagging_loss=0.01153, over 4949410.89 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:58:02,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1238053.3333333333, ans=10.0 2023-12-23 16:58:16,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=1238186.6666666667, ans=0.05 2023-12-23 16:58:23,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1238186.6666666667, ans=0.125 2023-12-23 16:58:34,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1238253.3333333333, ans=0.1 2023-12-23 16:58:38,853 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 16:58:48,784 INFO [train.py:886] (3/4) Epoch 39, batch 4650, loss[loss=0.01219, audio_tagging_loss=0.01219, over 25000.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4953864.03 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:58:52,905 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.19 vs. 
limit=15.0 2023-12-23 16:59:01,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1238453.3333333333, ans=0.125 2023-12-23 16:59:11,643 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.377e+01 3.683e+01 3.896e+01 4.120e+01 5.056e+01, threshold=7.792e+01, percent-clipped=0.0 2023-12-23 16:59:15,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1238520.0, ans=0.125 2023-12-23 16:59:25,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.29 vs. limit=15.0 2023-12-23 16:59:40,150 INFO [train.py:886] (3/4) Epoch 39, batch 4700, loss[loss=0.01296, audio_tagging_loss=0.01296, over 24750.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4955750.96 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 16:59:40,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1238720.0, ans=0.125 2023-12-23 16:59:48,965 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.95 vs. limit=15.0 2023-12-23 17:00:03,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1238853.3333333333, ans=0.2 2023-12-23 17:00:05,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1238853.3333333333, ans=0.1 2023-12-23 17:00:09,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1238920.0, ans=0.125 2023-12-23 17:00:10,373 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.30 vs. limit=22.5 2023-12-23 17:00:26,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1239053.3333333333, ans=0.0 2023-12-23 17:00:27,132 INFO [train.py:886] (3/4) Epoch 39, batch 4750, loss[loss=0.01033, audio_tagging_loss=0.01033, over 24750.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4949181.90 frames. ], batch size: 99, lr: 2.74e-03, grad_scale: 32.0 2023-12-23 17:00:38,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1239120.0, ans=0.125 2023-12-23 17:01:01,907 INFO [train.py:886] (3/4) Epoch 40, batch 0, loss[loss=0.03023, audio_tagging_loss=0.03023, over 20402.00 frames. ], tot_loss[loss=0.03023, audio_tagging_loss=0.03023, over 20402.00 frames. ], batch size: 107, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:01:01,907 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 17:01:23,295 INFO [train.py:917] (3/4) Epoch 40, validation: loss=0.03439, audio_tagging_loss=0.03439, over 3737520.00 frames. 
2023-12-23 17:01:23,296 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 17:01:25,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1239160.0, ans=0.015 2023-12-23 17:01:28,895 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.047e+01 3.718e+01 3.892e+01 4.077e+01 1.138e+02, threshold=7.784e+01, percent-clipped=4.0 2023-12-23 17:01:34,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.10 vs. limit=22.5 2023-12-23 17:01:38,720 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.07 vs. limit=15.0 2023-12-23 17:01:39,443 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.50 vs. limit=12.0 2023-12-23 17:01:44,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1239293.3333333333, ans=0.0 2023-12-23 17:01:53,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1239360.0, ans=0.125 2023-12-23 17:02:14,202 INFO [train.py:886] (3/4) Epoch 40, batch 50, loss[loss=0.01772, audio_tagging_loss=0.01772, over 25000.00 frames. ], tot_loss[loss=0.01835, audio_tagging_loss=0.01835, over 1119117.19 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:02:25,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1239560.0, ans=0.1 2023-12-23 17:02:38,734 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=22.5 2023-12-23 17:02:41,634 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.52 vs. limit=12.0 2023-12-23 17:03:06,243 INFO [train.py:886] (3/4) Epoch 40, batch 100, loss[loss=0.01435, audio_tagging_loss=0.01435, over 25000.00 frames. ], tot_loss[loss=0.01599, audio_tagging_loss=0.01599, over 1972287.75 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:03:11,823 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.790e+01 4.300e+01 4.589e+01 5.007e+01 8.087e+01, threshold=9.178e+01, percent-clipped=4.0 2023-12-23 17:03:12,246 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2023-12-23 17:03:14,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1239826.6666666667, ans=0.0 2023-12-23 17:03:27,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1239960.0, ans=0.125 2023-12-23 17:03:34,673 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.19 vs. 
limit=15.0 2023-12-23 17:03:45,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1240093.3333333333, ans=0.0 2023-12-23 17:03:48,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1240093.3333333333, ans=0.1 2023-12-23 17:03:51,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1240093.3333333333, ans=0.125 2023-12-23 17:03:54,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=1240093.3333333333, ans=0.2 2023-12-23 17:03:56,506 INFO [train.py:886] (3/4) Epoch 40, batch 150, loss[loss=0.01471, audio_tagging_loss=0.01471, over 25000.00 frames. ], tot_loss[loss=0.01457, audio_tagging_loss=0.01457, over 2631691.26 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:03:56,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1240160.0, ans=0.2 2023-12-23 17:03:56,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1240160.0, ans=0.125 2023-12-23 17:04:06,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1240226.6666666667, ans=0.0 2023-12-23 17:04:23,882 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:04:28,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1240360.0, ans=0.125 2023-12-23 17:04:41,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.88 vs. limit=10.0 2023-12-23 17:04:48,683 INFO [train.py:886] (3/4) Epoch 40, batch 200, loss[loss=0.01281, audio_tagging_loss=0.01281, over 25000.00 frames. ], tot_loss[loss=0.01374, audio_tagging_loss=0.01374, over 3149752.36 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:04:54,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1240493.3333333333, ans=0.04949747468305833 2023-12-23 17:04:55,105 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.318e+01 3.719e+01 3.873e+01 4.042e+01 6.291e+01, threshold=7.746e+01, percent-clipped=0.0 2023-12-23 17:04:58,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1240560.0, ans=0.09899494936611666 2023-12-23 17:05:10,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1240626.6666666667, ans=0.125 2023-12-23 17:05:13,939 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:05:14,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1240626.6666666667, ans=0.125 2023-12-23 17:05:21,227 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:05:39,534 INFO [train.py:886] (3/4) Epoch 40, batch 250, loss[loss=0.01232, audio_tagging_loss=0.01232, over 25000.00 frames. 
], tot_loss[loss=0.01328, audio_tagging_loss=0.01328, over 3553255.32 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:05:42,398 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:05:45,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1240826.6666666667, ans=0.0 2023-12-23 17:05:58,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1240893.3333333333, ans=0.025 2023-12-23 17:06:10,250 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.57 vs. limit=15.0 2023-12-23 17:06:19,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1241026.6666666667, ans=0.0 2023-12-23 17:06:32,181 INFO [train.py:886] (3/4) Epoch 40, batch 300, loss[loss=0.01125, audio_tagging_loss=0.01125, over 24750.00 frames. ], tot_loss[loss=0.01298, audio_tagging_loss=0.01298, over 3860456.32 frames. ], batch size: 99, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:06:36,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1241160.0, ans=0.125 2023-12-23 17:06:37,792 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.364e+01 3.703e+01 3.886e+01 3.999e+01 4.717e+01, threshold=7.771e+01, percent-clipped=0.0 2023-12-23 17:06:38,220 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0 2023-12-23 17:06:44,666 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.22 vs. limit=15.0 2023-12-23 17:07:23,925 INFO [train.py:886] (3/4) Epoch 40, batch 350, loss[loss=0.01064, audio_tagging_loss=0.01064, over 25000.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 4096907.32 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:07:24,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1241493.3333333333, ans=0.125 2023-12-23 17:07:38,180 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.49 vs. limit=22.5 2023-12-23 17:07:48,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1241626.6666666667, ans=0.125 2023-12-23 17:07:55,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1241693.3333333333, ans=0.125 2023-12-23 17:08:06,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1241760.0, ans=0.125 2023-12-23 17:08:15,535 INFO [train.py:886] (3/4) Epoch 40, batch 400, loss[loss=0.01292, audio_tagging_loss=0.01292, over 25000.00 frames. ], tot_loss[loss=0.01239, audio_tagging_loss=0.01239, over 4284274.38 frames. 
], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:08:15,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1241826.6666666667, ans=0.0 2023-12-23 17:08:16,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1241826.6666666667, ans=0.2 2023-12-23 17:08:20,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1241826.6666666667, ans=0.125 2023-12-23 17:08:21,795 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.317e+01 3.642e+01 3.774e+01 3.991e+01 4.784e+01, threshold=7.549e+01, percent-clipped=0.0 2023-12-23 17:08:42,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1241960.0, ans=0.1 2023-12-23 17:09:08,241 INFO [train.py:886] (3/4) Epoch 40, batch 450, loss[loss=0.01227, audio_tagging_loss=0.01227, over 23982.00 frames. ], tot_loss[loss=0.01212, audio_tagging_loss=0.01212, over 4432642.56 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:09:34,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1242293.3333333333, ans=0.125 2023-12-23 17:09:35,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1242293.3333333333, ans=0.125 2023-12-23 17:09:43,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1242360.0, ans=0.1 2023-12-23 17:09:56,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1242426.6666666667, ans=0.125 2023-12-23 17:09:57,161 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0 2023-12-23 17:09:57,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1242493.3333333333, ans=0.2 2023-12-23 17:09:58,577 INFO [train.py:886] (3/4) Epoch 40, batch 500, loss[loss=0.01299, audio_tagging_loss=0.01299, over 25000.00 frames. ], tot_loss[loss=0.01192, audio_tagging_loss=0.01192, over 4551267.23 frames. 
], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:10:02,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1242493.3333333333, ans=0.125 2023-12-23 17:10:04,902 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.282e+01 3.617e+01 3.778e+01 3.928e+01 4.794e+01, threshold=7.557e+01, percent-clipped=0.0 2023-12-23 17:10:06,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1242493.3333333333, ans=0.0 2023-12-23 17:10:18,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1242626.6666666667, ans=0.125 2023-12-23 17:10:22,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1242626.6666666667, ans=0.125 2023-12-23 17:10:27,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1242626.6666666667, ans=0.125 2023-12-23 17:10:33,490 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.61 vs. limit=15.0 2023-12-23 17:10:37,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1242693.3333333333, ans=0.0 2023-12-23 17:10:39,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1242760.0, ans=0.2 2023-12-23 17:10:50,635 INFO [train.py:886] (3/4) Epoch 40, batch 550, loss[loss=0.01409, audio_tagging_loss=0.01409, over 24750.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 4637439.37 frames. ], batch size: 99, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:10:55,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1242826.6666666667, ans=0.1 2023-12-23 17:11:03,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.15 vs. limit=15.0 2023-12-23 17:11:12,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1242960.0, ans=0.2 2023-12-23 17:11:26,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1243026.6666666667, ans=0.125 2023-12-23 17:11:30,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1243093.3333333333, ans=0.125 2023-12-23 17:11:38,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1243093.3333333333, ans=0.1 2023-12-23 17:11:42,152 INFO [train.py:886] (3/4) Epoch 40, batch 600, loss[loss=0.01311, audio_tagging_loss=0.01311, over 25000.00 frames. ], tot_loss[loss=0.0119, audio_tagging_loss=0.0119, over 4703820.42 frames. 
], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:11:45,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1243160.0, ans=0.125 2023-12-23 17:11:47,795 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.374e+01 3.661e+01 3.804e+01 3.984e+01 4.384e+01, threshold=7.608e+01, percent-clipped=0.0 2023-12-23 17:11:54,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1243226.6666666667, ans=0.125 2023-12-23 17:11:57,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1243226.6666666667, ans=0.125 2023-12-23 17:12:05,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1243293.3333333333, ans=0.1 2023-12-23 17:12:18,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1243360.0, ans=0.2 2023-12-23 17:12:32,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1243493.3333333333, ans=0.1 2023-12-23 17:12:34,241 INFO [train.py:886] (3/4) Epoch 40, batch 650, loss[loss=0.01042, audio_tagging_loss=0.01042, over 24122.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4757804.46 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:12:39,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1243493.3333333333, ans=0.0 2023-12-23 17:12:46,469 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:12:50,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1243560.0, ans=0.125 2023-12-23 17:13:06,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1243693.3333333333, ans=0.2 2023-12-23 17:13:09,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1243693.3333333333, ans=0.125 2023-12-23 17:13:16,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1243760.0, ans=0.125 2023-12-23 17:13:25,776 INFO [train.py:886] (3/4) Epoch 40, batch 700, loss[loss=0.01045, audio_tagging_loss=0.01045, over 25000.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4800076.39 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:13:32,120 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.317e+01 3.657e+01 3.830e+01 4.035e+01 4.623e+01, threshold=7.660e+01, percent-clipped=0.0 2023-12-23 17:13:51,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1243960.0, ans=0.125 2023-12-23 17:13:57,158 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.76 vs. 
limit=8.0 2023-12-23 17:14:06,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1244093.3333333333, ans=0.1 2023-12-23 17:14:18,463 INFO [train.py:886] (3/4) Epoch 40, batch 750, loss[loss=0.009183, audio_tagging_loss=0.009183, over 25000.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4832732.27 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:14:24,304 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:14:41,330 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.19 vs. limit=22.5 2023-12-23 17:14:43,631 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.59 vs. limit=8.0 2023-12-23 17:14:45,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1244293.3333333333, ans=0.0 2023-12-23 17:14:47,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1244293.3333333333, ans=0.05 2023-12-23 17:14:55,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1244360.0, ans=0.0 2023-12-23 17:15:09,551 INFO [train.py:886] (3/4) Epoch 40, batch 800, loss[loss=0.009948, audio_tagging_loss=0.009948, over 24750.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4854597.04 frames. ], batch size: 99, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:15:13,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1244493.3333333333, ans=0.125 2023-12-23 17:15:16,526 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.149e+01 3.626e+01 3.799e+01 3.971e+01 4.663e+01, threshold=7.598e+01, percent-clipped=0.0 2023-12-23 17:15:27,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1244560.0, ans=0.125 2023-12-23 17:15:32,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1244626.6666666667, ans=0.125 2023-12-23 17:15:48,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1244693.3333333333, ans=0.125 2023-12-23 17:16:01,959 INFO [train.py:886] (3/4) Epoch 40, batch 850, loss[loss=0.01475, audio_tagging_loss=0.01475, over 25000.00 frames. ], tot_loss[loss=0.01172, audio_tagging_loss=0.01172, over 4879903.93 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2023-12-23 17:16:04,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1244826.6666666667, ans=0.0 2023-12-23 17:16:19,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1244893.3333333333, ans=0.2 2023-12-23 17:16:34,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1245026.6666666667, ans=0.1 2023-12-23 17:16:54,244 INFO [train.py:886] (3/4) Epoch 40, batch 900, loss[loss=0.01331, audio_tagging_loss=0.01331, over 24750.00 frames. 
], tot_loss[loss=0.01178, audio_tagging_loss=0.01178, over 4899578.63 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 32.0 2023-12-23 17:16:54,579 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=22.5 2023-12-23 17:16:55,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1245160.0, ans=0.2 2023-12-23 17:17:00,644 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.293e+01 3.650e+01 3.794e+01 3.949e+01 4.349e+01, threshold=7.587e+01, percent-clipped=0.0 2023-12-23 17:17:07,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1245226.6666666667, ans=0.09899494936611666 2023-12-23 17:17:10,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1245226.6666666667, ans=0.125 2023-12-23 17:17:12,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1245226.6666666667, ans=0.125 2023-12-23 17:17:12,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1245226.6666666667, ans=0.125 2023-12-23 17:17:20,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1245293.3333333333, ans=0.125 2023-12-23 17:17:32,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1245360.0, ans=0.125 2023-12-23 17:17:46,223 INFO [train.py:886] (3/4) Epoch 40, batch 950, loss[loss=0.0134, audio_tagging_loss=0.0134, over 24943.00 frames. ], tot_loss[loss=0.01185, audio_tagging_loss=0.01185, over 4905698.56 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 32.0 2023-12-23 17:17:52,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1245493.3333333333, ans=0.1 2023-12-23 17:17:58,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1245560.0, ans=0.1 2023-12-23 17:18:13,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1245626.6666666667, ans=0.2 2023-12-23 17:18:24,677 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=15.0 2023-12-23 17:18:37,902 INFO [train.py:886] (3/4) Epoch 40, batch 1000, loss[loss=0.009516, audio_tagging_loss=0.009516, over 24750.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4907835.71 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 32.0 2023-12-23 17:18:44,299 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.312e+01 3.606e+01 3.769e+01 4.019e+01 4.543e+01, threshold=7.537e+01, percent-clipped=0.0 2023-12-23 17:19:07,067 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.75 vs. limit=15.0 2023-12-23 17:19:28,920 INFO [train.py:886] (3/4) Epoch 40, batch 1050, loss[loss=0.009663, audio_tagging_loss=0.009663, over 24750.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4911512.27 frames. 
], batch size: 99, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:19:42,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1246226.6666666667, ans=0.125 2023-12-23 17:19:49,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1246226.6666666667, ans=0.07 2023-12-23 17:20:01,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1246360.0, ans=0.1 2023-12-23 17:20:04,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=1246360.0, ans=10.0 2023-12-23 17:20:10,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1246426.6666666667, ans=0.125 2023-12-23 17:20:21,901 INFO [train.py:886] (3/4) Epoch 40, batch 1100, loss[loss=0.0119, audio_tagging_loss=0.0119, over 24916.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4922353.38 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:20:25,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1246493.3333333333, ans=0.125 2023-12-23 17:20:26,438 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.47 vs. limit=10.0 2023-12-23 17:20:27,707 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.272e+01 3.694e+01 3.833e+01 4.003e+01 4.303e+01, threshold=7.667e+01, percent-clipped=0.0 2023-12-23 17:21:05,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1246760.0, ans=0.125 2023-12-23 17:21:09,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1246760.0, ans=0.125 2023-12-23 17:21:14,033 INFO [train.py:886] (3/4) Epoch 40, batch 1150, loss[loss=0.01206, audio_tagging_loss=0.01206, over 25000.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4929086.54 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:21:18,076 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2023-12-23 17:21:23,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1246893.3333333333, ans=0.2 2023-12-23 17:22:05,586 INFO [train.py:886] (3/4) Epoch 40, batch 1200, loss[loss=0.01204, audio_tagging_loss=0.01204, over 25000.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4931580.92 frames. 
], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:22:10,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1247160.0, ans=0.2 2023-12-23 17:22:11,285 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.252e+01 3.680e+01 3.818e+01 4.022e+01 4.894e+01, threshold=7.635e+01, percent-clipped=0.0 2023-12-23 17:22:11,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1247160.0, ans=0.2 2023-12-23 17:22:26,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1247293.3333333333, ans=0.0 2023-12-23 17:22:28,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1247293.3333333333, ans=0.125 2023-12-23 17:22:31,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1247293.3333333333, ans=0.125 2023-12-23 17:22:36,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1247360.0, ans=0.1 2023-12-23 17:22:50,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1247426.6666666667, ans=0.125 2023-12-23 17:22:52,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1247426.6666666667, ans=0.1 2023-12-23 17:22:57,005 INFO [train.py:886] (3/4) Epoch 40, batch 1250, loss[loss=0.01412, audio_tagging_loss=0.01412, over 24750.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4936442.40 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:23:04,703 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.46 vs. limit=6.0 2023-12-23 17:23:08,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1247560.0, ans=0.0 2023-12-23 17:23:26,696 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:23:33,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1247693.3333333333, ans=0.125 2023-12-23 17:23:33,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1247693.3333333333, ans=0.125 2023-12-23 17:23:38,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1247760.0, ans=0.0 2023-12-23 17:23:47,544 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=15.0 2023-12-23 17:23:48,810 INFO [train.py:886] (3/4) Epoch 40, batch 1300, loss[loss=0.01085, audio_tagging_loss=0.01085, over 24750.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4937453.48 frames. 
], batch size: 99, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:23:51,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1247826.6666666667, ans=0.1 2023-12-23 17:23:51,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1247826.6666666667, ans=0.5 2023-12-23 17:23:54,418 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.303e+01 3.637e+01 3.803e+01 3.958e+01 5.134e+01, threshold=7.605e+01, percent-clipped=0.0 2023-12-23 17:24:02,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1247893.3333333333, ans=0.125 2023-12-23 17:24:06,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1247893.3333333333, ans=0.0 2023-12-23 17:24:16,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1247960.0, ans=0.07 2023-12-23 17:24:41,338 INFO [train.py:886] (3/4) Epoch 40, batch 1350, loss[loss=0.00874, audio_tagging_loss=0.00874, over 25000.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4943923.53 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:24:57,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1248226.6666666667, ans=0.1 2023-12-23 17:25:23,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1248426.6666666667, ans=0.0 2023-12-23 17:25:32,068 INFO [train.py:886] (3/4) Epoch 40, batch 1400, loss[loss=0.01246, audio_tagging_loss=0.01246, over 25000.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4946236.70 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:25:39,074 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.177e+01 3.609e+01 3.720e+01 3.896e+01 4.432e+01, threshold=7.440e+01, percent-clipped=0.0 2023-12-23 17:26:02,515 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=12.0 2023-12-23 17:26:15,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1248760.0, ans=0.2 2023-12-23 17:26:15,138 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0 2023-12-23 17:26:23,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1248826.6666666667, ans=0.2 2023-12-23 17:26:24,057 INFO [train.py:886] (3/4) Epoch 40, batch 1450, loss[loss=0.01201, audio_tagging_loss=0.01201, over 24750.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4951074.58 frames. 
], batch size: 99, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:26:31,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=1248826.6666666667, ans=12.0 2023-12-23 17:26:31,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1248826.6666666667, ans=0.1 2023-12-23 17:26:45,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1248960.0, ans=0.125 2023-12-23 17:26:47,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1248960.0, ans=0.125 2023-12-23 17:27:00,460 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.84 vs. limit=12.0 2023-12-23 17:27:07,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1249093.3333333333, ans=0.0 2023-12-23 17:27:12,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0 2023-12-23 17:27:12,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1249093.3333333333, ans=0.1 2023-12-23 17:27:15,479 INFO [train.py:886] (3/4) Epoch 40, batch 1500, loss[loss=0.01198, audio_tagging_loss=0.01198, over 25000.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4949524.35 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:27:22,484 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.173e+01 3.651e+01 3.787e+01 4.002e+01 4.522e+01, threshold=7.575e+01, percent-clipped=0.0 2023-12-23 17:27:35,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1249293.3333333333, ans=0.125 2023-12-23 17:27:36,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1249293.3333333333, ans=0.0 2023-12-23 17:27:47,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1249360.0, ans=0.125 2023-12-23 17:27:54,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1249360.0, ans=0.5 2023-12-23 17:28:08,004 INFO [train.py:886] (3/4) Epoch 40, batch 1550, loss[loss=0.01277, audio_tagging_loss=0.01277, over 24750.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4954075.97 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:28:09,277 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.05 vs. 
limit=22.5 2023-12-23 17:28:11,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1249493.3333333333, ans=0.025 2023-12-23 17:28:23,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1249560.0, ans=0.1 2023-12-23 17:28:47,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1249693.3333333333, ans=0.2 2023-12-23 17:28:59,830 INFO [train.py:886] (3/4) Epoch 40, batch 1600, loss[loss=0.01018, audio_tagging_loss=0.01018, over 24750.00 frames. ], tot_loss[loss=0.01159, audio_tagging_loss=0.01159, over 4936991.29 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:29:04,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1249826.6666666667, ans=0.1 2023-12-23 17:29:05,486 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.337e+01 3.668e+01 3.855e+01 4.043e+01 4.586e+01, threshold=7.710e+01, percent-clipped=0.0 2023-12-23 17:29:14,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1249893.3333333333, ans=0.0 2023-12-23 17:29:15,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1249893.3333333333, ans=0.2 2023-12-23 17:29:15,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1249893.3333333333, ans=0.1 2023-12-23 17:29:24,448 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.18 vs. limit=22.5 2023-12-23 17:29:32,861 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.00 vs. limit=22.5 2023-12-23 17:29:37,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1250026.6666666667, ans=0.1 2023-12-23 17:29:40,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1250093.3333333333, ans=0.125 2023-12-23 17:29:45,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1250093.3333333333, ans=0.0 2023-12-23 17:29:50,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1250160.0, ans=0.05 2023-12-23 17:29:51,687 INFO [train.py:886] (3/4) Epoch 40, batch 1650, loss[loss=0.01096, audio_tagging_loss=0.01096, over 25000.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4940147.75 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:29:58,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1250160.0, ans=0.125 2023-12-23 17:30:37,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1250426.6666666667, ans=0.125 2023-12-23 17:30:43,752 INFO [train.py:886] (3/4) Epoch 40, batch 1700, loss[loss=0.01003, audio_tagging_loss=0.01003, over 25000.00 frames. 
], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4943098.87 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:30:50,034 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.211e+01 3.572e+01 3.767e+01 3.944e+01 4.587e+01, threshold=7.535e+01, percent-clipped=0.0 2023-12-23 17:30:50,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1250493.3333333333, ans=0.2 2023-12-23 17:31:03,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1250560.0, ans=0.125 2023-12-23 17:31:07,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1250626.6666666667, ans=0.125 2023-12-23 17:31:24,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1250760.0, ans=0.0 2023-12-23 17:31:34,806 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0 2023-12-23 17:31:36,383 INFO [train.py:886] (3/4) Epoch 40, batch 1750, loss[loss=0.0107, audio_tagging_loss=0.0107, over 25000.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4945051.33 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:31:53,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1250893.3333333333, ans=0.0 2023-12-23 17:31:53,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1250893.3333333333, ans=0.1 2023-12-23 17:31:57,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1250960.0, ans=0.0 2023-12-23 17:32:01,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1250960.0, ans=0.125 2023-12-23 17:32:04,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1250960.0, ans=0.125 2023-12-23 17:32:08,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1251026.6666666667, ans=0.0 2023-12-23 17:32:11,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1251026.6666666667, ans=15.0 2023-12-23 17:32:11,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.71 vs. 
limit=15.0 2023-12-23 17:32:14,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1251026.6666666667, ans=0.0 2023-12-23 17:32:18,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1251093.3333333333, ans=0.125 2023-12-23 17:32:20,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1251093.3333333333, ans=0.2 2023-12-23 17:32:24,487 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:32:28,084 INFO [train.py:886] (3/4) Epoch 40, batch 1800, loss[loss=0.0109, audio_tagging_loss=0.0109, over 25000.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4949277.88 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:32:34,595 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.246e+01 3.649e+01 3.797e+01 4.032e+01 4.855e+01, threshold=7.595e+01, percent-clipped=0.0 2023-12-23 17:32:51,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1251293.3333333333, ans=0.125 2023-12-23 17:32:51,573 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:33:00,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1251360.0, ans=0.125 2023-12-23 17:33:01,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1251360.0, ans=0.2 2023-12-23 17:33:16,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1251426.6666666667, ans=0.125 2023-12-23 17:33:20,811 INFO [train.py:886] (3/4) Epoch 40, batch 1850, loss[loss=0.009979, audio_tagging_loss=0.009979, over 24750.00 frames. ], tot_loss[loss=0.01146, audio_tagging_loss=0.01146, over 4952249.33 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:33:26,026 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.27 vs. limit=15.0 2023-12-23 17:34:04,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1251760.0, ans=0.0 2023-12-23 17:34:06,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1251760.0, ans=0.0 2023-12-23 17:34:12,174 INFO [train.py:886] (3/4) Epoch 40, batch 1900, loss[loss=0.01272, audio_tagging_loss=0.01272, over 24750.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4942688.70 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:34:18,621 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.248e+01 3.703e+01 3.895e+01 4.075e+01 4.598e+01, threshold=7.791e+01, percent-clipped=0.0 2023-12-23 17:34:23,182 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.05 vs. 
limit=15.0 2023-12-23 17:34:33,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1251960.0, ans=0.0 2023-12-23 17:34:43,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1252026.6666666667, ans=0.125 2023-12-23 17:34:43,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1252026.6666666667, ans=0.125 2023-12-23 17:34:50,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1252026.6666666667, ans=0.5 2023-12-23 17:34:54,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1252093.3333333333, ans=0.04949747468305833 2023-12-23 17:34:55,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1252093.3333333333, ans=0.125 2023-12-23 17:35:01,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1252093.3333333333, ans=0.125 2023-12-23 17:35:02,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1252093.3333333333, ans=0.0 2023-12-23 17:35:04,804 INFO [train.py:886] (3/4) Epoch 40, batch 1950, loss[loss=0.009937, audio_tagging_loss=0.009937, over 24750.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4943432.15 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:35:05,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1252160.0, ans=0.125 2023-12-23 17:35:06,310 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0 2023-12-23 17:35:16,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1252226.6666666667, ans=0.125 2023-12-23 17:35:33,997 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.10 vs. limit=15.0 2023-12-23 17:35:56,363 INFO [train.py:886] (3/4) Epoch 40, batch 2000, loss[loss=0.01236, audio_tagging_loss=0.01236, over 24750.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4950383.52 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:36:02,093 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.938e+01 3.596e+01 3.830e+01 3.994e+01 4.617e+01, threshold=7.661e+01, percent-clipped=0.0 2023-12-23 17:36:03,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1252493.3333333333, ans=0.125 2023-12-23 17:36:39,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.23 vs. 
limit=10.0 2023-12-23 17:36:45,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1252760.0, ans=0.125 2023-12-23 17:36:47,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1252826.6666666667, ans=0.125 2023-12-23 17:36:48,984 INFO [train.py:886] (3/4) Epoch 40, batch 2050, loss[loss=0.01064, audio_tagging_loss=0.01064, over 25000.00 frames. ], tot_loss[loss=0.01146, audio_tagging_loss=0.01146, over 4952095.71 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:36:52,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1252826.6666666667, ans=0.125 2023-12-23 17:37:25,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1253026.6666666667, ans=0.0 2023-12-23 17:37:28,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1253093.3333333333, ans=0.1 2023-12-23 17:37:32,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1253093.3333333333, ans=0.0 2023-12-23 17:37:39,744 INFO [train.py:886] (3/4) Epoch 40, batch 2100, loss[loss=0.01047, audio_tagging_loss=0.01047, over 25000.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4950234.97 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:37:46,108 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.253e+01 3.675e+01 3.853e+01 3.940e+01 4.464e+01, threshold=7.707e+01, percent-clipped=0.0 2023-12-23 17:37:52,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1253226.6666666667, ans=0.1 2023-12-23 17:37:58,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1253226.6666666667, ans=0.1 2023-12-23 17:38:00,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1253293.3333333333, ans=0.2 2023-12-23 17:38:10,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1253293.3333333333, ans=0.1 2023-12-23 17:38:32,984 INFO [train.py:886] (3/4) Epoch 40, batch 2150, loss[loss=0.0144, audio_tagging_loss=0.0144, over 25000.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4952131.07 frames. ], batch size: 100, lr: 2.69e-03, grad_scale: 64.0 2023-12-23 17:38:36,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1253493.3333333333, ans=0.0 2023-12-23 17:38:43,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1253560.0, ans=0.0 2023-12-23 17:38:51,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1253560.0, ans=0.125 2023-12-23 17:38:58,424 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.61 vs. 
limit=15.0
2023-12-23 17:38:59,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1253626.6666666667, ans=0.0
2023-12-23 17:39:03,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1253693.3333333333, ans=0.125
2023-12-23 17:39:03,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1253693.3333333333, ans=0.0
2023-12-23 17:39:08,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1253693.3333333333, ans=0.125
2023-12-23 17:39:09,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1253693.3333333333, ans=0.125
2023-12-23 17:39:09,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1253693.3333333333, ans=0.0
2023-12-23 17:39:22,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1253760.0, ans=0.125
2023-12-23 17:39:24,494 INFO [train.py:886] (3/4) Epoch 40, batch 2200, loss[loss=0.01036, audio_tagging_loss=0.01036, over 24750.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4947934.97 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0
2023-12-23 17:39:30,818 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.315e+01 3.640e+01 3.872e+01 4.053e+01 7.102e+01, threshold=7.744e+01, percent-clipped=0.0
2023-12-23 17:39:45,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1253960.0, ans=0.0
2023-12-23 17:40:08,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1254093.3333333333, ans=0.1
2023-12-23 17:40:10,721 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.52 vs. limit=12.0
2023-12-23 17:40:13,613 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.16 vs. limit=22.5
2023-12-23 17:40:15,870 INFO [train.py:886] (3/4) Epoch 40, batch 2250, loss[loss=0.01216, audio_tagging_loss=0.01216, over 24750.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4943623.12 frames. ], batch size: 99, lr: 2.69e-03, grad_scale: 64.0
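The Clipping_scale=2.0 warnings above report quartiles of recently observed gradient norms together with a clipping threshold; throughout this log the threshold sits at about twice the logged median (e.g. median 3.872e+01 vs. threshold 7.744e+01 just above), which suggests threshold = clipping_scale * running median. The sketch below only illustrates that mechanism; the function name, history size, and reporting format are assumptions for illustration, not icefall's actual optim.py code.

import torch

def clip_gradients_by_quartiles(params, norm_history, clipping_scale=2.0, history_size=128):
    # Hypothetical sketch: track recent gradient norms, take the median,
    # and scale gradients down whenever the current norm exceeds
    # clipping_scale * median.
    grads = [p.grad for p in params if p.grad is not None]
    total_norm = torch.norm(torch.stack([g.detach().norm() for g in grads]))
    norm_history.append(total_norm.item())
    del norm_history[:-history_size]          # keep only recent batches
    q = torch.quantile(torch.tensor(norm_history),
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()  # threshold = 2x running median
    if total_norm.item() > threshold:
        for g in grads:
            g.mul_(threshold / total_norm.item())
    print("grad-norm quartiles "
          + " ".join(f"{v:.3e}" for v in q.tolist())
          + f", threshold={threshold:.3e}")
    return total_norm

In the log the WARNING is printed only at intervals, together with a running percent-clipped statistic; this sketch prints on every call to keep it short.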
2023-12-23 17:40:29,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1254226.6666666667, ans=15.0
2023-12-23 17:40:35,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1254226.6666666667, ans=0.125
2023-12-23 17:40:39,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1254293.3333333333, ans=0.125
2023-12-23 17:40:39,097 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 17:40:47,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1254360.0, ans=0.95
2023-12-23 17:41:01,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1254426.6666666667, ans=0.125
2023-12-23 17:41:02,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1254426.6666666667, ans=0.125
2023-12-23 17:41:08,086 INFO [train.py:886] (3/4) Epoch 40, batch 2300, loss[loss=0.01159, audio_tagging_loss=0.01159, over 23887.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4942041.45 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:41:13,761 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.326e+01 3.633e+01 3.746e+01 3.973e+01 4.571e+01, threshold=7.491e+01, percent-clipped=0.0
2023-12-23 17:41:17,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1254560.0, ans=0.0
2023-12-23 17:41:26,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1254560.0, ans=0.125
2023-12-23 17:41:35,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1254626.6666666667, ans=0.125
2023-12-23 17:41:38,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1254693.3333333333, ans=0.125
2023-12-23 17:41:53,080 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.86 vs. limit=15.0
2023-12-23 17:41:54,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1254760.0, ans=0.0
2023-12-23 17:41:59,052 INFO [train.py:886] (3/4) Epoch 40, batch 2350, loss[loss=0.01328, audio_tagging_loss=0.01328, over 24750.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4943642.13 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:41:59,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1254826.6666666667, ans=0.125
2023-12-23 17:42:07,764 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.59 vs. limit=15.0
2023-12-23 17:42:10,559 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.38 vs.
limit=15.0 2023-12-23 17:42:20,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1254960.0, ans=0.1 2023-12-23 17:42:23,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1254960.0, ans=0.07 2023-12-23 17:42:40,734 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.98 vs. limit=10.0 2023-12-23 17:42:42,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1255093.3333333333, ans=0.0 2023-12-23 17:42:43,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.60 vs. limit=22.5 2023-12-23 17:42:51,665 INFO [train.py:886] (3/4) Epoch 40, batch 2400, loss[loss=0.01187, audio_tagging_loss=0.01187, over 25000.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4950496.95 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:42:57,943 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.192e+01 3.652e+01 3.798e+01 3.956e+01 4.622e+01, threshold=7.596e+01, percent-clipped=0.0 2023-12-23 17:43:05,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1255226.6666666667, ans=0.125 2023-12-23 17:43:07,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1255226.6666666667, ans=0.125 2023-12-23 17:43:14,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1255293.3333333333, ans=0.125 2023-12-23 17:43:17,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1255293.3333333333, ans=0.125 2023-12-23 17:43:23,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1255360.0, ans=0.1 2023-12-23 17:43:43,021 INFO [train.py:886] (3/4) Epoch 40, batch 2450, loss[loss=0.01266, audio_tagging_loss=0.01266, over 25000.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4950726.01 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:43:43,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1255493.3333333333, ans=0.0 2023-12-23 17:43:45,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1255493.3333333333, ans=0.09899494936611666 2023-12-23 17:43:59,716 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.64 vs. limit=12.0 2023-12-23 17:44:03,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.41 vs. 
limit=15.0 2023-12-23 17:44:04,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1255626.6666666667, ans=0.0 2023-12-23 17:44:09,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1255626.6666666667, ans=0.125 2023-12-23 17:44:30,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1255760.0, ans=0.0 2023-12-23 17:44:34,761 INFO [train.py:886] (3/4) Epoch 40, batch 2500, loss[loss=0.01082, audio_tagging_loss=0.01082, over 24750.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4940979.55 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:44:35,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1255826.6666666667, ans=0.0 2023-12-23 17:44:39,797 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:44:40,518 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.292e+01 3.682e+01 3.858e+01 3.997e+01 4.509e+01, threshold=7.716e+01, percent-clipped=0.0 2023-12-23 17:44:48,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1255893.3333333333, ans=0.125 2023-12-23 17:44:49,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1255893.3333333333, ans=0.0 2023-12-23 17:45:03,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=12.0 2023-12-23 17:45:11,656 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.81 vs. limit=15.0 2023-12-23 17:45:16,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1256093.3333333333, ans=0.07 2023-12-23 17:45:26,456 INFO [train.py:886] (3/4) Epoch 40, batch 2550, loss[loss=0.01074, audio_tagging_loss=0.01074, over 24750.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4936616.61 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:45:34,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1256160.0, ans=0.1 2023-12-23 17:45:40,743 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. 
limit=6.0
2023-12-23 17:45:46,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1256293.3333333333, ans=0.0
2023-12-23 17:45:55,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1256293.3333333333, ans=0.1
2023-12-23 17:45:56,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1256360.0, ans=0.125
2023-12-23 17:45:59,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1256360.0, ans=0.0
2023-12-23 17:46:02,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1256360.0, ans=0.125
2023-12-23 17:46:17,275 INFO [train.py:886] (3/4) Epoch 40, batch 2600, loss[loss=0.01233, audio_tagging_loss=0.01233, over 24750.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4937862.29 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:46:22,945 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.345e+01 3.708e+01 3.865e+01 4.030e+01 5.247e+01, threshold=7.731e+01, percent-clipped=0.0
2023-12-23 17:46:23,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1256493.3333333333, ans=0.0
2023-12-23 17:46:26,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.32 vs. limit=15.0
2023-12-23 17:46:38,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1256626.6666666667, ans=0.0
2023-12-23 17:46:40,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1256626.6666666667, ans=0.1
2023-12-23 17:46:48,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1256693.3333333333, ans=0.0
2023-12-23 17:46:57,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1256760.0, ans=0.0
2023-12-23 17:47:02,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1256760.0, ans=0.2
2023-12-23 17:47:09,734 INFO [train.py:886] (3/4) Epoch 40, batch 2650, loss[loss=0.01006, audio_tagging_loss=0.01006, over 25000.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4943956.35 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:47:39,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.14 vs. limit=15.0
2023-12-23 17:47:53,696 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0
2023-12-23 17:47:56,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1257093.3333333333, ans=0.2
2023-12-23 17:48:00,527 INFO [train.py:886] (3/4) Epoch 40, batch 2700, loss[loss=0.01013, audio_tagging_loss=0.01013, over 25000.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4944224.12 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0
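The Whitening entries compare a per-module statistic against a scheduled limit (e.g. metric=3.95 vs. limit=15.0 just above). One plausible reading, assumed here rather than taken from scaling.py, is that the metric measures how far the per-group covariance of the activations is from a multiple of the identity; it is 1.0 for perfectly "white" features and grows when a few directions carry most of the energy. A minimal sketch under that assumption:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # Assumed formulation (not copied from scaling.py): form the
    # uncentered covariance C of each channel group and return
    # d * trace(C @ C) / trace(C)**2, averaged over groups. This equals
    # 1.0 when C is a multiple of the identity.
    x = x.reshape(-1, x.shape[-1])             # (frames, channels)
    num_frames, num_channels = x.shape
    d = num_channels // num_groups
    xg = x.reshape(num_frames, num_groups, d).transpose(0, 1)  # (groups, frames, d)
    cov = xg.transpose(1, 2) @ xg / num_frames                 # (groups, d, d)
    num = (cov * cov).sum(dim=(1, 2))                          # trace(C @ C), C symmetric
    den = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2         # trace(C) ** 2
    return (d * num / den).mean()

On this reading, a logged "metric=4.38 vs. limit=6.0" means the activations were still under the scheduled whitening limit, so no extra whitening pressure was needed at that step.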
2023-12-23 17:48:06,812 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.216e+01 3.619e+01 3.803e+01 3.974e+01 4.399e+01, threshold=7.606e+01, percent-clipped=0.0
2023-12-23 17:48:20,811 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=12.0
2023-12-23 17:48:22,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1257293.3333333333, ans=0.0
2023-12-23 17:48:37,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1257360.0, ans=0.125
2023-12-23 17:48:38,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1257360.0, ans=0.09899494936611666
2023-12-23 17:48:46,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1257426.6666666667, ans=0.015
2023-12-23 17:48:52,682 INFO [train.py:886] (3/4) Epoch 40, batch 2750, loss[loss=0.01188, audio_tagging_loss=0.01188, over 25000.00 frames. ], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4949564.01 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:49:14,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1257626.6666666667, ans=0.2
2023-12-23 17:49:27,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1257693.3333333333, ans=0.07
2023-12-23 17:49:36,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1257760.0, ans=0.0
2023-12-23 17:49:40,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1257760.0, ans=0.1
2023-12-23 17:49:41,989 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0
2023-12-23 17:49:44,272 INFO [train.py:886] (3/4) Epoch 40, batch 2800, loss[loss=0.01268, audio_tagging_loss=0.01268, over 24750.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4948402.89 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:49:51,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0
2023-12-23 17:49:51,396 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.248e+01 3.697e+01 3.809e+01 4.002e+01 4.614e+01, threshold=7.617e+01, percent-clipped=0.0
2023-12-23 17:49:56,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1257893.3333333333, ans=0.0
2023-12-23 17:50:15,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1258026.6666666667, ans=0.0
2023-12-23 17:50:16,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1258026.6666666667, ans=0.0
2023-12-23 17:50:36,621 INFO [train.py:886] (3/4) Epoch 40, batch 2850, loss[loss=0.01224, audio_tagging_loss=0.01224, over 24750.00 frames.
], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4938208.02 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:50:39,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1258160.0, ans=0.0 2023-12-23 17:51:08,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=15.0 2023-12-23 17:51:12,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=15.0 2023-12-23 17:51:28,918 INFO [train.py:886] (3/4) Epoch 40, batch 2900, loss[loss=0.01285, audio_tagging_loss=0.01285, over 25000.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4935877.96 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:51:34,565 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.415e+01 3.699e+01 3.835e+01 4.007e+01 4.764e+01, threshold=7.669e+01, percent-clipped=0.0 2023-12-23 17:51:48,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1258626.6666666667, ans=0.0 2023-12-23 17:52:20,899 INFO [train.py:886] (3/4) Epoch 40, batch 2950, loss[loss=0.01412, audio_tagging_loss=0.01412, over 25000.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4939988.35 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:52:22,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1258826.6666666667, ans=0.0 2023-12-23 17:52:38,776 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=15.0 2023-12-23 17:52:46,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1258960.0, ans=0.125 2023-12-23 17:52:53,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1259026.6666666667, ans=0.125 2023-12-23 17:52:56,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1259026.6666666667, ans=0.1 2023-12-23 17:52:59,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1259026.6666666667, ans=0.125 2023-12-23 17:53:12,731 INFO [train.py:886] (3/4) Epoch 40, batch 3000, loss[loss=0.01282, audio_tagging_loss=0.01282, over 24750.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4943153.03 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:53:12,731 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 17:53:33,969 INFO [train.py:917] (3/4) Epoch 40, validation: loss=0.03529, audio_tagging_loss=0.03529, over 3737520.00 frames. 
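The validation figures above come from the same audio-tagging objective as training. AudioSet tagging is a multi-label problem over 527 event classes, so a per-class binary cross-entropy of roughly the following shape is the natural fit; this is a sketch under that assumption, not the code of train.py, and the reduction (sum over classes, mean over utterances) is one plausible choice. The repeated "over 3737520.00 frames." is simply the fixed size of the validation set.

import torch
import torch.nn.functional as F

def audio_tagging_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # logits, targets: (batch, num_classes), with targets holding 0/1
    # multi-hot labels; each of the 527 classes gets its own binary
    # cross-entropy term.
    return F.binary_cross_entropy_with_logits(
        logits, targets.float(), reduction="sum") / logits.shape[0]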
2023-12-23 17:53:33,969 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 17:53:39,601 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.171e+01 3.612e+01 3.801e+01 4.054e+01 4.780e+01, threshold=7.602e+01, percent-clipped=0.0 2023-12-23 17:53:43,318 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:53:45,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1259226.6666666667, ans=0.0 2023-12-23 17:53:53,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1259293.3333333333, ans=0.07 2023-12-23 17:53:57,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1259293.3333333333, ans=0.125 2023-12-23 17:53:59,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1259293.3333333333, ans=0.125 2023-12-23 17:54:02,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.97 vs. limit=22.5 2023-12-23 17:54:03,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1259360.0, ans=0.0 2023-12-23 17:54:03,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1259360.0, ans=10.0 2023-12-23 17:54:13,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1259426.6666666667, ans=0.125 2023-12-23 17:54:25,429 INFO [train.py:886] (3/4) Epoch 40, batch 3050, loss[loss=0.01299, audio_tagging_loss=0.01299, over 25000.00 frames. ], tot_loss[loss=0.01133, audio_tagging_loss=0.01133, over 4943539.96 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:54:27,716 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.50 vs. limit=10.0 2023-12-23 17:54:28,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1259493.3333333333, ans=0.125 2023-12-23 17:55:12,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1259760.0, ans=0.125 2023-12-23 17:55:16,905 INFO [train.py:886] (3/4) Epoch 40, batch 3100, loss[loss=0.01168, audio_tagging_loss=0.01168, over 25000.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4948996.70 frames. 
], batch size: 100, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:55:24,102 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.262e+01 3.674e+01 3.864e+01 4.007e+01 4.526e+01, threshold=7.728e+01, percent-clipped=0.0
2023-12-23 17:55:27,139 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 17:55:27,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1259893.3333333333, ans=0.125
2023-12-23 17:55:32,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1259893.3333333333, ans=0.125
2023-12-23 17:55:39,071 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.54 vs. limit=12.0
2023-12-23 17:55:46,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.48 vs. limit=10.0
2023-12-23 17:55:48,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1260026.6666666667, ans=0.0
2023-12-23 17:56:08,319 INFO [train.py:886] (3/4) Epoch 40, batch 3150, loss[loss=0.01239, audio_tagging_loss=0.01239, over 21657.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4944849.78 frames. ], batch size: 107, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:56:34,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1260293.3333333333, ans=0.125
2023-12-23 17:56:36,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1260293.3333333333, ans=0.125
2023-12-23 17:56:41,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1260360.0, ans=0.125
2023-12-23 17:56:42,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.74 vs. limit=15.0
2023-12-23 17:56:46,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1260360.0, ans=0.95
2023-12-23 17:56:52,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1260426.6666666667, ans=0.125
2023-12-23 17:57:00,304 INFO [train.py:886] (3/4) Epoch 40, batch 3200, loss[loss=0.009279, audio_tagging_loss=0.009279, over 24750.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4941698.09 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0
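Each train.py:886 line reports two figures: loss[...] for the current batch and tot_loss[...] for a running average weighted by frame counts. The fractional frame totals such as "over 4941698.09 frames." are consistent with an exponentially decayed weighted sum rather than a plain cumulative count. A minimal sketch of that bookkeeping follows; the class name and decay constant are illustrative assumptions, not the MetricsTracker actually used in icefall.

class RunningLoss:
    """Frame-weighted running average, decayed so recent batches dominate."""

    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.loss_sum = 0.0   # decayed sum of loss * frames
        self.frames = 0.0     # decayed sum of frames

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

Under a scheme like this the effective frame count converges to roughly batch_frames / (1 - decay), which is why tot_loss hovers near a stable ~4.9e6 frames while the per-batch loss fluctuates.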
2023-12-23 17:57:05,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1260493.3333333333, ans=0.2
2023-12-23 17:57:06,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1260493.3333333333, ans=0.2
2023-12-23 17:57:07,589 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.204e+01 3.736e+01 3.854e+01 4.070e+01 4.738e+01, threshold=7.708e+01, percent-clipped=0.0
2023-12-23 17:57:21,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1260626.6666666667, ans=0.125
2023-12-23 17:57:21,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1260626.6666666667, ans=0.1
2023-12-23 17:57:24,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1260626.6666666667, ans=0.0
2023-12-23 17:57:31,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1260693.3333333333, ans=0.125
2023-12-23 17:57:50,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1260826.6666666667, ans=0.1
2023-12-23 17:57:51,656 INFO [train.py:886] (3/4) Epoch 40, batch 3250, loss[loss=0.012, audio_tagging_loss=0.012, over 25000.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4945199.88 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0
2023-12-23 17:58:05,175 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.68 vs. limit=15.0
2023-12-23 17:58:10,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1260893.3333333333, ans=0.07
2023-12-23 17:58:13,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1260960.0, ans=0.0
2023-12-23 17:58:14,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1260960.0, ans=0.125
2023-12-23 17:58:19,574 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.01 vs. limit=22.5
2023-12-23 17:58:23,145 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 17:58:33,192 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.01 vs. limit=15.0
2023-12-23 17:58:43,827 INFO [train.py:886] (3/4) Epoch 40, batch 3300, loss[loss=0.01058, audio_tagging_loss=0.01058, over 25000.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4949573.86 frames.
], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:58:46,096 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:58:49,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1261160.0, ans=0.125 2023-12-23 17:58:51,271 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.291e+01 3.619e+01 3.846e+01 4.004e+01 5.622e+01, threshold=7.691e+01, percent-clipped=0.0 2023-12-23 17:58:52,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0 2023-12-23 17:59:07,330 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:59:12,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1261293.3333333333, ans=0.125 2023-12-23 17:59:16,716 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 17:59:35,962 INFO [train.py:886] (3/4) Epoch 40, batch 3350, loss[loss=0.01017, audio_tagging_loss=0.01017, over 21751.00 frames. ], tot_loss[loss=0.01133, audio_tagging_loss=0.01133, over 4949291.54 frames. ], batch size: 107, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 17:59:59,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1261626.6666666667, ans=0.1 2023-12-23 18:00:00,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1261626.6666666667, ans=0.125 2023-12-23 18:00:05,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1261693.3333333333, ans=0.0 2023-12-23 18:00:22,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1261760.0, ans=0.1 2023-12-23 18:00:28,502 INFO [train.py:886] (3/4) Epoch 40, batch 3400, loss[loss=0.01322, audio_tagging_loss=0.01322, over 25000.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4950118.35 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 18:00:35,078 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.304e+01 3.654e+01 3.811e+01 4.007e+01 4.560e+01, threshold=7.622e+01, percent-clipped=0.0 2023-12-23 18:00:54,254 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.30 vs. limit=22.5 2023-12-23 18:01:13,135 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:01:15,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1262093.3333333333, ans=0.2 2023-12-23 18:01:18,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1262093.3333333333, ans=0.2 2023-12-23 18:01:20,795 INFO [train.py:886] (3/4) Epoch 40, batch 3450, loss[loss=0.01288, audio_tagging_loss=0.01288, over 24750.00 frames. ], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4947520.86 frames. 
], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 18:01:26,918 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.72 vs. limit=12.0 2023-12-23 18:01:27,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1262160.0, ans=0.04949747468305833 2023-12-23 18:01:52,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1262360.0, ans=0.2 2023-12-23 18:01:55,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1262360.0, ans=0.0 2023-12-23 18:02:11,090 INFO [train.py:886] (3/4) Epoch 40, batch 3500, loss[loss=0.0128, audio_tagging_loss=0.0128, over 24750.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4947913.67 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 18:02:14,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1262493.3333333333, ans=0.09899494936611666 2023-12-23 18:02:15,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1262493.3333333333, ans=0.0 2023-12-23 18:02:18,417 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.273e+01 3.696e+01 3.842e+01 3.983e+01 4.882e+01, threshold=7.684e+01, percent-clipped=0.0 2023-12-23 18:02:24,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1262560.0, ans=0.2 2023-12-23 18:02:25,205 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.95 vs. limit=15.0 2023-12-23 18:02:53,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1262760.0, ans=0.5 2023-12-23 18:02:53,243 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:02:59,441 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:03:02,237 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:03:04,002 INFO [train.py:886] (3/4) Epoch 40, batch 3550, loss[loss=0.01038, audio_tagging_loss=0.01038, over 24750.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4946775.35 frames. ], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 18:03:16,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.62 vs. limit=15.0 2023-12-23 18:03:22,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1262893.3333333333, ans=0.1 2023-12-23 18:03:54,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1263160.0, ans=0.125 2023-12-23 18:03:55,063 INFO [train.py:886] (3/4) Epoch 40, batch 3600, loss[loss=0.01001, audio_tagging_loss=0.01001, over 24750.00 frames. ], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4948855.53 frames. 
], batch size: 99, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 18:04:03,184 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.291e+01 3.684e+01 3.812e+01 3.997e+01 4.511e+01, threshold=7.624e+01, percent-clipped=0.0 2023-12-23 18:04:07,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1263226.6666666667, ans=0.125 2023-12-23 18:04:08,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1263226.6666666667, ans=0.125 2023-12-23 18:04:14,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1263226.6666666667, ans=0.125 2023-12-23 18:04:21,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1263293.3333333333, ans=0.2 2023-12-23 18:04:27,381 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:04:38,598 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.70 vs. limit=15.0 2023-12-23 18:04:43,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1263426.6666666667, ans=0.0 2023-12-23 18:04:47,307 INFO [train.py:886] (3/4) Epoch 40, batch 3650, loss[loss=0.01206, audio_tagging_loss=0.01206, over 25000.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4951193.49 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 64.0 2023-12-23 18:05:03,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1263560.0, ans=0.0 2023-12-23 18:05:12,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1263626.6666666667, ans=10.0 2023-12-23 18:05:15,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.05 vs. limit=22.5 2023-12-23 18:05:19,649 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.67 vs. limit=22.5 2023-12-23 18:05:38,707 INFO [train.py:886] (3/4) Epoch 40, batch 3700, loss[loss=0.01199, audio_tagging_loss=0.01199, over 25000.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4959164.89 frames. 
], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:05:42,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1263826.6666666667, ans=0.125 2023-12-23 18:05:43,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1263826.6666666667, ans=0.125 2023-12-23 18:05:46,058 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.278e+01 3.659e+01 3.785e+01 3.953e+01 4.581e+01, threshold=7.570e+01, percent-clipped=0.0 2023-12-23 18:06:07,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1263960.0, ans=0.125 2023-12-23 18:06:08,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1264026.6666666667, ans=0.125 2023-12-23 18:06:11,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1264026.6666666667, ans=0.125 2023-12-23 18:06:30,152 INFO [train.py:886] (3/4) Epoch 40, batch 3750, loss[loss=0.01293, audio_tagging_loss=0.01293, over 24750.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4959349.78 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:06:33,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1264160.0, ans=0.125 2023-12-23 18:06:34,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1264160.0, ans=0.125 2023-12-23 18:06:51,390 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.30 vs. limit=15.0 2023-12-23 18:06:51,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1264293.3333333333, ans=0.0 2023-12-23 18:06:57,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1264293.3333333333, ans=0.125 2023-12-23 18:07:13,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1264426.6666666667, ans=0.0 2023-12-23 18:07:23,132 INFO [train.py:886] (3/4) Epoch 40, batch 3800, loss[loss=0.009683, audio_tagging_loss=0.009683, over 25000.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4954650.27 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:07:29,695 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.212e+01 3.721e+01 3.894e+01 4.067e+01 4.769e+01, threshold=7.788e+01, percent-clipped=0.0 2023-12-23 18:07:58,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1264693.3333333333, ans=0.0 2023-12-23 18:08:02,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1264760.0, ans=0.1 2023-12-23 18:08:04,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1264760.0, ans=0.2 2023-12-23 18:08:13,780 INFO [train.py:886] (3/4) Epoch 40, batch 3850, loss[loss=0.0117, audio_tagging_loss=0.0117, over 24750.00 frames. 
], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4950286.65 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0
2023-12-23 18:08:15,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1264826.6666666667, ans=0.125
2023-12-23 18:08:22,688 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.80 vs. limit=15.0
2023-12-23 18:08:28,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1264893.3333333333, ans=0.125
2023-12-23 18:08:31,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1264893.3333333333, ans=0.125
2023-12-23 18:08:42,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1264960.0, ans=0.2
2023-12-23 18:08:42,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1264960.0, ans=0.125
2023-12-23 18:08:44,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1265026.6666666667, ans=0.125
2023-12-23 18:08:52,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1265026.6666666667, ans=0.125
2023-12-23 18:09:05,593 INFO [train.py:886] (3/4) Epoch 40, batch 3900, loss[loss=0.01109, audio_tagging_loss=0.01109, over 25000.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4952215.93 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0
2023-12-23 18:09:12,988 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.264e+01 3.639e+01 3.828e+01 4.025e+01 4.570e+01, threshold=7.656e+01, percent-clipped=0.0
2023-12-23 18:09:21,755 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.51 vs. limit=15.0
2023-12-23 18:09:25,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1265293.3333333333, ans=0.04949747468305833
2023-12-23 18:09:36,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1265360.0, ans=0.0
2023-12-23 18:09:43,739 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.43 vs. limit=10.0
2023-12-23 18:09:56,685 INFO [train.py:886] (3/4) Epoch 40, batch 3950, loss[loss=0.01188, audio_tagging_loss=0.01188, over 25000.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4955693.97 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0
2023-12-23 18:09:59,016 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.35 vs. limit=22.5
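Every scaling.py:213 line prints the current value (ans) of a ScheduledFloat, a hyperparameter defined as a function of batch_count so that skip rates, dropout probabilities, and similar knobs can be annealed as training progresses; by batch_count around 1.26e6 most of the skip rates above have long since reached their final values. The class below is an illustrative piecewise-linear stand-in, not icefall's real ScheduledFloat from scaling.py, which carries more machinery.

class PiecewiseLinearFloat:
    """Illustrative ScheduledFloat-style schedule.
    points: (batch_count, value) pairs, sorted by batch_count."""

    def __init__(self, *points):
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        (b0, v0) = self.points[0]
        if batch_count <= b0:
            return v0
        for (b1, v1) in self.points[1:]:
            if batch_count <= b1:
                # linearly interpolate between the bracketing points
                t = (batch_count - b0) / (b1 - b0)
                return v0 + t * (v1 - v0)
            (b0, v0) = (b1, v1)
        return v0  # past the last point: hold the final value


# e.g. a conv_skip_rate that starts at 0.2 and is annealed away
# (breakpoints here are made up for illustration):
conv_skip_rate = PiecewiseLinearFloat((0.0, 0.2), (20000.0, 0.0))
assert conv_skip_rate.value(1265360.0) == 0.0  # matches the "ans=0.0" entries nearby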
2023-12-23 18:10:09,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1265560.0, ans=0.0
2023-12-23 18:10:12,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1265560.0, ans=0.125
2023-12-23 18:10:13,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1265560.0, ans=0.125
2023-12-23 18:10:21,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1265626.6666666667, ans=0.125
2023-12-23 18:10:34,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1265693.3333333333, ans=0.1
2023-12-23 18:10:38,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1265760.0, ans=0.125
2023-12-23 18:10:43,395 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.76 vs. limit=15.0
2023-12-23 18:10:47,695 INFO [train.py:886] (3/4) Epoch 40, batch 4000, loss[loss=0.01302, audio_tagging_loss=0.01302, over 25000.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4958180.14 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0
2023-12-23 18:10:52,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.85 vs. limit=22.5
2023-12-23 18:10:55,001 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.346e+01 3.601e+01 3.801e+01 3.964e+01 6.190e+01, threshold=7.601e+01, percent-clipped=0.0
2023-12-23 18:10:55,258 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-23 18:11:15,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1265960.0, ans=0.035
2023-12-23 18:11:21,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1266026.6666666667, ans=0.125
2023-12-23 18:11:25,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1266026.6666666667, ans=0.125
2023-12-23 18:11:34,046 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.18 vs. limit=12.0
2023-12-23 18:11:40,060 INFO [train.py:886] (3/4) Epoch 40, batch 4050, loss[loss=0.01405, audio_tagging_loss=0.01405, over 24750.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4961869.48 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0
2023-12-23 18:11:45,943 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.10 vs. limit=15.0
2023-12-23 18:11:58,227 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.13 vs.
limit=15.0 2023-12-23 18:12:02,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1266293.3333333333, ans=0.0 2023-12-23 18:12:15,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1266360.0, ans=0.125 2023-12-23 18:12:25,110 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:12:27,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1266426.6666666667, ans=0.95 2023-12-23 18:12:29,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1266426.6666666667, ans=0.035 2023-12-23 18:12:31,437 INFO [train.py:886] (3/4) Epoch 40, batch 4100, loss[loss=0.01261, audio_tagging_loss=0.01261, over 24750.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4957200.83 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:12:38,723 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.330e+01 3.783e+01 3.912e+01 4.094e+01 5.068e+01, threshold=7.824e+01, percent-clipped=0.0 2023-12-23 18:12:38,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1266493.3333333333, ans=0.2 2023-12-23 18:12:39,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1266493.3333333333, ans=0.125 2023-12-23 18:13:00,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1266626.6666666667, ans=0.0 2023-12-23 18:13:18,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1266760.0, ans=0.125 2023-12-23 18:13:24,054 INFO [train.py:886] (3/4) Epoch 40, batch 4150, loss[loss=0.01062, audio_tagging_loss=0.01062, over 25000.00 frames. ], tot_loss[loss=0.01171, audio_tagging_loss=0.01171, over 4947889.14 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:13:39,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1266893.3333333333, ans=0.125 2023-12-23 18:13:57,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1267026.6666666667, ans=0.015 2023-12-23 18:13:58,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1267026.6666666667, ans=0.04949747468305833 2023-12-23 18:14:09,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.76 vs. limit=15.0 2023-12-23 18:14:12,364 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2023-12-23 18:14:15,854 INFO [train.py:886] (3/4) Epoch 40, batch 4200, loss[loss=0.01153, audio_tagging_loss=0.01153, over 25000.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4952214.28 frames. 
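Note on the recurring optim.py warnings: the clipping threshold printed there is the clipping scale times a running median of recent gradient norms. In the warning at 18:12:38 above, the five quartiles are 3.330e+01 3.783e+01 3.912e+01 4.094e+01 5.068e+01 and the threshold is 7.824e+01, i.e. exactly 2.0 x the median 3.912e+01. A minimal sketch of that rule, assuming a sliding window of recorded norms; none of these names are icefall's real API:

import torch

def clip_grad_adaptive_(params, recent_norms, clipping_scale=2.0, window=200):
    # Global grad norm of this step.
    grads = [p.grad for p in params if p.grad is not None]
    norm = torch.norm(torch.stack([g.detach().norm() for g in grads])).item()
    recent_norms.append(norm)
    del recent_norms[:-window]  # keep only the last `window` steps
    stats = sorted(recent_norms)
    # The five logged numbers: min, 25%, median, 75%, max of recent norms.
    quartiles = [stats[int(q * (len(stats) - 1))] for q in (0, 0.25, 0.5, 0.75, 1.0)]
    threshold = clipping_scale * quartiles[2]  # 2.0 * median, as in the log
    if norm > threshold:
        for g in grads:
            g.mul_(threshold / norm)  # rescale all grads in place
    return quartiles, threshold, norm > threshold

Under this reading, percent-clipped=0.0 plausibly means no step in the recent window exceeded twice the median.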
], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:14:18,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1267160.0, ans=0.04949747468305833 2023-12-23 18:14:23,316 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.248e+01 3.687e+01 3.822e+01 4.029e+01 4.624e+01, threshold=7.645e+01, percent-clipped=0.0 2023-12-23 18:14:24,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1267160.0, ans=0.125 2023-12-23 18:14:52,172 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0 2023-12-23 18:14:58,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1267426.6666666667, ans=0.125 2023-12-23 18:15:04,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1267426.6666666667, ans=0.125 2023-12-23 18:15:08,323 INFO [train.py:886] (3/4) Epoch 40, batch 4250, loss[loss=0.01255, audio_tagging_loss=0.01255, over 25000.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4957674.21 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:15:28,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1267626.6666666667, ans=10.0 2023-12-23 18:15:35,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1267626.6666666667, ans=0.09899494936611666 2023-12-23 18:15:40,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1267693.3333333333, ans=0.05 2023-12-23 18:15:57,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1267760.0, ans=0.0 2023-12-23 18:15:59,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1267826.6666666667, ans=0.2 2023-12-23 18:15:59,891 INFO [train.py:886] (3/4) Epoch 40, batch 4300, loss[loss=0.01039, audio_tagging_loss=0.01039, over 25000.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4958573.32 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:16:06,445 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.393e+01 3.668e+01 3.826e+01 3.970e+01 4.663e+01, threshold=7.653e+01, percent-clipped=0.0 2023-12-23 18:16:08,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1267826.6666666667, ans=0.2 2023-12-23 18:16:22,151 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.79 vs. 
limit=15.0 2023-12-23 18:16:33,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1268026.6666666667, ans=0.0 2023-12-23 18:16:39,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1268093.3333333333, ans=0.1 2023-12-23 18:16:46,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1268093.3333333333, ans=0.2 2023-12-23 18:16:51,150 INFO [train.py:886] (3/4) Epoch 40, batch 4350, loss[loss=0.01027, audio_tagging_loss=0.01027, over 25000.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4962531.84 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:16:52,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1268160.0, ans=0.0 2023-12-23 18:16:56,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1268160.0, ans=15.0 2023-12-23 18:17:05,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1268226.6666666667, ans=0.125 2023-12-23 18:17:20,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1268293.3333333333, ans=0.125 2023-12-23 18:17:34,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1268426.6666666667, ans=0.125 2023-12-23 18:17:42,509 INFO [train.py:886] (3/4) Epoch 40, batch 4400, loss[loss=0.009198, audio_tagging_loss=0.009198, over 24750.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4962323.45 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:17:42,727 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:17:50,633 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.283e+01 3.697e+01 3.843e+01 4.013e+01 4.471e+01, threshold=7.687e+01, percent-clipped=0.0 2023-12-23 18:17:57,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1268560.0, ans=0.125 2023-12-23 18:17:59,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1268560.0, ans=0.0 2023-12-23 18:18:09,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1268626.6666666667, ans=0.0 2023-12-23 18:18:10,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1268626.6666666667, ans=0.0 2023-12-23 18:18:13,071 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.18 vs. limit=10.0 2023-12-23 18:18:29,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0 2023-12-23 18:18:31,361 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.21 vs. 
limit=22.5 2023-12-23 18:18:35,363 INFO [train.py:886] (3/4) Epoch 40, batch 4450, loss[loss=0.01079, audio_tagging_loss=0.01079, over 24750.00 frames. ], tot_loss[loss=0.01176, audio_tagging_loss=0.01176, over 4954463.68 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:19:09,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1269026.6666666667, ans=0.125 2023-12-23 18:19:11,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1269026.6666666667, ans=0.125 2023-12-23 18:19:18,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1269093.3333333333, ans=0.0 2023-12-23 18:19:24,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1269093.3333333333, ans=0.125 2023-12-23 18:19:27,036 INFO [train.py:886] (3/4) Epoch 40, batch 4500, loss[loss=0.01268, audio_tagging_loss=0.01268, over 24750.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4954859.05 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:19:34,275 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.454e+01 3.692e+01 3.865e+01 4.053e+01 4.653e+01, threshold=7.730e+01, percent-clipped=0.0 2023-12-23 18:19:36,686 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.18 vs. limit=15.0 2023-12-23 18:19:44,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1269226.6666666667, ans=0.05 2023-12-23 18:19:57,537 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.99 vs. limit=10.0 2023-12-23 18:20:00,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1269360.0, ans=0.125 2023-12-23 18:20:03,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1269360.0, ans=0.125 2023-12-23 18:20:06,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1269360.0, ans=0.125 2023-12-23 18:20:11,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1269426.6666666667, ans=0.0 2023-12-23 18:20:18,856 INFO [train.py:886] (3/4) Epoch 40, batch 4550, loss[loss=0.01083, audio_tagging_loss=0.01083, over 24750.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4954744.45 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:20:48,982 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.91 vs. limit=12.0 2023-12-23 18:21:10,429 INFO [train.py:886] (3/4) Epoch 40, batch 4600, loss[loss=0.01047, audio_tagging_loss=0.01047, over 24750.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4958175.72 frames. 
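Most of the scaling.py traffic in this stretch is ScheduledFloat values: balancer probs long settled at ans=0.125, skip rates at ans=0.0. These are values scheduled against batch_count, and a piecewise-linear interpolation over (batch, value) breakpoints reproduces the logged behavior. A sketch under that assumption; the breakpoints below are made up for illustration:

def scheduled_float(batch_count, points):
    # points: sorted (batch, value) breakpoints, held constant outside the range.
    b0, v0 = points[0]
    if batch_count <= b0:
        return v0
    for b1, v1 in points[1:]:
        if batch_count <= b1:
            # linear interpolation between neighbouring breakpoints
            return v0 + (v1 - v0) * (batch_count - b0) / (b1 - b0)
        b0, v0 = b1, v1
    return points[-1][1]

# By ~1.27M batches every schedule here is long past its last breakpoint:
print(scheduled_float(1268626.0, [(0.0, 0.3), (20000.0, 0.125)]))  # 0.125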
], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:21:10,836 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0 2023-12-23 18:21:15,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1269826.6666666667, ans=0.125 2023-12-23 18:21:17,653 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.202e+01 3.671e+01 3.796e+01 3.991e+01 4.710e+01, threshold=7.593e+01, percent-clipped=0.0 2023-12-23 18:21:20,199 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=22.5 2023-12-23 18:21:24,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1269893.3333333333, ans=0.125 2023-12-23 18:21:27,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1269893.3333333333, ans=0.1 2023-12-23 18:21:33,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1269960.0, ans=0.0 2023-12-23 18:21:41,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1270026.6666666667, ans=0.125 2023-12-23 18:22:00,870 INFO [train.py:886] (3/4) Epoch 40, batch 4650, loss[loss=0.01191, audio_tagging_loss=0.01191, over 25000.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4960129.32 frames. ], batch size: 100, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:22:11,269 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0 2023-12-23 18:22:14,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1270226.6666666667, ans=0.125 2023-12-23 18:22:20,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=15.0 2023-12-23 18:22:39,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1270360.0, ans=10.0 2023-12-23 18:22:47,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1270426.6666666667, ans=0.125 2023-12-23 18:22:52,812 INFO [train.py:886] (3/4) Epoch 40, batch 4700, loss[loss=0.01339, audio_tagging_loss=0.01339, over 24750.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4962122.03 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:22:55,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1270493.3333333333, ans=0.125 2023-12-23 18:22:59,157 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.413e+01 3.757e+01 3.899e+01 4.092e+01 5.478e+01, threshold=7.799e+01, percent-clipped=0.0 2023-12-23 18:23:01,621 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.62 vs. 
limit=22.5 2023-12-23 18:23:08,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1270560.0, ans=0.125 2023-12-23 18:23:13,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1270626.6666666667, ans=0.1 2023-12-23 18:23:22,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1270693.3333333333, ans=0.125 2023-12-23 18:23:22,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1270693.3333333333, ans=0.125 2023-12-23 18:23:27,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1270693.3333333333, ans=0.125 2023-12-23 18:23:30,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1270760.0, ans=0.125 2023-12-23 18:23:39,620 INFO [train.py:886] (3/4) Epoch 40, batch 4750, loss[loss=0.01186, audio_tagging_loss=0.01186, over 24750.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4960497.73 frames. ], batch size: 99, lr: 2.67e-03, grad_scale: 64.0 2023-12-23 18:23:40,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1270826.6666666667, ans=0.125 2023-12-23 18:24:13,759 INFO [train.py:886] (3/4) Epoch 41, batch 0, loss[loss=0.02711, audio_tagging_loss=0.02711, over 22218.00 frames. ], tot_loss[loss=0.02711, audio_tagging_loss=0.02711, over 22218.00 frames. ], batch size: 107, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:24:13,759 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 18:24:32,024 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.6494, 3.0003, 4.1545, 3.8311], device='cuda:3') 2023-12-23 18:24:35,135 INFO [train.py:917] (3/4) Epoch 41, validation: loss=0.03496, audio_tagging_loss=0.03496, over 3737520.00 frames. 2023-12-23 18:24:35,135 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 18:25:02,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1271066.6666666667, ans=0.05 2023-12-23 18:25:05,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1271133.3333333333, ans=0.0 2023-12-23 18:25:18,894 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.493e+01 3.874e+01 4.258e+01 5.303e+01 1.010e+02, threshold=8.517e+01, percent-clipped=7.0 2023-12-23 18:25:26,254 INFO [train.py:886] (3/4) Epoch 41, batch 50, loss[loss=0.01494, audio_tagging_loss=0.01494, over 25000.00 frames. ], tot_loss[loss=0.01865, audio_tagging_loss=0.01865, over 1124004.13 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:25:40,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1271333.3333333333, ans=0.125 2023-12-23 18:25:47,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. 
limit=6.0 2023-12-23 18:25:53,250 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.72 vs. limit=22.5 2023-12-23 18:25:54,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1271400.0, ans=0.1 2023-12-23 18:26:06,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1271533.3333333333, ans=0.2 2023-12-23 18:26:12,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1271533.3333333333, ans=0.125 2023-12-23 18:26:14,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1271533.3333333333, ans=0.0 2023-12-23 18:26:18,050 INFO [train.py:886] (3/4) Epoch 41, batch 100, loss[loss=0.01324, audio_tagging_loss=0.01324, over 25000.00 frames. ], tot_loss[loss=0.01618, audio_tagging_loss=0.01618, over 1977141.84 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:26:38,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1271733.3333333333, ans=0.125 2023-12-23 18:26:45,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1271733.3333333333, ans=0.1 2023-12-23 18:27:03,060 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.548e+01 3.949e+01 4.182e+01 4.375e+01 5.097e+01, threshold=8.364e+01, percent-clipped=0.0 2023-12-23 18:27:09,800 INFO [train.py:886] (3/4) Epoch 41, batch 150, loss[loss=0.01375, audio_tagging_loss=0.01375, over 24750.00 frames. ], tot_loss[loss=0.01469, audio_tagging_loss=0.01469, over 2641298.45 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:27:11,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1271933.3333333333, ans=0.2 2023-12-23 18:27:14,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1271933.3333333333, ans=0.0 2023-12-23 18:27:34,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1272066.6666666667, ans=0.0 2023-12-23 18:27:46,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1272133.3333333333, ans=0.0 2023-12-23 18:27:58,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.76 vs. limit=6.0 2023-12-23 18:28:02,499 INFO [train.py:886] (3/4) Epoch 41, batch 200, loss[loss=0.01048, audio_tagging_loss=0.01048, over 25000.00 frames. ], tot_loss[loss=0.0137, audio_tagging_loss=0.0137, over 3155503.45 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:28:10,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1272266.6666666667, ans=0.025 2023-12-23 18:28:10,195 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.28 vs. 
limit=10.0 2023-12-23 18:28:25,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1272400.0, ans=0.0 2023-12-23 18:28:34,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1272466.6666666667, ans=0.1 2023-12-23 18:28:46,896 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.401e+01 3.687e+01 3.842e+01 3.967e+01 4.665e+01, threshold=7.685e+01, percent-clipped=0.0 2023-12-23 18:28:48,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1272533.3333333333, ans=0.1 2023-12-23 18:28:54,257 INFO [train.py:886] (3/4) Epoch 41, batch 250, loss[loss=0.0126, audio_tagging_loss=0.0126, over 25000.00 frames. ], tot_loss[loss=0.01308, audio_tagging_loss=0.01308, over 3556092.91 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:29:19,232 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:29:45,606 INFO [train.py:886] (3/4) Epoch 41, batch 300, loss[loss=0.01145, audio_tagging_loss=0.01145, over 21821.00 frames. ], tot_loss[loss=0.01283, audio_tagging_loss=0.01283, over 3859916.40 frames. ], batch size: 107, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:30:06,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1273066.6666666667, ans=0.0 2023-12-23 18:30:29,127 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.255e+01 3.673e+01 3.844e+01 4.121e+01 4.840e+01, threshold=7.689e+01, percent-clipped=0.0 2023-12-23 18:30:29,661 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.12 vs. limit=15.0 2023-12-23 18:30:36,420 INFO [train.py:886] (3/4) Epoch 41, batch 350, loss[loss=0.01118, audio_tagging_loss=0.01118, over 24750.00 frames. ], tot_loss[loss=0.01261, audio_tagging_loss=0.01261, over 4101617.71 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 16.0 2023-12-23 18:30:37,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1273266.6666666667, ans=10.0 2023-12-23 18:30:44,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1273266.6666666667, ans=0.125 2023-12-23 18:30:46,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1273333.3333333333, ans=0.1 2023-12-23 18:30:46,738 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.39 vs. limit=6.0 2023-12-23 18:30:49,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1273333.3333333333, ans=0.0 2023-12-23 18:30:49,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. 
limit=6.0 2023-12-23 18:31:15,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1273466.6666666667, ans=0.125 2023-12-23 18:31:19,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1273533.3333333333, ans=0.0 2023-12-23 18:31:28,756 INFO [train.py:886] (3/4) Epoch 41, batch 400, loss[loss=0.01057, audio_tagging_loss=0.01057, over 25000.00 frames. ], tot_loss[loss=0.01229, audio_tagging_loss=0.01229, over 4286293.72 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:31:29,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1273600.0, ans=0.0 2023-12-23 18:31:34,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1273600.0, ans=0.125 2023-12-23 18:31:42,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1273666.6666666667, ans=0.0 2023-12-23 18:32:01,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1273800.0, ans=0.125 2023-12-23 18:32:01,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1273800.0, ans=0.1 2023-12-23 18:32:13,130 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.313e+01 3.591e+01 3.744e+01 3.939e+01 4.826e+01, threshold=7.487e+01, percent-clipped=0.0 2023-12-23 18:32:19,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1273933.3333333333, ans=0.125 2023-12-23 18:32:20,557 INFO [train.py:886] (3/4) Epoch 41, batch 450, loss[loss=0.01012, audio_tagging_loss=0.01012, over 24750.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4427415.26 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:33:12,243 INFO [train.py:886] (3/4) Epoch 41, batch 500, loss[loss=0.01235, audio_tagging_loss=0.01235, over 25000.00 frames. ], tot_loss[loss=0.01193, audio_tagging_loss=0.01193, over 4542129.18 frames. 
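Note the learning rate stepping from 2.67e-03 in the epoch 40 records to 2.63e-03 once epoch 41 starts. Within the printed precision this matches an Eden-style schedule, whose epoch term decays as ((epoch^2 + r^2) / r^2)^-0.25 for some reference r; taking r = 3.5 epochs (an assumption, not a value read from this log) gives a 40 to 41 ratio of about 0.988:

def eden_epoch_factor(epoch, r=3.5):
    # Smooth decay ~ epoch**-0.5 once epoch >> r; r is assumed here.
    return ((epoch ** 2 + r ** 2) / r ** 2) ** -0.25

ratio = eden_epoch_factor(41) / eden_epoch_factor(40)
print(round(ratio, 4))  # 0.9878; 2.63e-03 / 2.67e-03 ~ 0.985 after 3-digit rounding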
], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:33:27,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1274333.3333333333, ans=0.0 2023-12-23 18:33:29,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1274333.3333333333, ans=0.0 2023-12-23 18:33:39,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1274400.0, ans=0.1 2023-12-23 18:33:42,740 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 18:33:54,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1274533.3333333333, ans=0.125 2023-12-23 18:33:56,401 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.109e+01 3.650e+01 3.820e+01 3.998e+01 4.800e+01, threshold=7.639e+01, percent-clipped=0.0 2023-12-23 18:33:58,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1274533.3333333333, ans=0.125 2023-12-23 18:34:03,785 INFO [train.py:886] (3/4) Epoch 41, batch 550, loss[loss=0.01249, audio_tagging_loss=0.01249, over 25000.00 frames. ], tot_loss[loss=0.01187, audio_tagging_loss=0.01187, over 4635534.34 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:34:06,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1274600.0, ans=0.0 2023-12-23 18:34:14,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1274666.6666666667, ans=0.0 2023-12-23 18:34:30,000 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=12.0 2023-12-23 18:34:32,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1274733.3333333333, ans=0.07 2023-12-23 18:34:56,190 INFO [train.py:886] (3/4) Epoch 41, batch 600, loss[loss=0.009425, audio_tagging_loss=0.009425, over 24750.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4706193.65 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:34:59,571 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=15.0 2023-12-23 18:35:03,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1274933.3333333333, ans=0.0 2023-12-23 18:35:04,883 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0 2023-12-23 18:35:17,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1275066.6666666667, ans=0.125 2023-12-23 18:35:21,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1275066.6666666667, ans=0.125 2023-12-23 18:35:26,055 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.94 vs. 
limit=15.0 2023-12-23 18:35:27,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1275133.3333333333, ans=0.1 2023-12-23 18:35:40,232 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.244e+01 3.747e+01 3.891e+01 4.086e+01 5.120e+01, threshold=7.782e+01, percent-clipped=0.0 2023-12-23 18:35:48,372 INFO [train.py:886] (3/4) Epoch 41, batch 650, loss[loss=0.01062, audio_tagging_loss=0.01062, over 24750.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 4754886.64 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:35:58,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1275333.3333333333, ans=0.0 2023-12-23 18:36:00,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1275333.3333333333, ans=0.125 2023-12-23 18:36:10,597 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.88 vs. limit=15.0 2023-12-23 18:36:36,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1275533.3333333333, ans=0.0 2023-12-23 18:36:39,879 INFO [train.py:886] (3/4) Epoch 41, batch 700, loss[loss=0.01062, audio_tagging_loss=0.01062, over 21920.00 frames. ], tot_loss[loss=0.01175, audio_tagging_loss=0.01175, over 4795224.68 frames. ], batch size: 107, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:37:18,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1275800.0, ans=0.125 2023-12-23 18:37:21,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1275866.6666666667, ans=0.0 2023-12-23 18:37:24,899 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.413e+01 3.750e+01 3.860e+01 4.061e+01 4.621e+01, threshold=7.721e+01, percent-clipped=0.0 2023-12-23 18:37:28,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1275866.6666666667, ans=0.125 2023-12-23 18:37:32,203 INFO [train.py:886] (3/4) Epoch 41, batch 750, loss[loss=0.01242, audio_tagging_loss=0.01242, over 25000.00 frames. ], tot_loss[loss=0.01169, audio_tagging_loss=0.01169, over 4832629.13 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:37:43,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1276000.0, ans=0.0 2023-12-23 18:37:44,683 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.200e-01 2023-12-23 18:37:45,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1276000.0, ans=0.125 2023-12-23 18:38:03,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1276133.3333333333, ans=0.125 2023-12-23 18:38:17,713 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0 2023-12-23 18:38:22,875 INFO [train.py:886] (3/4) Epoch 41, batch 800, loss[loss=0.009218, audio_tagging_loss=0.009218, over 24085.00 frames. 
], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4862613.02 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:38:32,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1276266.6666666667, ans=0.125 2023-12-23 18:38:43,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1276400.0, ans=0.0 2023-12-23 18:38:51,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.73 vs. limit=22.5 2023-12-23 18:38:54,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1276466.6666666667, ans=0.0 2023-12-23 18:39:01,835 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0 2023-12-23 18:39:08,858 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.333e+01 3.674e+01 3.793e+01 3.949e+01 4.603e+01, threshold=7.587e+01, percent-clipped=0.0 2023-12-23 18:39:15,580 INFO [train.py:886] (3/4) Epoch 41, batch 850, loss[loss=0.01236, audio_tagging_loss=0.01236, over 24895.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4881122.26 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:39:19,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1276600.0, ans=0.0 2023-12-23 18:39:21,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2023-12-23 18:39:23,346 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2023-12-23 18:39:28,920 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.045e-03 2023-12-23 18:40:07,465 INFO [train.py:886] (3/4) Epoch 41, batch 900, loss[loss=0.01297, audio_tagging_loss=0.01297, over 24750.00 frames. ], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4897457.50 frames. 
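tot_loss in these records is not a per-batch number: after the epoch 41 restart it is re-accumulated from scratch (about 1.1M frames at batch 50, 2.0M at batch 100, 4.9M by batch 900), and each 50-batch stretch adds fewer frames than 50 full ~25k-frame batches would, so older batches are being decayed out of the average. A frame-weighted average with exponential forgetting reproduces that shape; the decay constant below is a guess, not the value train.py uses:

class RunningLoss:
    def __init__(self, decay=0.999):
        self.decay = decay
        self.weighted_loss = 0.0  # decayed sum of loss * frames
        self.frames = 0.0         # decayed frame count

    def update(self, loss, num_frames):
        self.weighted_loss = self.decay * self.weighted_loss + loss * num_frames
        self.frames = self.decay * self.frames + num_frames

    @property
    def value(self):  # what "tot_loss[...] over N frames" would print
        return self.weighted_loss / max(self.frames, 1.0)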
], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:40:08,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1276933.3333333333, ans=0.125 2023-12-23 18:40:12,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1276933.3333333333, ans=0.2 2023-12-23 18:40:15,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1276933.3333333333, ans=0.125 2023-12-23 18:40:19,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=1277000.0, ans=0.1 2023-12-23 18:40:25,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1277000.0, ans=0.2 2023-12-23 18:40:37,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1277133.3333333333, ans=0.0 2023-12-23 18:40:44,296 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.99 vs. limit=15.0 2023-12-23 18:40:52,161 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.459e+01 3.699e+01 3.943e+01 4.115e+01 4.708e+01, threshold=7.886e+01, percent-clipped=0.0 2023-12-23 18:40:52,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1277200.0, ans=0.125 2023-12-23 18:40:59,577 INFO [train.py:886] (3/4) Epoch 41, batch 950, loss[loss=0.01318, audio_tagging_loss=0.01318, over 24750.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4907207.83 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:41:15,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1277333.3333333333, ans=0.05 2023-12-23 18:41:17,028 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.83 vs. 
limit=15.0 2023-12-23 18:41:28,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1277400.0, ans=0.125 2023-12-23 18:41:34,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1277466.6666666667, ans=0.125 2023-12-23 18:41:35,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1277466.6666666667, ans=0.125 2023-12-23 18:41:40,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1277533.3333333333, ans=0.0 2023-12-23 18:41:41,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1277533.3333333333, ans=0.0 2023-12-23 18:41:46,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1277533.3333333333, ans=0.0 2023-12-23 18:41:47,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1277533.3333333333, ans=0.125 2023-12-23 18:41:52,143 INFO [train.py:886] (3/4) Epoch 41, batch 1000, loss[loss=0.01141, audio_tagging_loss=0.01141, over 24750.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4903980.90 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:41:59,099 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.48 vs. limit=15.0 2023-12-23 18:42:04,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1277666.6666666667, ans=0.1 2023-12-23 18:42:04,675 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2023-12-23 18:42:35,892 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.182e+01 3.700e+01 3.857e+01 4.081e+01 5.233e+01, threshold=7.714e+01, percent-clipped=0.0 2023-12-23 18:42:43,288 INFO [train.py:886] (3/4) Epoch 41, batch 1050, loss[loss=0.009166, audio_tagging_loss=0.009166, over 25000.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4913784.00 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:42:45,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1277933.3333333333, ans=0.025 2023-12-23 18:43:03,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1278066.6666666667, ans=0.5 2023-12-23 18:43:10,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.80 vs. limit=22.5 2023-12-23 18:43:35,401 INFO [train.py:886] (3/4) Epoch 41, batch 1100, loss[loss=0.01088, audio_tagging_loss=0.01088, over 25000.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4923180.27 frames. 
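The Whitening lines compare a per-module metric against a whitening limit (e.g. metric=4.83 vs. limit=15.0 just above); the metric is 1.0 for perfectly white features and grows as the covariance drifts away from a multiple of the identity. One plausible proxy for it, offered as a sketch of the idea rather than scaling.py's actual computation:

import torch

def whitening_metric(x, num_groups=1):
    # x: (num_frames, num_channels) activations; the metric is computed per
    # channel group and averaged, mirroring num_groups in the log.
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    metrics = []
    for g in range(num_groups):
        cov = x[:, g, :].T @ x[:, g, :] / num_frames
        eigs = torch.linalg.eigvalsh(cov)  # real eigenvalues, ascending
        # Ratio of mean squared eigenvalue to squared mean eigenvalue:
        # 1.0 iff all eigenvalues are equal (white), larger otherwise.
        metrics.append((eigs ** 2).mean() / eigs.mean() ** 2)
    return torch.stack(metrics).mean()

print(whitening_metric(torch.randn(1000, 256)))  # near 1 for white noise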
], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:43:36,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1278266.6666666667, ans=0.0 2023-12-23 18:44:01,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1278400.0, ans=0.125 2023-12-23 18:44:01,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1278400.0, ans=0.125 2023-12-23 18:44:14,470 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0 2023-12-23 18:44:19,474 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.288e+01 3.657e+01 3.787e+01 4.008e+01 4.824e+01, threshold=7.573e+01, percent-clipped=0.0 2023-12-23 18:44:26,098 INFO [train.py:886] (3/4) Epoch 41, batch 1150, loss[loss=0.01057, audio_tagging_loss=0.01057, over 25000.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4935294.72 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:44:29,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1278600.0, ans=0.125 2023-12-23 18:44:31,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1278600.0, ans=0.125 2023-12-23 18:44:31,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1278600.0, ans=0.125 2023-12-23 18:44:32,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1278600.0, ans=0.0 2023-12-23 18:44:35,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1278600.0, ans=0.2 2023-12-23 18:44:42,112 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.73 vs. limit=15.0 2023-12-23 18:44:48,975 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.43 vs. limit=22.5 2023-12-23 18:45:01,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1278800.0, ans=0.125 2023-12-23 18:45:05,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1278800.0, ans=0.125 2023-12-23 18:45:18,178 INFO [train.py:886] (3/4) Epoch 41, batch 1200, loss[loss=0.0122, audio_tagging_loss=0.0122, over 25000.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4947427.21 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:45:53,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1279133.3333333333, ans=0.2 2023-12-23 18:46:02,170 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.367e+01 3.706e+01 3.872e+01 4.011e+01 4.696e+01, threshold=7.743e+01, percent-clipped=0.0 2023-12-23 18:46:08,885 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.12 vs. 
limit=15.0 2023-12-23 18:46:09,429 INFO [train.py:886] (3/4) Epoch 41, batch 1250, loss[loss=0.009772, audio_tagging_loss=0.009772, over 25000.00 frames. ], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4949327.15 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:46:12,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1279266.6666666667, ans=0.2 2023-12-23 18:46:15,041 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2023-12-23 18:46:41,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1279466.6666666667, ans=0.125 2023-12-23 18:46:47,071 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0 2023-12-23 18:46:47,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1279466.6666666667, ans=0.0 2023-12-23 18:47:01,525 INFO [train.py:886] (3/4) Epoch 41, batch 1300, loss[loss=0.01033, audio_tagging_loss=0.01033, over 24750.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4947949.63 frames. ], batch size: 99, lr: 2.63e-03, grad_scale: 32.0 2023-12-23 18:47:02,782 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. limit=6.0 2023-12-23 18:47:04,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1279600.0, ans=0.2 2023-12-23 18:47:45,475 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.348e+01 3.706e+01 3.903e+01 4.052e+01 5.836e+01, threshold=7.805e+01, percent-clipped=0.0 2023-12-23 18:47:52,841 INFO [train.py:886] (3/4) Epoch 41, batch 1350, loss[loss=0.01225, audio_tagging_loss=0.01225, over 24059.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4947811.54 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:48:07,386 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0 2023-12-23 18:48:15,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1280066.6666666667, ans=0.09899494936611666 2023-12-23 18:48:35,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1280200.0, ans=0.035 2023-12-23 18:48:38,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1280200.0, ans=0.125 2023-12-23 18:48:39,750 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.49 vs. 
limit=15.0 2023-12-23 18:48:44,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1280266.6666666667, ans=0.04949747468305833 2023-12-23 18:48:44,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1280266.6666666667, ans=0.1 2023-12-23 18:48:45,526 INFO [train.py:886] (3/4) Epoch 41, batch 1400, loss[loss=0.01231, audio_tagging_loss=0.01231, over 25000.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4952237.06 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:48:49,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1280266.6666666667, ans=0.125 2023-12-23 18:49:00,693 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.30 vs. limit=22.5 2023-12-23 18:49:11,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1280400.0, ans=0.125 2023-12-23 18:49:30,413 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.274e+01 3.669e+01 3.890e+01 4.024e+01 4.902e+01, threshold=7.779e+01, percent-clipped=0.0 2023-12-23 18:49:37,003 INFO [train.py:886] (3/4) Epoch 41, batch 1450, loss[loss=0.01067, audio_tagging_loss=0.01067, over 25000.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4953727.27 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:49:44,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1280600.0, ans=0.125 2023-12-23 18:49:55,278 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.59 vs. limit=22.5 2023-12-23 18:50:17,287 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.90 vs. limit=22.5 2023-12-23 18:50:21,757 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.16 vs. limit=6.0 2023-12-23 18:50:28,723 INFO [train.py:886] (3/4) Epoch 41, batch 1500, loss[loss=0.009317, audio_tagging_loss=0.009317, over 25000.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4960844.02 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:50:43,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1281000.0, ans=0.1 2023-12-23 18:50:43,798 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.01 vs. limit=22.5 2023-12-23 18:50:49,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.69 vs. limit=15.0 2023-12-23 18:50:54,575 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.85 vs. 
limit=15.0 2023-12-23 18:50:57,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1281066.6666666667, ans=0.125 2023-12-23 18:51:02,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1281133.3333333333, ans=6.0 2023-12-23 18:51:10,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1281200.0, ans=0.2 2023-12-23 18:51:13,185 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.443e+01 3.676e+01 3.888e+01 4.062e+01 4.485e+01, threshold=7.775e+01, percent-clipped=0.0 2023-12-23 18:51:20,539 INFO [train.py:886] (3/4) Epoch 41, batch 1550, loss[loss=0.01293, audio_tagging_loss=0.01293, over 25000.00 frames. ], tot_loss[loss=0.01146, audio_tagging_loss=0.01146, over 4959128.47 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:51:21,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1281266.6666666667, ans=0.0 2023-12-23 18:51:55,270 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.84 vs. limit=22.5 2023-12-23 18:52:07,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1281533.3333333333, ans=0.5 2023-12-23 18:52:12,604 INFO [train.py:886] (3/4) Epoch 41, batch 1600, loss[loss=0.01033, audio_tagging_loss=0.01033, over 25000.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4958045.89 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:52:18,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1281600.0, ans=6.0 2023-12-23 18:52:20,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1281600.0, ans=0.1 2023-12-23 18:52:42,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1281800.0, ans=0.1 2023-12-23 18:52:47,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1281800.0, ans=0.125 2023-12-23 18:52:56,608 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.336e+01 3.686e+01 3.838e+01 4.087e+01 6.862e+01, threshold=7.676e+01, percent-clipped=0.0 2023-12-23 18:53:03,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1281933.3333333333, ans=0.2 2023-12-23 18:53:03,952 INFO [train.py:886] (3/4) Epoch 41, batch 1650, loss[loss=0.01071, audio_tagging_loss=0.01071, over 24750.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4955306.04 frames. 
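Every record in this log has loss equal to audio_tagging_loss, i.e. the objective is a single multi-label tagging term. For an audio tagger that is conventionally per-class binary cross-entropy on the clip-level logits; a minimal sketch, with the class count and the reduction both assumptions rather than facts read from train.py:

import torch
import torch.nn.functional as F

def audio_tagging_loss(logits, targets):
    # logits: (batch, num_classes); targets: multi-hot {0, 1} of same shape.
    # Sum over classes, mean over clips; this reduction is an assumption.
    return F.binary_cross_entropy_with_logits(
        logits, targets.float(), reduction="sum"
    ) / logits.shape[0]

logits = torch.randn(4, 527)                    # 527 = AudioSet label set
targets = (torch.rand(4, 527) < 0.01).float()   # sparse multi-hot tags
print(audio_tagging_loss(logits, targets))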
], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:53:11,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1281933.3333333333, ans=0.125 2023-12-23 18:53:29,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1282066.6666666667, ans=0.125 2023-12-23 18:53:42,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1282133.3333333333, ans=0.125 2023-12-23 18:53:46,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1282200.0, ans=0.0 2023-12-23 18:53:52,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1282200.0, ans=0.025 2023-12-23 18:53:56,367 INFO [train.py:886] (3/4) Epoch 41, batch 1700, loss[loss=0.01347, audio_tagging_loss=0.01347, over 25000.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4954252.03 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:54:27,233 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=15.0 2023-12-23 18:54:31,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1282466.6666666667, ans=0.2 2023-12-23 18:54:35,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-12-23 18:54:40,556 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.322e+01 3.662e+01 3.806e+01 4.000e+01 4.786e+01, threshold=7.612e+01, percent-clipped=0.0 2023-12-23 18:54:48,067 INFO [train.py:886] (3/4) Epoch 41, batch 1750, loss[loss=0.01305, audio_tagging_loss=0.01305, over 25000.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4955978.43 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:54:48,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1282600.0, ans=0.125 2023-12-23 18:55:28,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1282866.6666666667, ans=0.125 2023-12-23 18:55:39,840 INFO [train.py:886] (3/4) Epoch 41, batch 1800, loss[loss=0.01159, audio_tagging_loss=0.01159, over 25000.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4956561.18 frames. 
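In the train.py:886 records, `loss` is the current batch while `tot_loss` moves slowly and is reported over a roughly constant ~4.95M frames even though training has processed far more. That behaviour is consistent with a decayed, frame-weighted running average rather than a global mean; the sketch below shows one such scheme. The decay constant is a guess, chosen only so the steady-state frame count (batch_frames / (1 - decay) = 25000 / 0.005 = 5e6) lands near the figure in the log; it is not taken from the recipe.

```python
# A toy decayed running average: consistent with `tot_loss ... over ~4.95e6
# frames` staying near-constant while per-batch `loss` fluctuates.
# decay=0.995 is an assumed value, picked only to match the frame-count scale.
def running_average(batches, decay=0.995):
    tot_loss, tot_frames = 0.0, 0.0
    for batch_loss, batch_frames in batches:
        tot_loss = decay * tot_loss + batch_loss * batch_frames
        tot_frames = decay * tot_frames + batch_frames
        yield batch_loss, tot_loss / tot_frames, tot_frames

# Per-batch losses taken from the batch 1400/1450/1500 records above.
stream = [(0.01231, 25000.0), (0.01067, 25000.0), (0.009317, 25000.0)]
for cur, avg, frames in running_average(stream):
    print(f"loss={cur:.5f} tot_loss={avg:.5f} over {frames:.2f} frames")
```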
], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:55:56,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1283000.0, ans=0.125 2023-12-23 18:56:02,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1283066.6666666667, ans=0.125 2023-12-23 18:56:05,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1283066.6666666667, ans=0.125 2023-12-23 18:56:12,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1283133.3333333333, ans=0.5 2023-12-23 18:56:17,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1283133.3333333333, ans=0.0 2023-12-23 18:56:23,835 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.191e+01 3.787e+01 3.915e+01 4.054e+01 5.277e+01, threshold=7.830e+01, percent-clipped=0.0 2023-12-23 18:56:31,236 INFO [train.py:886] (3/4) Epoch 41, batch 1850, loss[loss=0.01139, audio_tagging_loss=0.01139, over 24750.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4954896.68 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:56:36,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1283266.6666666667, ans=0.1 2023-12-23 18:56:39,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1283266.6666666667, ans=0.2 2023-12-23 18:56:52,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1283400.0, ans=0.125 2023-12-23 18:57:22,452 INFO [train.py:886] (3/4) Epoch 41, batch 1900, loss[loss=0.007756, audio_tagging_loss=0.007756, over 22543.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4948012.86 frames. ], batch size: 107, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:57:26,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1283600.0, ans=0.125 2023-12-23 18:58:00,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1283800.0, ans=0.125 2023-12-23 18:58:06,575 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.411e+01 3.762e+01 3.907e+01 4.041e+01 4.562e+01, threshold=7.814e+01, percent-clipped=0.0 2023-12-23 18:58:13,901 INFO [train.py:886] (3/4) Epoch 41, batch 1950, loss[loss=0.01118, audio_tagging_loss=0.01118, over 25000.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4945696.89 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:58:17,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1283933.3333333333, ans=0.125 2023-12-23 18:58:19,931 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.05 vs. 
limit=12.0 2023-12-23 18:58:25,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1284000.0, ans=0.125 2023-12-23 18:58:34,423 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.36 vs. limit=6.0 2023-12-23 18:58:40,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1284066.6666666667, ans=0.1 2023-12-23 18:58:48,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1284133.3333333333, ans=0.125 2023-12-23 18:58:49,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1284133.3333333333, ans=0.0 2023-12-23 18:59:06,061 INFO [train.py:886] (3/4) Epoch 41, batch 2000, loss[loss=0.008539, audio_tagging_loss=0.008539, over 23998.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4947107.64 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 18:59:32,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1284400.0, ans=10.0 2023-12-23 18:59:36,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1284466.6666666667, ans=0.125 2023-12-23 18:59:50,425 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.337e+01 3.662e+01 3.860e+01 4.077e+01 4.836e+01, threshold=7.721e+01, percent-clipped=0.0 2023-12-23 18:59:51,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1284533.3333333333, ans=0.2 2023-12-23 18:59:52,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1284533.3333333333, ans=0.0 2023-12-23 18:59:57,749 INFO [train.py:886] (3/4) Epoch 41, batch 2050, loss[loss=0.01179, audio_tagging_loss=0.01179, over 25000.00 frames. ], tot_loss[loss=0.01133, audio_tagging_loss=0.01133, over 4951695.04 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 64.0 2023-12-23 19:00:27,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1284733.3333333333, ans=0.125 2023-12-23 19:00:36,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1284800.0, ans=0.04949747468305833 2023-12-23 19:00:41,100 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0 2023-12-23 19:00:45,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1284866.6666666667, ans=0.125 2023-12-23 19:00:47,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1284866.6666666667, ans=0.125 2023-12-23 19:00:49,180 INFO [train.py:886] (3/4) Epoch 41, batch 2100, loss[loss=0.01037, audio_tagging_loss=0.01037, over 24750.00 frames. ], tot_loss[loss=0.01133, audio_tagging_loss=0.01133, over 4955211.42 frames. 
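Within this stretch the reported grad_scale doubles from 32.0 (batch 2000) to 64.0 (batch 2050), holds through batch 2150, and is back at 32.0 by batch 2200. That doubling-and-halving is the signature of dynamic loss scaling in fp16 training, the same mechanism PyTorch exposes via growth_factor, backoff_factor and growth_interval on torch.cuda.amp.GradScaler. A toy version, illustrative only:

```python
# A toy dynamic loss scaler: grows the scale after a run of overflow-free
# steps and backs off on overflow, matching the 32 -> 64 -> 32 movement of
# `grad_scale` in the surrounding records. The trainer's actual scaler
# bookkeeping may differ.
class ToyGradScaler:
    def __init__(self, init_scale=32.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=1000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> float:
        if found_inf:
            self.scale *= self.backoff_factor  # overflow: halve and restart
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                self.scale *= self.growth_factor  # stable: double the scale
                self._good_steps = 0
        return self.scale

scaler = ToyGradScaler()
history = [scaler.update(found_inf=(step == 1500)) for step in range(2000)]
print(history[999], history[1500], history[-1])  # 64.0, 32.0, 32.0
```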
], batch size: 99, lr: 2.62e-03, grad_scale: 64.0 2023-12-23 19:00:52,266 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.76 vs. limit=22.5 2023-12-23 19:01:07,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1285000.0, ans=0.1 2023-12-23 19:01:14,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1285066.6666666667, ans=0.2 2023-12-23 19:01:20,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1285133.3333333333, ans=0.0 2023-12-23 19:01:34,050 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.417e+01 3.671e+01 3.827e+01 4.014e+01 4.652e+01, threshold=7.654e+01, percent-clipped=0.0 2023-12-23 19:01:41,371 INFO [train.py:886] (3/4) Epoch 41, batch 2150, loss[loss=0.01182, audio_tagging_loss=0.01182, over 24750.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4961506.44 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 64.0 2023-12-23 19:01:51,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1285333.3333333333, ans=0.125 2023-12-23 19:01:59,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1285333.3333333333, ans=0.0 2023-12-23 19:02:18,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1285466.6666666667, ans=10.0 2023-12-23 19:02:29,773 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.68 vs. limit=15.0 2023-12-23 19:02:33,075 INFO [train.py:886] (3/4) Epoch 41, batch 2200, loss[loss=0.01129, audio_tagging_loss=0.01129, over 24750.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4960126.46 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:02:35,414 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.58 vs. limit=10.0 2023-12-23 19:02:56,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1285733.3333333333, ans=0.1 2023-12-23 19:03:00,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1285733.3333333333, ans=0.1 2023-12-23 19:03:13,515 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.94 vs. 
limit=15.0 2023-12-23 19:03:17,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1285866.6666666667, ans=0.1 2023-12-23 19:03:18,840 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.225e+01 3.764e+01 3.888e+01 4.003e+01 5.031e+01, threshold=7.777e+01, percent-clipped=0.0 2023-12-23 19:03:20,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1285866.6666666667, ans=0.1 2023-12-23 19:03:25,261 INFO [train.py:886] (3/4) Epoch 41, batch 2250, loss[loss=0.01218, audio_tagging_loss=0.01218, over 24939.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4948306.43 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:03:31,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1285933.3333333333, ans=0.1 2023-12-23 19:03:33,003 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0 2023-12-23 19:03:33,879 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.07 vs. limit=6.0 2023-12-23 19:03:44,414 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.50 vs. limit=15.0 2023-12-23 19:03:52,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1286066.6666666667, ans=0.125 2023-12-23 19:04:16,985 INFO [train.py:886] (3/4) Epoch 41, batch 2300, loss[loss=0.009858, audio_tagging_loss=0.009858, over 25000.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4947885.12 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:04:18,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1286266.6666666667, ans=0.125 2023-12-23 19:04:24,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. 
limit=6.0 2023-12-23 19:04:30,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1286333.3333333333, ans=0.125 2023-12-23 19:04:35,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1286333.3333333333, ans=0.0 2023-12-23 19:04:39,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1286400.0, ans=0.125 2023-12-23 19:04:40,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1286400.0, ans=0.125 2023-12-23 19:04:49,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=1286466.6666666667, ans=0.1 2023-12-23 19:04:59,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1286533.3333333333, ans=0.125 2023-12-23 19:05:02,017 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.305e+01 3.676e+01 3.827e+01 3.947e+01 4.404e+01, threshold=7.653e+01, percent-clipped=0.0 2023-12-23 19:05:08,341 INFO [train.py:886] (3/4) Epoch 41, batch 2350, loss[loss=0.01093, audio_tagging_loss=0.01093, over 25000.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4953430.20 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:05:29,949 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.99 vs. limit=12.0 2023-12-23 19:05:36,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1286733.3333333333, ans=0.0 2023-12-23 19:05:37,761 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.72 vs. limit=22.5 2023-12-23 19:05:49,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1286866.6666666667, ans=0.125 2023-12-23 19:05:55,357 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2023-12-23 19:05:56,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1286866.6666666667, ans=0.125 2023-12-23 19:06:00,412 INFO [train.py:886] (3/4) Epoch 41, batch 2400, loss[loss=0.01026, audio_tagging_loss=0.01026, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4955652.94 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:06:05,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1286933.3333333333, ans=0.0 2023-12-23 19:06:17,398 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.21 vs. 
limit=15.0 2023-12-23 19:06:43,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1287200.0, ans=0.0 2023-12-23 19:06:46,642 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.257e+01 3.616e+01 3.787e+01 3.992e+01 4.640e+01, threshold=7.573e+01, percent-clipped=0.0 2023-12-23 19:06:50,354 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.79 vs. limit=15.0 2023-12-23 19:06:52,489 INFO [train.py:886] (3/4) Epoch 41, batch 2450, loss[loss=0.01128, audio_tagging_loss=0.01128, over 25000.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4955459.13 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:07:08,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1287333.3333333333, ans=0.1 2023-12-23 19:07:19,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1287400.0, ans=0.2 2023-12-23 19:07:40,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1287533.3333333333, ans=0.2 2023-12-23 19:07:41,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1287533.3333333333, ans=0.0 2023-12-23 19:07:44,509 INFO [train.py:886] (3/4) Epoch 41, batch 2500, loss[loss=0.01089, audio_tagging_loss=0.01089, over 24750.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4954991.18 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:07:49,400 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.46 vs. limit=6.0 2023-12-23 19:08:23,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1287800.0, ans=0.125 2023-12-23 19:08:28,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1287866.6666666667, ans=0.0 2023-12-23 19:08:29,631 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.365e+01 3.727e+01 3.879e+01 4.092e+01 4.648e+01, threshold=7.757e+01, percent-clipped=0.0 2023-12-23 19:08:36,079 INFO [train.py:886] (3/4) Epoch 41, batch 2550, loss[loss=0.0112, audio_tagging_loss=0.0112, over 24750.00 frames. ], tot_loss[loss=0.01173, audio_tagging_loss=0.01173, over 4954373.56 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:08:36,525 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.10 vs. 
limit=22.5 2023-12-23 19:08:55,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1288000.0, ans=0.0 2023-12-23 19:09:03,667 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:09:06,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1288133.3333333333, ans=0.0 2023-12-23 19:09:19,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1288200.0, ans=0.0 2023-12-23 19:09:19,383 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.62 vs. limit=15.0 2023-12-23 19:09:28,072 INFO [train.py:886] (3/4) Epoch 41, batch 2600, loss[loss=0.01176, audio_tagging_loss=0.01176, over 24750.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4949471.39 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:09:32,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.21 vs. limit=12.0 2023-12-23 19:09:33,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1288266.6666666667, ans=0.125 2023-12-23 19:09:34,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1288266.6666666667, ans=0.125 2023-12-23 19:09:59,624 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.04 vs. limit=6.0 2023-12-23 19:10:13,061 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.409e+01 3.737e+01 3.877e+01 4.049e+01 5.026e+01, threshold=7.754e+01, percent-clipped=0.0 2023-12-23 19:10:20,184 INFO [train.py:886] (3/4) Epoch 41, batch 2650, loss[loss=0.009748, audio_tagging_loss=0.009748, over 24002.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4949142.93 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:10:23,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1288600.0, ans=0.125 2023-12-23 19:10:53,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1288800.0, ans=0.125 2023-12-23 19:11:02,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1288866.6666666667, ans=0.2 2023-12-23 19:11:09,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1288866.6666666667, ans=0.2 2023-12-23 19:11:10,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1288933.3333333333, ans=0.05 2023-12-23 19:11:11,314 INFO [train.py:886] (3/4) Epoch 41, batch 2700, loss[loss=0.01213, audio_tagging_loss=0.01213, over 25000.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4952285.05 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:11:14,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.63 vs. 
limit=15.0 2023-12-23 19:11:17,112 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.85 vs. limit=15.0 2023-12-23 19:11:17,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1288933.3333333333, ans=0.125 2023-12-23 19:11:30,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1289000.0, ans=0.0 2023-12-23 19:11:40,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1289066.6666666667, ans=0.125 2023-12-23 19:11:56,738 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.268e+01 3.652e+01 3.827e+01 3.975e+01 4.292e+01, threshold=7.655e+01, percent-clipped=0.0 2023-12-23 19:12:03,183 INFO [train.py:886] (3/4) Epoch 41, batch 2750, loss[loss=0.0125, audio_tagging_loss=0.0125, over 25000.00 frames. ], tot_loss[loss=0.01139, audio_tagging_loss=0.01139, over 4955051.45 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:12:15,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1289333.3333333333, ans=0.125 2023-12-23 19:12:15,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1289333.3333333333, ans=0.1 2023-12-23 19:12:32,570 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.33 vs. limit=12.0 2023-12-23 19:12:36,233 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.39 vs. limit=22.5 2023-12-23 19:12:46,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1289533.3333333333, ans=0.125 2023-12-23 19:12:55,081 INFO [train.py:886] (3/4) Epoch 41, batch 2800, loss[loss=0.01225, audio_tagging_loss=0.01225, over 24750.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4955615.56 frames. ], batch size: 99, lr: 2.62e-03, grad_scale: 32.0 2023-12-23 19:13:08,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1289666.6666666667, ans=0.1 2023-12-23 19:13:11,166 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=15.0 2023-12-23 19:13:14,914 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. 
limit=15.0 2023-12-23 19:13:20,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1289733.3333333333, ans=0.125 2023-12-23 19:13:35,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1289800.0, ans=0.2 2023-12-23 19:13:41,850 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.413e+01 3.762e+01 3.896e+01 4.063e+01 4.589e+01, threshold=7.791e+01, percent-clipped=0.0 2023-12-23 19:13:44,244 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.80 vs. limit=15.0 2023-12-23 19:13:47,610 INFO [train.py:886] (3/4) Epoch 41, batch 2850, loss[loss=0.01241, audio_tagging_loss=0.01241, over 25000.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4950795.88 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:13:49,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.61 vs. limit=10.0 2023-12-23 19:13:56,367 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.42 vs. limit=10.0 2023-12-23 19:13:58,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=1290000.0, ans=22.5 2023-12-23 19:13:59,359 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0 2023-12-23 19:14:31,324 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0 2023-12-23 19:14:35,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1290200.0, ans=0.0 2023-12-23 19:14:39,147 INFO [train.py:886] (3/4) Epoch 41, batch 2900, loss[loss=0.01078, audio_tagging_loss=0.01078, over 24750.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4948960.01 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:14:46,617 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.93 vs. limit=22.5 2023-12-23 19:14:48,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1290266.6666666667, ans=0.0 2023-12-23 19:14:52,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1290333.3333333333, ans=0.125 2023-12-23 19:14:55,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1290333.3333333333, ans=0.025 2023-12-23 19:15:03,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.70 vs. limit=12.0 2023-12-23 19:15:20,695 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.13 vs. 
limit=15.0 2023-12-23 19:15:24,931 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.234e+01 3.667e+01 3.837e+01 4.047e+01 4.824e+01, threshold=7.673e+01, percent-clipped=0.0 2023-12-23 19:15:27,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1290533.3333333333, ans=0.125 2023-12-23 19:15:28,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1290533.3333333333, ans=0.04949747468305833 2023-12-23 19:15:31,330 INFO [train.py:886] (3/4) Epoch 41, batch 2950, loss[loss=0.008637, audio_tagging_loss=0.008637, over 24052.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4948413.14 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:15:41,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1290666.6666666667, ans=0.125 2023-12-23 19:15:57,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1290733.3333333333, ans=0.125 2023-12-23 19:16:03,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2023-12-23 19:16:16,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1290866.6666666667, ans=0.0 2023-12-23 19:16:23,711 INFO [train.py:886] (3/4) Epoch 41, batch 3000, loss[loss=0.01242, audio_tagging_loss=0.01242, over 25000.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4952082.06 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:16:23,711 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 19:16:33,716 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7529, 5.9083, 5.2772, 5.6069], device='cuda:3') 2023-12-23 19:16:44,997 INFO [train.py:917] (3/4) Epoch 41, validation: loss=0.03524, audio_tagging_loss=0.03524, over 3737520.00 frames. 2023-12-23 19:16:44,998 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 19:16:45,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1290933.3333333333, ans=0.125 2023-12-23 19:17:08,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1291066.6666666667, ans=0.0 2023-12-23 19:17:16,454 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=12.0 2023-12-23 19:17:20,746 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.271e-02 2023-12-23 19:17:22,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1291133.3333333333, ans=0.0 2023-12-23 19:17:30,599 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.333e+01 3.633e+01 3.842e+01 3.990e+01 4.593e+01, threshold=7.683e+01, percent-clipped=0.0 2023-12-23 19:17:37,025 INFO [train.py:886] (3/4) Epoch 41, batch 3050, loss[loss=0.01177, audio_tagging_loss=0.01177, over 25000.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4953728.94 frames. 
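At batch 3000 the trainer pauses for its periodic validation pass (train.py:909/917 above): it runs the full dev set, about 3.74M frames, reports the averaged tagging loss, and notes peak CUDA memory. A generic sketch of such a pass follows; the model, loader, dimensions, and frame-weighted normalization are stand-ins, not the recipe's actual classes.

```python
# A generic validation pass: eval mode, no gradients, frame-weighted average
# of a multi-label (BCE) tagging loss over the dev loader. The toy linear
# "tagger" and its dimensions below are arbitrary stand-ins.
import torch
import torch.nn as nn

def compute_validation_loss(model, dev_loader, device="cpu"):
    model.eval()
    total_loss, total_frames = 0.0, 0.0
    with torch.no_grad():
        for feats, targets, num_frames in dev_loader:
            logits = model(feats.to(device))
            # Audio tagging is multi-label, hence BCE-with-logits per event.
            loss = nn.functional.binary_cross_entropy_with_logits(
                logits, targets.to(device), reduction="sum")
            total_loss += loss.item()
            total_frames += num_frames
    model.train()
    return total_loss / total_frames

model = nn.Linear(40, 10)  # toy stand-in for the encoder + classifier
batch = (torch.randn(4, 40), torch.randint(0, 2, (4, 10)).float(), 400.0)
print(compute_validation_loss(model, [batch]))
```

Note from the record itself that the validation loss (0.03524) sits about three times above the running train loss (~0.0113) at this point in the run.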
], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:17:37,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1291266.6666666667, ans=0.125 2023-12-23 19:17:44,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.69 vs. limit=15.0 2023-12-23 19:17:44,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.91 vs. limit=15.0 2023-12-23 19:17:59,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1291400.0, ans=0.0 2023-12-23 19:18:07,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1291466.6666666667, ans=0.0 2023-12-23 19:18:15,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1291466.6666666667, ans=0.125 2023-12-23 19:18:28,507 INFO [train.py:886] (3/4) Epoch 41, batch 3100, loss[loss=0.01317, audio_tagging_loss=0.01317, over 24750.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4955358.22 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:18:29,638 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:18:41,654 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.11 vs. limit=15.0 2023-12-23 19:18:50,896 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=12.0 2023-12-23 19:18:57,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1291733.3333333333, ans=0.2 2023-12-23 19:19:14,101 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.432e+01 3.751e+01 3.878e+01 4.025e+01 4.905e+01, threshold=7.756e+01, percent-clipped=0.0 2023-12-23 19:19:14,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1291866.6666666667, ans=0.1 2023-12-23 19:19:19,790 INFO [train.py:886] (3/4) Epoch 41, batch 3150, loss[loss=0.00952, audio_tagging_loss=0.00952, over 25000.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4953720.47 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:19:37,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1292000.0, ans=0.0 2023-12-23 19:19:46,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.96 vs. limit=10.0 2023-12-23 19:19:48,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1292066.6666666667, ans=0.0 2023-12-23 19:20:05,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1292200.0, ans=0.0 2023-12-23 19:20:12,135 INFO [train.py:886] (3/4) Epoch 41, batch 3200, loss[loss=0.01016, audio_tagging_loss=0.01016, over 25000.00 frames. 
], tot_loss[loss=0.01146, audio_tagging_loss=0.01146, over 4949314.76 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:20:30,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1292333.3333333333, ans=0.125 2023-12-23 19:20:33,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1292400.0, ans=0.125 2023-12-23 19:20:35,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1292400.0, ans=0.125 2023-12-23 19:20:45,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1292466.6666666667, ans=0.125 2023-12-23 19:20:50,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1292466.6666666667, ans=0.05 2023-12-23 19:20:50,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1292466.6666666667, ans=0.0 2023-12-23 19:20:57,347 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.348e+01 3.736e+01 3.877e+01 4.151e+01 5.106e+01, threshold=7.754e+01, percent-clipped=0.0 2023-12-23 19:21:00,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1292533.3333333333, ans=0.0 2023-12-23 19:21:04,471 INFO [train.py:886] (3/4) Epoch 41, batch 3250, loss[loss=0.01095, audio_tagging_loss=0.01095, over 24750.00 frames. ], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4948440.35 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:21:04,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.92 vs. limit=15.0 2023-12-23 19:21:34,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1292800.0, ans=0.95 2023-12-23 19:21:37,968 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.58 vs. limit=10.0 2023-12-23 19:21:41,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.71 vs. limit=10.0 2023-12-23 19:21:51,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1292866.6666666667, ans=0.0 2023-12-23 19:21:51,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1292866.6666666667, ans=0.125 2023-12-23 19:21:56,100 INFO [train.py:886] (3/4) Epoch 41, batch 3300, loss[loss=0.009237, audio_tagging_loss=0.009237, over 25000.00 frames. ], tot_loss[loss=0.01135, audio_tagging_loss=0.01135, over 4943629.16 frames. 
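The validation block above also dumps attn_weights_entropy for one self-attention module (zipformer.py:1858), with per-head values around 5.3 to 5.9 nats. Attention entropy is a standard diagnostic: near zero means a head has collapsed onto single positions, while the maximum, log(seq_len), means it attends uniformly. A sketch of the computation, under the assumption that entropy is taken over key positions and averaged over queries (the actual reduction in zipformer.py may differ):

```python
# Per-head attention entropy, assumed to be computed over key positions and
# averaged over query positions.
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """attn: (num_heads, num_queries, num_keys), rows summing to 1."""
    p = attn.clamp_min(1e-20)
    return -(p * p.log()).sum(dim=-1).mean(dim=-1)

attn = torch.softmax(torch.randn(4, 300, 300), dim=-1)
print(attn_weights_entropy(attn))  # per head; upper bound log(300) ~= 5.70
```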
], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:22:07,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1293000.0, ans=0.125 2023-12-23 19:22:08,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1293000.0, ans=0.0 2023-12-23 19:22:19,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1293066.6666666667, ans=0.125 2023-12-23 19:22:29,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1293133.3333333333, ans=0.2 2023-12-23 19:22:39,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1293200.0, ans=0.0 2023-12-23 19:22:41,790 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.319e+01 3.618e+01 3.799e+01 4.010e+01 4.611e+01, threshold=7.598e+01, percent-clipped=0.0 2023-12-23 19:22:42,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1293200.0, ans=0.125 2023-12-23 19:22:45,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1293200.0, ans=0.2 2023-12-23 19:22:47,451 INFO [train.py:886] (3/4) Epoch 41, batch 3350, loss[loss=0.01195, audio_tagging_loss=0.01195, over 25000.00 frames. ], tot_loss[loss=0.01133, audio_tagging_loss=0.01133, over 4950780.70 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:22:55,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1293266.6666666667, ans=0.0 2023-12-23 19:22:56,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1293333.3333333333, ans=0.125 2023-12-23 19:22:59,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1293333.3333333333, ans=0.2 2023-12-23 19:23:24,855 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.29 vs. limit=15.0 2023-12-23 19:23:31,862 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.40 vs. limit=22.5 2023-12-23 19:23:39,111 INFO [train.py:886] (3/4) Epoch 41, batch 3400, loss[loss=0.01344, audio_tagging_loss=0.01344, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4951838.23 frames. 
], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:23:42,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1293600.0, ans=0.0 2023-12-23 19:23:50,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1293666.6666666667, ans=0.0 2023-12-23 19:24:02,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1293733.3333333333, ans=0.5 2023-12-23 19:24:05,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1293733.3333333333, ans=0.125 2023-12-23 19:24:05,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1293733.3333333333, ans=0.2 2023-12-23 19:24:08,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1293733.3333333333, ans=0.5 2023-12-23 19:24:20,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1293866.6666666667, ans=0.07 2023-12-23 19:24:21,346 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.038e-01 2023-12-23 19:24:24,809 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.278e+01 3.727e+01 3.891e+01 4.043e+01 4.833e+01, threshold=7.782e+01, percent-clipped=0.0 2023-12-23 19:24:26,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1293866.6666666667, ans=0.2 2023-12-23 19:24:30,528 INFO [train.py:886] (3/4) Epoch 41, batch 3450, loss[loss=0.01039, audio_tagging_loss=0.01039, over 24750.00 frames. ], tot_loss[loss=0.01135, audio_tagging_loss=0.01135, over 4938889.05 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:24:45,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1294000.0, ans=0.2 2023-12-23 19:25:09,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1294133.3333333333, ans=0.125 2023-12-23 19:25:23,497 INFO [train.py:886] (3/4) Epoch 41, batch 3500, loss[loss=0.01119, audio_tagging_loss=0.01119, over 24750.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4936736.73 frames. 
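The recurring optim.py:484 warnings summarize the optimizer's adaptive gradient clipping. In the warning in the block above, the five numbers are quantiles of recent gradient norms (min, 25%, median, 75%, max: 3.278e+01 through 4.833e+01), and the threshold 7.782e+01 is exactly Clipping_scale=2.0 times the 3.891e+01 median; percent-clipped=0.0 says no recent gradient exceeded it. A sketch of a median-based clipping rule under those assumptions (the real optimizer's window size and bookkeeping will differ):

```python
# Median-based adaptive clipping: threshold = clipping_scale * median of a
# rolling window of gradient norms, matching threshold = 2.0 * median in the
# warnings above. The window size is an assumption.
from collections import deque
import statistics

class AdaptiveClipper:
    def __init__(self, clipping_scale=2.0, window=128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def clip_factor(self, grad_norm: float) -> float:
        self.norms.append(grad_norm)
        threshold = self.clipping_scale * statistics.median(self.norms)
        return min(1.0, threshold / max(grad_norm, 1e-20))  # scale grads by this

clipper = AdaptiveClipper()
for norm in [32.8, 37.3, 38.9, 40.4, 48.3, 300.0]:
    print(f"norm={norm:6.1f} factor={clipper.clip_factor(norm):.3f}")
```

With norms in the 30-50 range the factor stays at 1.0, consistent with percent-clipped=0.0 holding across this whole section.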
], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:25:37,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1294333.3333333333, ans=0.1 2023-12-23 19:25:43,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1294400.0, ans=0.125 2023-12-23 19:25:43,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1294400.0, ans=0.125 2023-12-23 19:25:55,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1294466.6666666667, ans=0.0 2023-12-23 19:26:07,727 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.332e+01 3.699e+01 3.860e+01 4.048e+01 4.617e+01, threshold=7.721e+01, percent-clipped=0.0 2023-12-23 19:26:14,140 INFO [train.py:886] (3/4) Epoch 41, batch 3550, loss[loss=0.01145, audio_tagging_loss=0.01145, over 25000.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4939731.43 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:26:20,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1294600.0, ans=0.0 2023-12-23 19:26:29,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1294666.6666666667, ans=0.125 2023-12-23 19:26:59,623 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.24 vs. limit=15.0 2023-12-23 19:27:05,727 INFO [train.py:886] (3/4) Epoch 41, batch 3600, loss[loss=0.01158, audio_tagging_loss=0.01158, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4947283.45 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:27:17,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1295000.0, ans=0.1 2023-12-23 19:27:23,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1295000.0, ans=0.0 2023-12-23 19:27:29,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1295066.6666666667, ans=0.2 2023-12-23 19:27:51,208 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.401e+01 3.671e+01 3.840e+01 4.001e+01 4.394e+01, threshold=7.680e+01, percent-clipped=0.0 2023-12-23 19:27:58,369 INFO [train.py:886] (3/4) Epoch 41, batch 3650, loss[loss=0.01058, audio_tagging_loss=0.01058, over 25000.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4952804.90 frames. 
], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:28:28,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=1295466.6666666667, ans=0.2 2023-12-23 19:28:30,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1295466.6666666667, ans=0.0 2023-12-23 19:28:39,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1295533.3333333333, ans=0.125 2023-12-23 19:28:42,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1295533.3333333333, ans=0.0 2023-12-23 19:28:47,969 INFO [train.py:886] (3/4) Epoch 41, batch 3700, loss[loss=0.01229, audio_tagging_loss=0.01229, over 24750.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4952937.14 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:28:54,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1295600.0, ans=0.125 2023-12-23 19:29:05,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1295666.6666666667, ans=0.035 2023-12-23 19:29:06,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1295666.6666666667, ans=0.125 2023-12-23 19:29:10,761 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:29:13,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1295733.3333333333, ans=0.2 2023-12-23 19:29:20,542 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.55 vs. limit=15.0 2023-12-23 19:29:35,183 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.261e+01 3.680e+01 3.875e+01 4.029e+01 4.590e+01, threshold=7.750e+01, percent-clipped=0.0 2023-12-23 19:29:35,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1295866.6666666667, ans=0.0 2023-12-23 19:29:39,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1295866.6666666667, ans=0.125 2023-12-23 19:29:40,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1295933.3333333333, ans=0.125 2023-12-23 19:29:40,933 INFO [train.py:886] (3/4) Epoch 41, batch 3750, loss[loss=0.01065, audio_tagging_loss=0.01065, over 24750.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4951996.71 frames. 
], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:29:43,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1295933.3333333333, ans=0.0 2023-12-23 19:30:07,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1296066.6666666667, ans=0.04949747468305833 2023-12-23 19:30:13,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1296133.3333333333, ans=0.125 2023-12-23 19:30:28,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1296200.0, ans=0.0 2023-12-23 19:30:30,886 INFO [train.py:886] (3/4) Epoch 41, batch 3800, loss[loss=0.01004, audio_tagging_loss=0.01004, over 23980.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4946935.93 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:30:39,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1296266.6666666667, ans=0.125 2023-12-23 19:30:41,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1296333.3333333333, ans=0.1 2023-12-23 19:30:44,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1296333.3333333333, ans=0.125 2023-12-23 19:30:45,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1296333.3333333333, ans=0.1 2023-12-23 19:30:46,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1296333.3333333333, ans=0.0 2023-12-23 19:31:03,404 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.28 vs. limit=22.5 2023-12-23 19:31:13,929 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.00 vs. limit=15.0 2023-12-23 19:31:17,215 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.336e+01 3.684e+01 3.876e+01 4.085e+01 4.684e+01, threshold=7.752e+01, percent-clipped=0.0 2023-12-23 19:31:23,007 INFO [train.py:886] (3/4) Epoch 41, batch 3850, loss[loss=0.01254, audio_tagging_loss=0.01254, over 24750.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4943159.04 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:31:46,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1296733.3333333333, ans=0.0 2023-12-23 19:32:02,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1296800.0, ans=0.125 2023-12-23 19:32:09,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1296866.6666666667, ans=0.125 2023-12-23 19:32:16,079 INFO [train.py:886] (3/4) Epoch 41, batch 3900, loss[loss=0.01193, audio_tagging_loss=0.01193, over 25000.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4940778.18 frames. 
], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:32:20,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1296933.3333333333, ans=0.125 2023-12-23 19:32:28,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1297000.0, ans=0.1 2023-12-23 19:32:35,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.15 vs. limit=22.5 2023-12-23 19:32:46,163 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-12-23 19:32:52,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1297133.3333333333, ans=0.125 2023-12-23 19:32:57,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1297200.0, ans=0.125 2023-12-23 19:32:58,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1297200.0, ans=0.125 2023-12-23 19:33:00,595 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.392e+01 3.712e+01 3.871e+01 3.981e+01 4.576e+01, threshold=7.742e+01, percent-clipped=0.0 2023-12-23 19:33:04,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1297200.0, ans=0.0 2023-12-23 19:33:07,018 INFO [train.py:886] (3/4) Epoch 41, batch 3950, loss[loss=0.01117, audio_tagging_loss=0.01117, over 25000.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4947019.30 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:33:23,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1297333.3333333333, ans=0.125 2023-12-23 19:33:31,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1297400.0, ans=0.0 2023-12-23 19:33:59,514 INFO [train.py:886] (3/4) Epoch 41, batch 4000, loss[loss=0.0112, audio_tagging_loss=0.0112, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4949382.24 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:33:59,720 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:34:15,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1297666.6666666667, ans=0.0 2023-12-23 19:34:34,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1297800.0, ans=0.0 2023-12-23 19:34:40,066 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.92 vs. 
limit=22.5 2023-12-23 19:34:40,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1297866.6666666667, ans=0.0 2023-12-23 19:34:40,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1297866.6666666667, ans=0.0 2023-12-23 19:34:44,943 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.343e+01 3.739e+01 3.854e+01 4.037e+01 4.729e+01, threshold=7.708e+01, percent-clipped=0.0 2023-12-23 19:34:51,360 INFO [train.py:886] (3/4) Epoch 41, batch 4050, loss[loss=0.01168, audio_tagging_loss=0.01168, over 24750.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4950744.16 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:35:04,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.73 vs. limit=22.5 2023-12-23 19:35:07,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1298000.0, ans=0.1 2023-12-23 19:35:25,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1298133.3333333333, ans=0.125 2023-12-23 19:35:26,591 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:35:30,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1298133.3333333333, ans=0.125 2023-12-23 19:35:30,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1298133.3333333333, ans=0.125 2023-12-23 19:35:43,283 INFO [train.py:886] (3/4) Epoch 41, batch 4100, loss[loss=0.009567, audio_tagging_loss=0.009567, over 25000.00 frames. ], tot_loss[loss=0.01139, audio_tagging_loss=0.01139, over 4947435.86 frames. 
], batch size: 100, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:35:44,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1298266.6666666667, ans=0.0 2023-12-23 19:35:47,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1298266.6666666667, ans=0.2 2023-12-23 19:35:55,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1298333.3333333333, ans=0.125 2023-12-23 19:35:57,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1298333.3333333333, ans=0.0 2023-12-23 19:36:13,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1298466.6666666667, ans=0.0 2023-12-23 19:36:24,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1298533.3333333333, ans=0.2 2023-12-23 19:36:26,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1298533.3333333333, ans=10.0 2023-12-23 19:36:29,709 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.286e+01 3.678e+01 3.896e+01 4.080e+01 4.675e+01, threshold=7.792e+01, percent-clipped=0.0 2023-12-23 19:36:30,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1298533.3333333333, ans=0.125 2023-12-23 19:36:35,457 INFO [train.py:886] (3/4) Epoch 41, batch 4150, loss[loss=0.009428, audio_tagging_loss=0.009428, over 24750.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4944287.12 frames. ], batch size: 99, lr: 2.61e-03, grad_scale: 32.0 2023-12-23 19:36:46,366 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0 2023-12-23 19:37:03,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1298733.3333333333, ans=0.125 2023-12-23 19:37:27,141 INFO [train.py:886] (3/4) Epoch 41, batch 4200, loss[loss=0.01371, audio_tagging_loss=0.01371, over 25000.00 frames. ], tot_loss[loss=0.01135, audio_tagging_loss=0.01135, over 4944799.61 frames. ], batch size: 100, lr: 2.61e-03, grad_scale: 64.0 2023-12-23 19:37:28,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1298933.3333333333, ans=0.125 2023-12-23 19:37:50,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1299066.6666666667, ans=0.0 2023-12-23 19:38:06,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1299133.3333333333, ans=0.125 2023-12-23 19:38:12,737 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.418e+01 3.759e+01 3.866e+01 4.012e+01 4.707e+01, threshold=7.732e+01, percent-clipped=0.0 2023-12-23 19:38:19,213 INFO [train.py:886] (3/4) Epoch 41, batch 4250, loss[loss=0.01224, audio_tagging_loss=0.01224, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4946668.53 frames. 
], batch size: 100, lr: 2.61e-03, grad_scale: 64.0 2023-12-23 19:38:22,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1299266.6666666667, ans=0.0 2023-12-23 19:38:25,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0 2023-12-23 19:38:41,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1299400.0, ans=0.0 2023-12-23 19:38:45,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1299400.0, ans=0.2 2023-12-23 19:38:46,278 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.35 vs. limit=15.0 2023-12-23 19:38:46,788 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:38:56,329 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.81 vs. limit=22.5 2023-12-23 19:39:03,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2023-12-23 19:39:07,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1299533.3333333333, ans=0.1 2023-12-23 19:39:08,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1299533.3333333333, ans=0.125 2023-12-23 19:39:11,492 INFO [train.py:886] (3/4) Epoch 41, batch 4300, loss[loss=0.01114, audio_tagging_loss=0.01114, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4956960.77 frames. ], batch size: 100, lr: 2.60e-03, grad_scale: 64.0 2023-12-23 19:39:21,343 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.80 vs. limit=22.5 2023-12-23 19:39:22,290 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.85 vs. limit=15.0 2023-12-23 19:39:45,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1299800.0, ans=0.125 2023-12-23 19:39:52,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1299866.6666666667, ans=0.0 2023-12-23 19:39:55,936 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.398e+01 3.671e+01 3.818e+01 3.950e+01 4.671e+01, threshold=7.635e+01, percent-clipped=0.0 2023-12-23 19:40:02,286 INFO [train.py:886] (3/4) Epoch 41, batch 4350, loss[loss=0.01183, audio_tagging_loss=0.01183, over 25000.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4956377.04 frames. 
], batch size: 100, lr: 2.60e-03, grad_scale: 64.0 2023-12-23 19:40:17,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1300000.0, ans=0.125 2023-12-23 19:40:19,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1300000.0, ans=0.05 2023-12-23 19:40:29,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1300066.6666666667, ans=0.1 2023-12-23 19:40:37,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1300133.3333333333, ans=0.125 2023-12-23 19:40:42,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1300133.3333333333, ans=0.1 2023-12-23 19:40:44,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1300200.0, ans=0.125 2023-12-23 19:40:47,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1300200.0, ans=0.125 2023-12-23 19:40:48,460 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.32 vs. limit=22.5 2023-12-23 19:40:52,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1300200.0, ans=0.1 2023-12-23 19:40:54,471 INFO [train.py:886] (3/4) Epoch 41, batch 4400, loss[loss=0.01117, audio_tagging_loss=0.01117, over 24750.00 frames. ], tot_loss[loss=0.01141, audio_tagging_loss=0.01141, over 4950359.28 frames. ], batch size: 99, lr: 2.60e-03, grad_scale: 64.0 2023-12-23 19:41:06,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1300333.3333333333, ans=0.2 2023-12-23 19:41:15,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1300400.0, ans=0.07 2023-12-23 19:41:39,976 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.419e+01 3.802e+01 3.969e+01 4.169e+01 5.691e+01, threshold=7.937e+01, percent-clipped=0.0 2023-12-23 19:41:45,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1300600.0, ans=0.125 2023-12-23 19:41:46,399 INFO [train.py:886] (3/4) Epoch 41, batch 4450, loss[loss=0.01061, audio_tagging_loss=0.01061, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4943371.57 frames. ], batch size: 99, lr: 2.60e-03, grad_scale: 64.0 2023-12-23 19:41:57,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1300666.6666666667, ans=0.125 2023-12-23 19:42:03,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.00 vs. 
limit=22.5 2023-12-23 19:42:08,985 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:42:18,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1300800.0, ans=0.125 2023-12-23 19:42:22,406 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:42:28,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1300866.6666666667, ans=0.0 2023-12-23 19:42:37,332 INFO [train.py:886] (3/4) Epoch 41, batch 4500, loss[loss=0.01498, audio_tagging_loss=0.01498, over 25000.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4948701.51 frames. ], batch size: 100, lr: 2.60e-03, grad_scale: 64.0 2023-12-23 19:42:43,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1300933.3333333333, ans=0.0 2023-12-23 19:42:52,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1301000.0, ans=0.1 2023-12-23 19:43:07,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1301066.6666666667, ans=0.0 2023-12-23 19:43:12,949 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2023-12-23 19:43:24,736 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.397e+01 3.696e+01 3.844e+01 4.118e+01 4.848e+01, threshold=7.689e+01, percent-clipped=0.0 2023-12-23 19:43:29,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.93 vs. limit=22.5 2023-12-23 19:43:30,445 INFO [train.py:886] (3/4) Epoch 41, batch 4550, loss[loss=0.01174, audio_tagging_loss=0.01174, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4951799.23 frames. ], batch size: 100, lr: 2.60e-03, grad_scale: 64.0 2023-12-23 19:43:43,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1301333.3333333333, ans=0.125 2023-12-23 19:43:44,469 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.79 vs. limit=15.0 2023-12-23 19:43:51,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1301400.0, ans=0.125 2023-12-23 19:43:55,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1301400.0, ans=0.125 2023-12-23 19:44:10,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1301533.3333333333, ans=0.125 2023-12-23 19:44:21,571 INFO [train.py:886] (3/4) Epoch 41, batch 4600, loss[loss=0.009604, audio_tagging_loss=0.009604, over 25000.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4947880.35 frames. 
], batch size: 100, lr: 2.60e-03, grad_scale: 64.0 2023-12-23 19:44:31,832 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=22.5 2023-12-23 19:44:40,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1301666.6666666667, ans=0.0 2023-12-23 19:44:57,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1301800.0, ans=0.1 2023-12-23 19:44:57,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1301800.0, ans=0.125 2023-12-23 19:45:08,424 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.368e+01 3.753e+01 3.899e+01 4.048e+01 4.729e+01, threshold=7.798e+01, percent-clipped=0.0 2023-12-23 19:45:13,159 INFO [train.py:886] (3/4) Epoch 41, batch 4650, loss[loss=0.01087, audio_tagging_loss=0.01087, over 25000.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4948103.58 frames. ], batch size: 100, lr: 2.60e-03, grad_scale: 32.0 2023-12-23 19:45:20,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1301933.3333333333, ans=0.2 2023-12-23 19:45:22,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1302000.0, ans=0.125 2023-12-23 19:45:23,173 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.22 vs. limit=6.0 2023-12-23 19:45:45,911 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=22.5 2023-12-23 19:45:50,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1302133.3333333333, ans=0.125 2023-12-23 19:45:52,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.70 vs. limit=22.5 2023-12-23 19:45:53,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1302200.0, ans=0.125 2023-12-23 19:46:03,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.64 vs. limit=12.0 2023-12-23 19:46:03,509 INFO [train.py:886] (3/4) Epoch 41, batch 4700, loss[loss=0.01245, audio_tagging_loss=0.01245, over 24750.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4949351.27 frames. 
], batch size: 99, lr: 2.60e-03, grad_scale: 32.0 2023-12-23 19:46:04,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1302266.6666666667, ans=0.125 2023-12-23 19:46:15,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1302333.3333333333, ans=0.125 2023-12-23 19:46:17,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1302333.3333333333, ans=10.0 2023-12-23 19:46:22,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1302400.0, ans=0.125 2023-12-23 19:46:24,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1302400.0, ans=0.125 2023-12-23 19:46:37,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1302466.6666666667, ans=0.1 2023-12-23 19:46:40,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1302533.3333333333, ans=0.0 2023-12-23 19:46:41,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1302533.3333333333, ans=0.2 2023-12-23 19:46:42,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1302533.3333333333, ans=0.125 2023-12-23 19:46:46,302 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.396e+01 3.793e+01 3.987e+01 4.159e+01 5.000e+01, threshold=7.973e+01, percent-clipped=0.0 2023-12-23 19:46:47,675 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.41 vs. limit=15.0 2023-12-23 19:46:50,251 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.51 vs. limit=12.0 2023-12-23 19:46:50,806 INFO [train.py:886] (3/4) Epoch 41, batch 4750, loss[loss=0.01129, audio_tagging_loss=0.01129, over 24750.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4944774.37 frames. ], batch size: 99, lr: 2.60e-03, grad_scale: 32.0 2023-12-23 19:46:53,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1302600.0, ans=0.0 2023-12-23 19:46:56,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1302600.0, ans=0.0 2023-12-23 19:47:00,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1302666.6666666667, ans=0.0 2023-12-23 19:47:24,816 INFO [train.py:886] (3/4) Epoch 42, batch 0, loss[loss=0.02712, audio_tagging_loss=0.02712, over 20902.00 frames. ], tot_loss[loss=0.02712, audio_tagging_loss=0.02712, over 20902.00 frames. ], batch size: 107, lr: 2.57e-03, grad_scale: 32.0 2023-12-23 19:47:24,816 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 19:47:45,224 INFO [train.py:917] (3/4) Epoch 42, validation: loss=0.03462, audio_tagging_loss=0.03462, over 3737520.00 frames. 
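A note on the tot_loss figures in the surrounding records: they behave like a frame-weighted running average of the per-batch loss that is reset at each epoch boundary, which is why Epoch 42 opens at the raw batch-0 loss (0.02712 over 20902 frames) and relaxes back toward the ~0.0113 steady state over the first few hundred batches, while the "over N frames" count grows until it saturates near ~4.95e6 frames. A minimal sketch consistent with those numbers, assuming a decay of 1/200 per batch and ~25000 frames per batch (RunningLoss and its constants are illustrative guesses, not icefall's actual MetricsTracker):

class RunningLoss:
    """Frame-weighted exponential moving average of per-batch losses (sketch)."""

    def __init__(self, alpha: float = 1.0 / 200):  # assumed decay per batch
        self.alpha = alpha
        self.loss_sum = 0.0  # decayed sum of loss * frames
        self.frames = 0.0    # decayed frame count; saturates near frames_per_batch / alpha

    def reset(self) -> None:
        # In this reading, called at each epoch boundary, so the first
        # tot_loss of an epoch equals the raw batch-0 loss.
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, loss: float, num_frames: float) -> None:
        decay = 1.0 - self.alpha
        self.loss_sum = decay * self.loss_sum + loss * num_frames
        self.frames = decay * self.frames + num_frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss()
tracker.update(0.02712, 20902)    # epoch 42, batch 0: tot_loss == batch loss
for _ in range(50):               # by batch 50 the decayed frame count is ~1.12e6,
    tracker.update(0.016, 25000)  # in line with "over 1119056.66 frames" below
print(f"tot_loss={tracker.tot_loss:.5f} over {tracker.frames:.0f} frames")

The optim.py clipping warnings also follow a fixed pattern: the five values after "grad-norm quartiles" read like the min/25%/50%/75%/max of a buffer of recent gradient norms, and in every warning in this log the threshold equals Clipping_scale times the middle value up to rounding (e.g. 2.0 * 4.772e+01 = 9.545e+01 in the warning below), with percent-clipped the fraction of norms above it. A hedged sketch of that bookkeeping only (clip_with_quartile_stats is an invented name, the norm buffer is assumed given, and the actual downscaling of gradients is omitted):

import torch

def clip_with_quartile_stats(norms: torch.Tensor, clipping_scale: float = 2.0) -> torch.Tensor:
    # norms: 1-D float tensor of recent per-step gradient norms.
    q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # scale times the median
    percent_clipped = 100.0 * (norms > threshold).float().mean().item()
    print(
        "Clipping_scale=%.1f, grad-norm quartiles %.3e %.3e %.3e %.3e %.3e, "
        "threshold=%.3e, percent-clipped=%.1f"
        % (clipping_scale, *q.tolist(), threshold.item(), percent_clipped)
    )
    # A real optimizer would now scale any gradient whose norm exceeds the
    # threshold down to it; only the logged statistics are reproduced here.
    return threshold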
2023-12-23 19:47:45,224 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 19:47:55,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1302773.3333333333, ans=0.0 2023-12-23 19:47:58,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1302773.3333333333, ans=0.1 2023-12-23 19:47:58,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1302773.3333333333, ans=0.1 2023-12-23 19:47:59,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1302773.3333333333, ans=0.125 2023-12-23 19:48:05,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1302840.0, ans=0.125 2023-12-23 19:48:06,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1302840.0, ans=0.125 2023-12-23 19:48:12,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1302840.0, ans=0.0 2023-12-23 19:48:34,868 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.36 vs. limit=6.0 2023-12-23 19:48:37,179 INFO [train.py:886] (3/4) Epoch 42, batch 50, loss[loss=0.01637, audio_tagging_loss=0.01637, over 25000.00 frames. ], tot_loss[loss=0.01786, audio_tagging_loss=0.01786, over 1119056.66 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 16.0 2023-12-23 19:48:47,907 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:48:54,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1303106.6666666667, ans=0.125 2023-12-23 19:49:08,456 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.767e+01 4.244e+01 4.772e+01 5.544e+01 1.220e+02, threshold=9.545e+01, percent-clipped=3.0 2023-12-23 19:49:09,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.91 vs. limit=15.0 2023-12-23 19:49:11,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1303240.0, ans=0.0 2023-12-23 19:49:19,490 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.28 vs. limit=15.0 2023-12-23 19:49:28,202 INFO [train.py:886] (3/4) Epoch 42, batch 100, loss[loss=0.01423, audio_tagging_loss=0.01423, over 25000.00 frames. ], tot_loss[loss=0.01565, audio_tagging_loss=0.01565, over 1974518.42 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 16.0 2023-12-23 19:49:38,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1303440.0, ans=0.0 2023-12-23 19:50:20,325 INFO [train.py:886] (3/4) Epoch 42, batch 150, loss[loss=0.01535, audio_tagging_loss=0.01535, over 25000.00 frames. ], tot_loss[loss=0.01431, audio_tagging_loss=0.01431, over 2640047.45 frames. 
], batch size: 100, lr: 2.57e-03, grad_scale: 16.0 2023-12-23 19:50:23,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1303706.6666666667, ans=0.0 2023-12-23 19:50:25,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1303706.6666666667, ans=0.0 2023-12-23 19:50:26,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.05 vs. limit=15.0 2023-12-23 19:50:41,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1303840.0, ans=0.2 2023-12-23 19:50:44,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1303840.0, ans=0.0 2023-12-23 19:50:51,578 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.377e+01 3.833e+01 4.065e+01 4.320e+01 5.040e+01, threshold=8.130e+01, percent-clipped=0.0 2023-12-23 19:50:54,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1303906.6666666667, ans=0.09899494936611666 2023-12-23 19:50:55,935 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.26 vs. limit=22.5 2023-12-23 19:51:02,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1303973.3333333333, ans=0.125 2023-12-23 19:51:12,103 INFO [train.py:886] (3/4) Epoch 42, batch 200, loss[loss=0.0113, audio_tagging_loss=0.0113, over 25000.00 frames. ], tot_loss[loss=0.01344, audio_tagging_loss=0.01344, over 3157009.26 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 16.0 2023-12-23 19:51:30,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1304106.6666666667, ans=0.125 2023-12-23 19:51:59,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1304306.6666666667, ans=0.125 2023-12-23 19:52:02,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1304373.3333333333, ans=0.0 2023-12-23 19:52:03,113 INFO [train.py:886] (3/4) Epoch 42, batch 250, loss[loss=0.01085, audio_tagging_loss=0.01085, over 25000.00 frames. ], tot_loss[loss=0.01297, audio_tagging_loss=0.01297, over 3553764.74 frames. 
], batch size: 100, lr: 2.57e-03, grad_scale: 16.0 2023-12-23 19:52:28,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1304506.6666666667, ans=0.125 2023-12-23 19:52:33,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1304573.3333333333, ans=0.1 2023-12-23 19:52:34,090 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.363e+01 3.769e+01 3.930e+01 4.156e+01 4.971e+01, threshold=7.859e+01, percent-clipped=0.0 2023-12-23 19:52:38,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1304573.3333333333, ans=0.125 2023-12-23 19:52:50,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1304640.0, ans=0.125 2023-12-23 19:52:55,099 INFO [train.py:886] (3/4) Epoch 42, batch 300, loss[loss=0.01151, audio_tagging_loss=0.01151, over 24750.00 frames. ], tot_loss[loss=0.01269, audio_tagging_loss=0.01269, over 3864871.90 frames. ], batch size: 99, lr: 2.57e-03, grad_scale: 16.0 2023-12-23 19:53:05,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1304773.3333333333, ans=0.2 2023-12-23 19:53:14,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1304840.0, ans=0.0 2023-12-23 19:53:19,975 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.65 vs. limit=10.0 2023-12-23 19:53:46,159 INFO [train.py:886] (3/4) Epoch 42, batch 350, loss[loss=0.0115, audio_tagging_loss=0.0115, over 25000.00 frames. ], tot_loss[loss=0.01244, audio_tagging_loss=0.01244, over 4093108.09 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 16.0 2023-12-23 19:53:52,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1305040.0, ans=0.0 2023-12-23 19:54:09,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1305173.3333333333, ans=0.1 2023-12-23 19:54:17,294 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.289e+01 3.742e+01 3.933e+01 4.114e+01 4.736e+01, threshold=7.865e+01, percent-clipped=0.0 2023-12-23 19:54:32,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0 2023-12-23 19:54:38,499 INFO [train.py:886] (3/4) Epoch 42, batch 400, loss[loss=0.01332, audio_tagging_loss=0.01332, over 24750.00 frames. ], tot_loss[loss=0.01213, audio_tagging_loss=0.01213, over 4281937.12 frames. 
], batch size: 99, lr: 2.57e-03, grad_scale: 32.0 2023-12-23 19:54:44,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1305373.3333333333, ans=0.0 2023-12-23 19:55:01,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1305506.6666666667, ans=0.125 2023-12-23 19:55:29,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1305706.6666666667, ans=0.125 2023-12-23 19:55:30,172 INFO [train.py:886] (3/4) Epoch 42, batch 450, loss[loss=0.01048, audio_tagging_loss=0.01048, over 25000.00 frames. ], tot_loss[loss=0.01184, audio_tagging_loss=0.01184, over 4432786.28 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 32.0 2023-12-23 19:55:45,433 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0 2023-12-23 19:55:54,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1305840.0, ans=0.2 2023-12-23 19:55:56,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1305840.0, ans=0.125 2023-12-23 19:55:58,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1305840.0, ans=0.125 2023-12-23 19:56:01,851 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.347e+01 3.725e+01 3.880e+01 4.088e+01 4.948e+01, threshold=7.759e+01, percent-clipped=0.0 2023-12-23 19:56:05,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1305906.6666666667, ans=0.125 2023-12-23 19:56:13,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1305973.3333333333, ans=0.125 2023-12-23 19:56:22,315 INFO [train.py:886] (3/4) Epoch 42, batch 500, loss[loss=0.01116, audio_tagging_loss=0.01116, over 24750.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4545076.83 frames. ], batch size: 99, lr: 2.57e-03, grad_scale: 32.0 2023-12-23 19:56:28,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1306040.0, ans=0.125 2023-12-23 19:56:31,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1306040.0, ans=0.2 2023-12-23 19:56:33,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1306106.6666666667, ans=0.0 2023-12-23 19:56:36,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1306106.6666666667, ans=0.125 2023-12-23 19:56:46,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1306173.3333333333, ans=0.125 2023-12-23 19:56:50,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1306173.3333333333, ans=0.0 2023-12-23 19:56:53,028 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.74 vs. 
limit=15.0 2023-12-23 19:56:55,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1306240.0, ans=0.125 2023-12-23 19:57:07,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1306306.6666666667, ans=0.125 2023-12-23 19:57:09,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1306306.6666666667, ans=0.1 2023-12-23 19:57:10,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=1306306.6666666667, ans=22.5 2023-12-23 19:57:11,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1306306.6666666667, ans=10.0 2023-12-23 19:57:15,171 INFO [train.py:886] (3/4) Epoch 42, batch 550, loss[loss=0.01084, audio_tagging_loss=0.01084, over 25000.00 frames. ], tot_loss[loss=0.01157, audio_tagging_loss=0.01157, over 4638609.60 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 32.0 2023-12-23 19:57:29,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1306440.0, ans=0.1 2023-12-23 19:57:46,369 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.310e+01 3.673e+01 3.825e+01 3.941e+01 4.727e+01, threshold=7.651e+01, percent-clipped=0.0 2023-12-23 19:58:07,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1306640.0, ans=0.125 2023-12-23 19:58:09,077 INFO [train.py:886] (3/4) Epoch 42, batch 600, loss[loss=0.0142, audio_tagging_loss=0.0142, over 25000.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4707828.70 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 32.0 2023-12-23 19:58:13,799 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:58:19,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=15.0 2023-12-23 19:58:25,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1306773.3333333333, ans=0.0 2023-12-23 19:58:43,273 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2023-12-23 19:58:53,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1306973.3333333333, ans=0.125 2023-12-23 19:58:54,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1306973.3333333333, ans=0.125 2023-12-23 19:58:56,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1306973.3333333333, ans=0.0 2023-12-23 19:59:00,391 INFO [train.py:886] (3/4) Epoch 42, batch 650, loss[loss=0.01113, audio_tagging_loss=0.01113, over 24750.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4757617.02 frames. 
], batch size: 99, lr: 2.57e-03, grad_scale: 32.0 2023-12-23 19:59:02,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1307040.0, ans=0.125 2023-12-23 19:59:04,335 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.35 vs. limit=10.0 2023-12-23 19:59:08,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1307040.0, ans=0.0 2023-12-23 19:59:10,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1307106.6666666667, ans=0.125 2023-12-23 19:59:25,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1307173.3333333333, ans=0.035 2023-12-23 19:59:31,742 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.438e+01 3.758e+01 3.912e+01 4.116e+01 4.641e+01, threshold=7.823e+01, percent-clipped=0.0 2023-12-23 19:59:36,465 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 19:59:42,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1307306.6666666667, ans=0.125 2023-12-23 19:59:49,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=1307306.6666666667, ans=22.5 2023-12-23 19:59:53,717 INFO [train.py:886] (3/4) Epoch 42, batch 700, loss[loss=0.0113, audio_tagging_loss=0.0113, over 25000.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4795511.55 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 32.0 2023-12-23 20:00:09,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1307440.0, ans=0.125 2023-12-23 20:00:14,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1307506.6666666667, ans=0.5 2023-12-23 20:00:33,990 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0 2023-12-23 20:00:45,220 INFO [train.py:886] (3/4) Epoch 42, batch 750, loss[loss=0.01142, audio_tagging_loss=0.01142, over 24750.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4821651.50 frames. ], batch size: 99, lr: 2.57e-03, grad_scale: 32.0 2023-12-23 20:00:53,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1307706.6666666667, ans=0.125 2023-12-23 20:01:08,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1307840.0, ans=0.125 2023-12-23 20:01:09,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1307840.0, ans=0.05 2023-12-23 20:01:11,617 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.18 vs. 
limit=15.0 2023-12-23 20:01:16,354 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.283e+01 3.690e+01 3.847e+01 4.054e+01 6.024e+01, threshold=7.694e+01, percent-clipped=0.0 2023-12-23 20:01:20,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1307906.6666666667, ans=0.125 2023-12-23 20:01:22,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1307906.6666666667, ans=0.125 2023-12-23 20:01:33,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1307973.3333333333, ans=0.5 2023-12-23 20:01:35,618 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0 2023-12-23 20:01:37,194 INFO [train.py:886] (3/4) Epoch 42, batch 800, loss[loss=0.01148, audio_tagging_loss=0.01148, over 24750.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4849849.16 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:01:43,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1308040.0, ans=0.125 2023-12-23 20:01:56,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.98 vs. limit=22.5 2023-12-23 20:02:02,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1308173.3333333333, ans=0.125 2023-12-23 20:02:05,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1308173.3333333333, ans=0.125 2023-12-23 20:02:18,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1308306.6666666667, ans=0.2 2023-12-23 20:02:26,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1308373.3333333333, ans=0.125 2023-12-23 20:02:27,607 INFO [train.py:886] (3/4) Epoch 42, batch 850, loss[loss=0.01104, audio_tagging_loss=0.01104, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4874842.82 frames. 
], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:02:28,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1308373.3333333333, ans=0.1 2023-12-23 20:02:47,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1308506.6666666667, ans=0.125 2023-12-23 20:02:58,920 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.387e+01 3.709e+01 3.857e+01 4.015e+01 4.625e+01, threshold=7.713e+01, percent-clipped=0.0 2023-12-23 20:03:02,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1308573.3333333333, ans=0.2 2023-12-23 20:03:03,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=1308573.3333333333, ans=10.0 2023-12-23 20:03:13,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1308640.0, ans=0.015 2023-12-23 20:03:19,474 INFO [train.py:886] (3/4) Epoch 42, batch 900, loss[loss=0.01057, audio_tagging_loss=0.01057, over 24750.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4894743.38 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:03:28,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1308706.6666666667, ans=0.125 2023-12-23 20:03:29,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1308773.3333333333, ans=0.07 2023-12-23 20:03:42,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1308840.0, ans=0.125 2023-12-23 20:03:51,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1308906.6666666667, ans=0.0 2023-12-23 20:04:02,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.03 vs. limit=10.0 2023-12-23 20:04:12,449 INFO [train.py:886] (3/4) Epoch 42, batch 950, loss[loss=0.01207, audio_tagging_loss=0.01207, over 25000.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4902788.49 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:04:26,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1309106.6666666667, ans=0.125 2023-12-23 20:04:33,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1309173.3333333333, ans=0.0 2023-12-23 20:04:36,077 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.48 vs. 
limit=15.0 2023-12-23 20:04:38,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1309173.3333333333, ans=0.125 2023-12-23 20:04:43,680 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.390e+01 3.775e+01 3.915e+01 4.098e+01 4.959e+01, threshold=7.831e+01, percent-clipped=0.0 2023-12-23 20:04:44,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1309240.0, ans=0.125 2023-12-23 20:04:47,889 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=12.0 2023-12-23 20:05:04,304 INFO [train.py:886] (3/4) Epoch 42, batch 1000, loss[loss=0.01071, audio_tagging_loss=0.01071, over 24750.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4907489.71 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:05:15,864 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.53 vs. limit=22.5 2023-12-23 20:05:23,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1309506.6666666667, ans=0.0 2023-12-23 20:05:24,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1309506.6666666667, ans=0.2 2023-12-23 20:05:24,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1309506.6666666667, ans=0.0 2023-12-23 20:05:27,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1309506.6666666667, ans=0.125 2023-12-23 20:05:55,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1309706.6666666667, ans=0.125 2023-12-23 20:05:56,008 INFO [train.py:886] (3/4) Epoch 42, batch 1050, loss[loss=0.01109, audio_tagging_loss=0.01109, over 25000.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4915798.43 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:05:58,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1309706.6666666667, ans=0.0 2023-12-23 20:06:05,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1309773.3333333333, ans=0.125 2023-12-23 20:06:16,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1309840.0, ans=0.1 2023-12-23 20:06:21,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1309840.0, ans=0.0 2023-12-23 20:06:27,321 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.280e+01 3.716e+01 3.855e+01 4.036e+01 4.580e+01, threshold=7.710e+01, percent-clipped=0.0 2023-12-23 20:06:48,266 INFO [train.py:886] (3/4) Epoch 42, batch 1100, loss[loss=0.01042, audio_tagging_loss=0.01042, over 24032.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4922393.25 frames. 
], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:06:54,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1310040.0, ans=0.125 2023-12-23 20:07:02,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1310106.6666666667, ans=0.1 2023-12-23 20:07:07,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1310173.3333333333, ans=0.1 2023-12-23 20:07:19,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1310240.0, ans=0.125 2023-12-23 20:07:25,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1310240.0, ans=0.0 2023-12-23 20:07:33,458 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.88 vs. limit=15.0 2023-12-23 20:07:38,690 INFO [train.py:886] (3/4) Epoch 42, batch 1150, loss[loss=0.01242, audio_tagging_loss=0.01242, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4936112.46 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:07:43,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1310373.3333333333, ans=0.0 2023-12-23 20:08:02,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1310506.6666666667, ans=0.0 2023-12-23 20:08:10,038 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.331e+01 3.706e+01 3.860e+01 4.040e+01 4.470e+01, threshold=7.720e+01, percent-clipped=0.0 2023-12-23 20:08:24,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1310640.0, ans=0.1 2023-12-23 20:08:32,217 INFO [train.py:886] (3/4) Epoch 42, batch 1200, loss[loss=0.01215, audio_tagging_loss=0.01215, over 25000.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4943030.66 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:08:42,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1310773.3333333333, ans=0.1 2023-12-23 20:08:49,569 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=15.0 2023-12-23 20:09:08,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1310906.6666666667, ans=0.0 2023-12-23 20:09:22,919 INFO [train.py:886] (3/4) Epoch 42, batch 1250, loss[loss=0.0102, audio_tagging_loss=0.0102, over 24750.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4938339.78 frames. 
], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:09:29,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1311040.0, ans=0.5 2023-12-23 20:09:32,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1311040.0, ans=0.125 2023-12-23 20:09:53,462 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.422e+01 3.763e+01 3.909e+01 4.123e+01 4.708e+01, threshold=7.819e+01, percent-clipped=0.0 2023-12-23 20:10:06,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1311306.6666666667, ans=0.125 2023-12-23 20:10:06,706 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2023-12-23 20:10:13,802 INFO [train.py:886] (3/4) Epoch 42, batch 1300, loss[loss=0.01129, audio_tagging_loss=0.01129, over 25000.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4936855.13 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:10:13,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1311373.3333333333, ans=0.125 2023-12-23 20:10:27,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1311440.0, ans=0.125 2023-12-23 20:10:34,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1311506.6666666667, ans=0.1 2023-12-23 20:10:34,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1311506.6666666667, ans=0.0 2023-12-23 20:10:41,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1311506.6666666667, ans=0.2 2023-12-23 20:11:00,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1311640.0, ans=0.125 2023-12-23 20:11:05,800 INFO [train.py:886] (3/4) Epoch 42, batch 1350, loss[loss=0.008334, audio_tagging_loss=0.008334, over 25000.00 frames. ], tot_loss[loss=0.01143, audio_tagging_loss=0.01143, over 4940716.55 frames. 
], batch size: 100, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:11:09,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1311706.6666666667, ans=0.0 2023-12-23 20:11:09,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1311706.6666666667, ans=0.0 2023-12-23 20:11:36,584 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.337e+01 3.709e+01 3.877e+01 4.071e+01 5.035e+01, threshold=7.754e+01, percent-clipped=0.0 2023-12-23 20:11:37,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1311906.6666666667, ans=0.125 2023-12-23 20:11:38,799 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 20:11:41,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1311906.6666666667, ans=0.0 2023-12-23 20:11:57,004 INFO [train.py:886] (3/4) Epoch 42, batch 1400, loss[loss=0.01312, audio_tagging_loss=0.01312, over 24750.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4945686.21 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0 2023-12-23 20:12:00,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1312040.0, ans=0.125 2023-12-23 20:12:04,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1312040.0, ans=0.125 2023-12-23 20:12:33,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1312240.0, ans=0.0 2023-12-23 20:12:34,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1312240.0, ans=0.125 2023-12-23 20:12:34,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1312240.0, ans=0.1 2023-12-23 20:12:48,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1312373.3333333333, ans=0.0 2023-12-23 20:12:49,214 INFO [train.py:886] (3/4) Epoch 42, batch 1450, loss[loss=0.0119, audio_tagging_loss=0.0119, over 25000.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4947296.61 frames. 
2023-12-23 20:12:52,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1312373.3333333333, ans=0.2
2023-12-23 20:12:57,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1312440.0, ans=0.125
2023-12-23 20:13:16,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1312506.6666666667, ans=0.1
2023-12-23 20:13:20,250 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.348e+01 3.643e+01 3.891e+01 4.028e+01 4.685e+01, threshold=7.783e+01, percent-clipped=0.0
2023-12-23 20:13:20,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1312573.3333333333, ans=15.0
2023-12-23 20:13:39,823 INFO [train.py:886] (3/4) Epoch 42, batch 1500, loss[loss=0.01408, audio_tagging_loss=0.01408, over 25000.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4950099.89 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0
2023-12-23 20:13:55,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1312773.3333333333, ans=0.0
2023-12-23 20:14:02,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1312840.0, ans=0.125
2023-12-23 20:14:09,277 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 20:14:23,405 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.20 vs. limit=6.0
2023-12-23 20:14:24,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.71 vs. limit=22.5
2023-12-23 20:14:32,120 INFO [train.py:886] (3/4) Epoch 42, batch 1550, loss[loss=0.013, audio_tagging_loss=0.013, over 24750.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4948699.73 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0
2023-12-23 20:14:40,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.31 vs. limit=22.5
2023-12-23 20:14:46,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.17 vs. limit=15.0
2023-12-23 20:14:48,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1313106.6666666667, ans=0.125
2023-12-23 20:14:53,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1313173.3333333333, ans=0.125
2023-12-23 20:15:03,381 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.417e+01 3.724e+01 3.969e+01 4.156e+01 4.573e+01, threshold=7.937e+01, percent-clipped=0.0
2023-12-23 20:15:03,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1313240.0, ans=0.125
2023-12-23 20:15:24,389 INFO [train.py:886] (3/4) Epoch 42, batch 1600, loss[loss=0.0119, audio_tagging_loss=0.0119, over 24750.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4940462.94 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0
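Whitening lines report how far a module's output covariance is from isotropic ("white"): the metric is compared with a scheduled limit, and a corrective gradient is applied only when the metric exceeds it, which is why most entries sit below their limit. One plausible whiteness metric (an illustrative choice, not necessarily icefall's exact formula) is the mean squared eigenvalue of the per-group feature covariance divided by the squared mean eigenvalue, which equals 1.0 for perfectly white features:

    # Sketch of a "whiteness" metric: 1.0 for an isotropic covariance, growing
    # as variance concentrates in few directions (not icefall's exact code).
    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels); channels split into groups as logged
        num_frames, num_channels = x.shape
        x = x.reshape(num_frames, num_groups, num_channels // num_groups)
        x = x - x.mean(dim=0, keepdim=True)
        metrics = []
        for g in range(num_groups):
            cov = (x[:, g, :].T @ x[:, g, :]) / num_frames
            eigs = torch.linalg.eigvalsh(cov)
            # mean squared eigenvalue over squared mean eigenvalue
            metrics.append(((eigs ** 2).mean() / eigs.mean() ** 2).item())
        return sum(metrics) / num_groups

    torch.manual_seed(0)
    print(whitening_metric(torch.randn(1000, 256)))  # near 1 (plus sampling noise)
    rank1 = torch.randn(1000, 1) * torch.ones(1, 256)
    print(whitening_metric(rank1))                   # ~num_channels: far from white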
2023-12-23 20:15:31,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1313373.3333333333, ans=0.0
2023-12-23 20:15:50,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1313506.6666666667, ans=0.125
2023-12-23 20:16:01,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1313573.3333333333, ans=0.125
2023-12-23 20:16:01,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1313573.3333333333, ans=0.1
2023-12-23 20:16:04,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1313573.3333333333, ans=0.125
2023-12-23 20:16:12,277 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.56 vs. limit=15.0
2023-12-23 20:16:15,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1313706.6666666667, ans=0.0
2023-12-23 20:16:16,132 INFO [train.py:886] (3/4) Epoch 42, batch 1650, loss[loss=0.01409, audio_tagging_loss=0.01409, over 24750.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4939440.94 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0
2023-12-23 20:16:25,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1313706.6666666667, ans=0.2
2023-12-23 20:16:25,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1313706.6666666667, ans=0.0
2023-12-23 20:16:32,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1313773.3333333333, ans=0.2
2023-12-23 20:16:43,284 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 20:16:43,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1313840.0, ans=0.125
2023-12-23 20:16:47,631 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.317e+01 3.698e+01 3.919e+01 4.098e+01 5.152e+01, threshold=7.837e+01, percent-clipped=0.0
2023-12-23 20:17:00,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.88 vs. limit=12.0
2023-12-23 20:17:00,770 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.14 vs. limit=15.0
2023-12-23 20:17:07,864 INFO [train.py:886] (3/4) Epoch 42, batch 1700, loss[loss=0.01171, audio_tagging_loss=0.01171, over 25000.00 frames. ], tot_loss[loss=0.01145, audio_tagging_loss=0.01145, over 4941479.65 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0
2023-12-23 20:17:41,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1314240.0, ans=0.2
2023-12-23 20:17:46,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1314240.0, ans=0.2
2023-12-23 20:18:00,811 INFO [train.py:886] (3/4) Epoch 42, batch 1750, loss[loss=0.01195, audio_tagging_loss=0.01195, over 25000.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4945562.87 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0
2023-12-23 20:18:08,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1314373.3333333333, ans=0.07
2023-12-23 20:18:15,383 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.52 vs. limit=10.0
2023-12-23 20:18:23,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1314506.6666666667, ans=0.2
2023-12-23 20:18:31,276 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.382e+01 3.733e+01 3.893e+01 4.031e+01 4.588e+01, threshold=7.786e+01, percent-clipped=0.0
2023-12-23 20:18:52,544 INFO [train.py:886] (3/4) Epoch 42, batch 1800, loss[loss=0.01292, audio_tagging_loss=0.01292, over 25000.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4951869.40 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0
2023-12-23 20:18:56,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1314706.6666666667, ans=0.1
2023-12-23 20:19:14,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1314840.0, ans=0.125
2023-12-23 20:19:25,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1314906.6666666667, ans=0.05
2023-12-23 20:19:27,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1314906.6666666667, ans=0.125
2023-12-23 20:19:36,113 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 20:19:37,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1314973.3333333333, ans=0.05
2023-12-23 20:19:45,046 INFO [train.py:886] (3/4) Epoch 42, batch 1850, loss[loss=0.01302, audio_tagging_loss=0.01302, over 24750.00 frames. ], tot_loss[loss=0.01151, audio_tagging_loss=0.01151, over 4952317.11 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0
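In the train.py:886 lines, loss[...] is the current batch while tot_loss[...] is a frame-weighted average over recent history, which is why its frame count hovers near 4.94M while individual batches contribute roughly 25k frames. A sketch of frame-weighted exponential smoothing that reproduces that scale; the decay constant is an assumption, not icefall's setting:

    # Sketch: exponentially-decayed, frame-weighted average of the tagging loss.
    class RunningLoss:
        def __init__(self, decay: float = 0.995):
            self.decay = decay
            self.loss_sum = 0.0   # decayed sum of loss * frames
            self.frames = 0.0     # decayed sum of frames

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames

        @property
        def tot_loss(self) -> float:
            return self.loss_sum / self.frames

    avg = RunningLoss()
    for _ in range(2000):
        avg.update(0.0115, 25000.0)
    print(f"tot_loss={avg.tot_loss:.5f}, over {avg.frames:.2f} frames")

With decay 0.995 and 25k-frame batches the steady-state history is 25000 / (1 - 0.995) = 5M frames, the same order as the totals logged above.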
2023-12-23 20:19:58,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1315106.6666666667, ans=0.2
2023-12-23 20:20:03,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1315106.6666666667, ans=0.2
2023-12-23 20:20:05,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1315173.3333333333, ans=0.125
2023-12-23 20:20:16,580 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.366e+01 3.781e+01 3.954e+01 4.123e+01 5.023e+01, threshold=7.908e+01, percent-clipped=0.0
2023-12-23 20:20:20,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1315240.0, ans=0.125
2023-12-23 20:20:21,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1315240.0, ans=0.125
2023-12-23 20:20:29,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1315306.6666666667, ans=0.125
2023-12-23 20:20:31,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1315306.6666666667, ans=0.2
2023-12-23 20:20:34,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1315306.6666666667, ans=0.1
2023-12-23 20:20:37,026 INFO [train.py:886] (3/4) Epoch 42, batch 1900, loss[loss=0.01125, audio_tagging_loss=0.01125, over 24750.00 frames. ], tot_loss[loss=0.01156, audio_tagging_loss=0.01156, over 4950208.18 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0
2023-12-23 20:20:39,342 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.45 vs. limit=12.0
2023-12-23 20:20:45,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1315373.3333333333, ans=0.0
2023-12-23 20:21:00,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1315506.6666666667, ans=0.09899494936611666
2023-12-23 20:21:10,508 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0
2023-12-23 20:21:26,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1315640.0, ans=0.125
2023-12-23 20:21:28,607 INFO [train.py:886] (3/4) Epoch 42, batch 1950, loss[loss=0.01151, audio_tagging_loss=0.01151, over 24750.00 frames. ], tot_loss[loss=0.0115, audio_tagging_loss=0.0115, over 4946572.67 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 32.0
2023-12-23 20:21:46,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1315773.3333333333, ans=0.2
2023-12-23 20:21:59,664 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.467e+01 3.743e+01 3.848e+01 4.008e+01 4.780e+01, threshold=7.697e+01, percent-clipped=0.0
2023-12-23 20:22:04,126 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.39 vs. limit=22.5
2023-12-23 20:22:06,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1315906.6666666667, ans=0.1
2023-12-23 20:22:12,006 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.93 vs. limit=22.5
2023-12-23 20:22:12,112 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0
2023-12-23 20:22:21,587 INFO [train.py:886] (3/4) Epoch 42, batch 2000, loss[loss=0.007856, audio_tagging_loss=0.007856, over 25000.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4950838.80 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 32.0
2023-12-23 20:22:25,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1316040.0, ans=0.125
2023-12-23 20:22:30,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1316106.6666666667, ans=0.04949747468305833
2023-12-23 20:22:31,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1316106.6666666667, ans=0.125
2023-12-23 20:23:06,906 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 20:23:11,331 INFO [train.py:886] (3/4) Epoch 42, batch 2050, loss[loss=0.01197, audio_tagging_loss=0.01197, over 25000.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4951223.41 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 64.0
2023-12-23 20:23:21,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1316440.0, ans=0.1
2023-12-23 20:23:28,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1316440.0, ans=0.1
2023-12-23 20:23:35,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1316506.6666666667, ans=0.1
2023-12-23 20:23:41,638 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.285e+01 3.684e+01 3.851e+01 4.044e+01 4.837e+01, threshold=7.702e+01, percent-clipped=0.0
2023-12-23 20:24:02,941 INFO [train.py:886] (3/4) Epoch 42, batch 2100, loss[loss=0.0129, audio_tagging_loss=0.0129, over 21226.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4955045.89 frames. ], batch size: 107, lr: 2.56e-03, grad_scale: 64.0
2023-12-23 20:24:09,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1316706.6666666667, ans=0.0
2023-12-23 20:24:52,416 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0
2023-12-23 20:24:53,620 INFO [train.py:886] (3/4) Epoch 42, batch 2150, loss[loss=0.01162, audio_tagging_loss=0.01162, over 25000.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4957335.79 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 64.0
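grad_scale jumps from 32.0 to 64.0 at batch 2050 above; with use_fp16: True this is dynamic loss scaling, which doubles the scale after a long enough run of overflow-free steps and backs off on overflow (the drop back to 32.0 near batch 4200, later in this excerpt, is the backoff case). PyTorch's stock GradScaler shows the mechanism; the growth interval below is an assumption, not necessarily icefall's setting:

    # Sketch of fp16 dynamic loss scaling with PyTorch AMP: the scale doubles
    # every `growth_interval` overflow-free steps and halves on overflow.
    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,       # matches the grad_scale the log starts from here
        growth_factor=2.0,
        backoff_factor=0.5,
        growth_interval=2000,  # assumed; the real run may use another interval
    )

    def train_step(model, optimizer, features, targets, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(features), targets)
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skips the step if gradients overflowed
        scaler.update()          # grows or backs off the scale
        return loss.detach(), scaler.get_scale()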
2023-12-23 20:25:02,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1317040.0, ans=0.1
2023-12-23 20:25:07,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1317106.6666666667, ans=0.0
2023-12-23 20:25:08,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1317106.6666666667, ans=0.0
2023-12-23 20:25:18,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1317173.3333333333, ans=0.2
2023-12-23 20:25:24,375 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.516e+01 3.741e+01 3.904e+01 4.093e+01 4.524e+01, threshold=7.808e+01, percent-clipped=0.0
2023-12-23 20:25:44,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1317306.6666666667, ans=0.0
2023-12-23 20:25:45,640 INFO [train.py:886] (3/4) Epoch 42, batch 2200, loss[loss=0.01061, audio_tagging_loss=0.01061, over 25000.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4952806.53 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 64.0
2023-12-23 20:25:49,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1317373.3333333333, ans=0.1
2023-12-23 20:25:50,849 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.71 vs. limit=6.0
2023-12-23 20:26:08,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=1317506.6666666667, ans=0.02
2023-12-23 20:26:16,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1317573.3333333333, ans=0.0
2023-12-23 20:26:37,031 INFO [train.py:886] (3/4) Epoch 42, batch 2250, loss[loss=0.01119, audio_tagging_loss=0.01119, over 25000.00 frames. ], tot_loss[loss=0.01139, audio_tagging_loss=0.01139, over 4951432.38 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 64.0
2023-12-23 20:26:59,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1317840.0, ans=0.0
2023-12-23 20:27:01,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1317840.0, ans=0.1
2023-12-23 20:27:07,754 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.424e+01 3.775e+01 3.919e+01 4.105e+01 4.695e+01, threshold=7.838e+01, percent-clipped=0.0
2023-12-23 20:27:14,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1317906.6666666667, ans=0.0
2023-12-23 20:27:19,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1317973.3333333333, ans=0.0
2023-12-23 20:27:22,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1317973.3333333333, ans=0.0
2023-12-23 20:27:26,596 INFO [train.py:886] (3/4) Epoch 42, batch 2300, loss[loss=0.009713, audio_tagging_loss=0.009713, over 24750.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4953068.49 frames. ], batch size: 99, lr: 2.56e-03, grad_scale: 64.0
2023-12-23 20:27:40,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1318106.6666666667, ans=0.2
2023-12-23 20:27:45,258 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.01 vs. limit=15.0
2023-12-23 20:27:56,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1318173.3333333333, ans=0.0
2023-12-23 20:27:57,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.50 vs. limit=15.0
2023-12-23 20:28:14,835 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.40 vs. limit=15.0
2023-12-23 20:28:20,075 INFO [train.py:886] (3/4) Epoch 42, batch 2350, loss[loss=0.01034, audio_tagging_loss=0.01034, over 25000.00 frames. ], tot_loss[loss=0.01117, audio_tagging_loss=0.01117, over 4955671.77 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:28:22,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1318373.3333333333, ans=0.0
2023-12-23 20:28:50,511 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.317e+01 3.647e+01 3.807e+01 4.043e+01 4.652e+01, threshold=7.613e+01, percent-clipped=0.0
2023-12-23 20:29:12,147 INFO [train.py:886] (3/4) Epoch 42, batch 2400, loss[loss=0.008931, audio_tagging_loss=0.008931, over 25000.00 frames. ], tot_loss[loss=0.01113, audio_tagging_loss=0.01113, over 4957506.28 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:29:20,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1318706.6666666667, ans=0.0
2023-12-23 20:29:25,763 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.01 vs. limit=15.0
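The logged lr creeps from 2.56e-03 to 2.55e-03 around batch 2350, reflecting a scheduler that decays with both batch count and epoch (lr_batches and lr_epochs in the config). A sketch of an Eden-style two-factor decay follows; the formula shape is an assumption for illustration and its absolute values need not match the logged lr, but it shows why the decay is nearly flat this late in training:

    # Sketch of an Eden-style decay in both batch count and epoch
    # (formula shape assumed; see icefall's optim.py for the real scheduler).
    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # Late in training the decay is nearly flat: about half a percent over
    # 12k batches, the same order as the 2.56e-03 -> 2.55e-03 drift above.
    print(eden_lr(0.045, 1_318_000, 42.0) / eden_lr(0.045, 1_330_000, 42.0))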
2023-12-23 20:29:34,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1318840.0, ans=0.125
2023-12-23 20:29:40,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1318840.0, ans=0.1
2023-12-23 20:29:41,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1318906.6666666667, ans=0.125
2023-12-23 20:29:45,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1318906.6666666667, ans=0.05
2023-12-23 20:29:54,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1318973.3333333333, ans=0.125
2023-12-23 20:29:55,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1318973.3333333333, ans=0.2
2023-12-23 20:29:55,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1318973.3333333333, ans=0.125
2023-12-23 20:30:03,214 INFO [train.py:886] (3/4) Epoch 42, batch 2450, loss[loss=0.0111, audio_tagging_loss=0.0111, over 25000.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4960670.00 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:30:03,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1319040.0, ans=0.1
2023-12-23 20:30:34,569 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.396e+01 3.760e+01 3.936e+01 4.132e+01 5.379e+01, threshold=7.871e+01, percent-clipped=0.0
2023-12-23 20:30:55,605 INFO [train.py:886] (3/4) Epoch 42, batch 2500, loss[loss=0.01318, audio_tagging_loss=0.01318, over 24750.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4956104.80 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:31:02,458 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.66 vs. limit=10.0
2023-12-23 20:31:06,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1319440.0, ans=0.0
2023-12-23 20:31:31,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1319573.3333333333, ans=0.125
2023-12-23 20:31:35,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1319573.3333333333, ans=0.125
2023-12-23 20:31:38,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1319640.0, ans=0.1
2023-12-23 20:31:46,300 INFO [train.py:886] (3/4) Epoch 42, batch 2550, loss[loss=0.01125, audio_tagging_loss=0.01125, over 24750.00 frames. ], tot_loss[loss=0.01139, audio_tagging_loss=0.01139, over 4948452.16 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:31:49,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1319706.6666666667, ans=0.125
2023-12-23 20:31:56,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1319706.6666666667, ans=0.125
2023-12-23 20:32:02,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1319773.3333333333, ans=0.0
2023-12-23 20:32:17,485 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.464e+01 3.824e+01 3.969e+01 4.161e+01 4.691e+01, threshold=7.938e+01, percent-clipped=0.0
2023-12-23 20:32:22,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1319906.6666666667, ans=0.125
2023-12-23 20:32:31,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1319973.3333333333, ans=0.125
2023-12-23 20:32:35,021 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.58 vs. limit=15.0
2023-12-23 20:32:37,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1319973.3333333333, ans=0.2
2023-12-23 20:32:39,488 INFO [train.py:886] (3/4) Epoch 42, batch 2600, loss[loss=0.01275, audio_tagging_loss=0.01275, over 24750.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4948149.37 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:32:42,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1320040.0, ans=0.125
2023-12-23 20:32:43,826 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.82 vs. limit=15.0
2023-12-23 20:32:44,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1320040.0, ans=0.0
2023-12-23 20:32:48,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1320106.6666666667, ans=0.125
2023-12-23 20:33:10,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1320240.0, ans=0.1
2023-12-23 20:33:13,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.68 vs. limit=10.0
2023-12-23 20:33:26,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1320306.6666666667, ans=0.125
2023-12-23 20:33:30,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1320373.3333333333, ans=0.125
2023-12-23 20:33:31,687 INFO [train.py:886] (3/4) Epoch 42, batch 2650, loss[loss=0.01165, audio_tagging_loss=0.01165, over 25000.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4949347.09 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:34:02,587 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.266e+01 3.676e+01 3.866e+01 4.014e+01 4.776e+01, threshold=7.733e+01, percent-clipped=0.0
2023-12-23 20:34:16,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1320640.0, ans=0.125
2023-12-23 20:34:21,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1320706.6666666667, ans=0.125
2023-12-23 20:34:22,619 INFO [train.py:886] (3/4) Epoch 42, batch 2700, loss[loss=0.01235, audio_tagging_loss=0.01235, over 25000.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4952071.56 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:34:37,530 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.46 vs. limit=10.0
2023-12-23 20:34:37,601 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.22 vs. limit=12.0
2023-12-23 20:34:52,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1320840.0, ans=0.0
2023-12-23 20:34:56,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1320906.6666666667, ans=0.1
2023-12-23 20:35:14,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1321040.0, ans=0.0
2023-12-23 20:35:15,198 INFO [train.py:886] (3/4) Epoch 42, batch 2750, loss[loss=0.01259, audio_tagging_loss=0.01259, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4955012.51 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:35:26,200 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.74 vs. limit=8.0
2023-12-23 20:35:34,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1321173.3333333333, ans=0.125
2023-12-23 20:35:42,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1321173.3333333333, ans=0.125
2023-12-23 20:35:46,369 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.326e+01 3.739e+01 3.858e+01 4.112e+01 5.048e+01, threshold=7.715e+01, percent-clipped=0.0
2023-12-23 20:35:49,919 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.51 vs. limit=15.0
2023-12-23 20:35:51,612 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.60 vs. limit=10.0
2023-12-23 20:35:57,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1321306.6666666667, ans=0.0
2023-12-23 20:36:03,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1321306.6666666667, ans=10.0
2023-12-23 20:36:07,100 INFO [train.py:886] (3/4) Epoch 42, batch 2800, loss[loss=0.01132, audio_tagging_loss=0.01132, over 24750.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4955429.76 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:36:21,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1321440.0, ans=0.2
2023-12-23 20:36:37,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1321573.3333333333, ans=0.0
2023-12-23 20:36:40,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1321573.3333333333, ans=0.0
2023-12-23 20:36:40,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1321573.3333333333, ans=0.125
2023-12-23 20:36:59,077 INFO [train.py:886] (3/4) Epoch 42, batch 2850, loss[loss=0.01063, audio_tagging_loss=0.01063, over 24750.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4952073.60 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:37:04,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1321706.6666666667, ans=0.2
2023-12-23 20:37:29,689 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.343e+01 3.804e+01 3.965e+01 4.086e+01 5.143e+01, threshold=7.930e+01, percent-clipped=0.0
2023-12-23 20:37:38,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1321906.6666666667, ans=0.125
2023-12-23 20:37:51,428 INFO [train.py:886] (3/4) Epoch 42, batch 2900, loss[loss=0.008615, audio_tagging_loss=0.008615, over 24750.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4945107.60 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:37:52,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1322040.0, ans=0.125
2023-12-23 20:37:58,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1322040.0, ans=0.0
2023-12-23 20:38:00,429 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.57 vs. limit=22.5
2023-12-23 20:38:04,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1322106.6666666667, ans=0.125
2023-12-23 20:38:25,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1322240.0, ans=0.125
2023-12-23 20:38:42,608 INFO [train.py:886] (3/4) Epoch 42, batch 2950, loss[loss=0.01172, audio_tagging_loss=0.01172, over 24750.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4949249.00 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:38:49,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1322373.3333333333, ans=0.125
2023-12-23 20:39:14,045 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.161e+01 3.693e+01 3.854e+01 4.054e+01 4.408e+01, threshold=7.709e+01, percent-clipped=0.0
2023-12-23 20:39:18,729 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 20:39:26,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1322640.0, ans=0.125
2023-12-23 20:39:35,454 INFO [train.py:886] (3/4) Epoch 42, batch 3000, loss[loss=0.01039, audio_tagging_loss=0.01039, over 25000.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4952724.78 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:39:35,454 INFO [train.py:909] (3/4) Computing validation loss
2023-12-23 20:39:56,127 INFO [train.py:917] (3/4) Epoch 42, validation: loss=0.03585, audio_tagging_loss=0.03585, over 3737520.00 frames.
2023-12-23 20:39:56,128 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-23 20:40:03,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1322706.6666666667, ans=0.125
2023-12-23 20:40:15,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1322840.0, ans=0.125
2023-12-23 20:40:16,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1322840.0, ans=0.0
2023-12-23 20:40:22,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1322840.0, ans=0.0
2023-12-23 20:40:35,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1322973.3333333333, ans=0.125
2023-12-23 20:40:46,849 INFO [train.py:886] (3/4) Epoch 42, batch 3050, loss[loss=0.01198, audio_tagging_loss=0.01198, over 25000.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4954475.47 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:40:48,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0
2023-12-23 20:40:54,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1323040.0, ans=0.125
2023-12-23 20:41:07,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1323173.3333333333, ans=0.125
2023-12-23 20:41:09,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1323173.3333333333, ans=0.0
2023-12-23 20:41:09,407 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.92 vs. limit=15.0
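The train.py:909/917/918 triple above is the periodic validation pass (valid_interval is 3000 in the config): training pauses, the whole dev set (3,737,520 frames) is scored, and the frame-weighted loss plus peak memory are reported. A generic sketch of such a pass; the loader's batch layout and the names here are illustrative, not the exact icefall code:

    # Sketch of the periodic validation pass: frame-weighted loss over the
    # dev loader, with the model in eval mode and gradients disabled.
    import torch

    @torch.no_grad()
    def compute_validation_loss(model, dev_loader, criterion, device):
        model.eval()
        loss_sum, frames = 0.0, 0.0
        for features, feature_lens, targets in dev_loader:  # assumed layout
            features = features.to(device)
            targets = targets.to(device)
            loss = criterion(model(features), targets)      # mean per-frame loss
            n = float(feature_lens.sum())
            loss_sum += loss.item() * n
            frames += n
        model.train()
        return loss_sum / frames, frames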
2023-12-23 20:41:17,727 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.294e+01 3.719e+01 3.883e+01 4.038e+01 4.632e+01, threshold=7.765e+01, percent-clipped=0.0
2023-12-23 20:41:39,291 INFO [train.py:886] (3/4) Epoch 42, batch 3100, loss[loss=0.01163, audio_tagging_loss=0.01163, over 25000.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4949170.78 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:41:41,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.91 vs. limit=10.0
2023-12-23 20:41:48,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1323440.0, ans=0.125
2023-12-23 20:42:03,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1323506.6666666667, ans=0.0
2023-12-23 20:42:08,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.75 vs. limit=15.0
2023-12-23 20:42:20,732 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.94 vs. limit=22.5
2023-12-23 20:42:22,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1323640.0, ans=0.0
2023-12-23 20:42:31,532 INFO [train.py:886] (3/4) Epoch 42, batch 3150, loss[loss=0.01294, audio_tagging_loss=0.01294, over 24750.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4947691.22 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:42:43,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.41 vs. limit=12.0
2023-12-23 20:42:46,772 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-12-23 20:42:49,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1323840.0, ans=0.0
2023-12-23 20:43:01,736 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.454e+01 3.789e+01 3.951e+01 4.110e+01 5.774e+01, threshold=7.902e+01, percent-clipped=0.0
2023-12-23 20:43:17,643 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.60 vs. limit=10.0
2023-12-23 20:43:21,052 INFO [train.py:886] (3/4) Epoch 42, batch 3200, loss[loss=0.0132, audio_tagging_loss=0.0132, over 25000.00 frames. ], tot_loss[loss=0.01139, audio_tagging_loss=0.01139, over 4947563.31 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:43:35,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1324106.6666666667, ans=0.125
2023-12-23 20:43:35,832 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=15.0
2023-12-23 20:43:36,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0
2023-12-23 20:43:56,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1324240.0, ans=0.125
2023-12-23 20:44:13,410 INFO [train.py:886] (3/4) Epoch 42, batch 3250, loss[loss=0.009856, audio_tagging_loss=0.009856, over 25000.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4949829.91 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:44:26,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1324440.0, ans=0.1
2023-12-23 20:44:44,155 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.306e+01 3.683e+01 3.870e+01 4.032e+01 4.579e+01, threshold=7.741e+01, percent-clipped=0.0
2023-12-23 20:44:51,344 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.38 vs. limit=10.0
2023-12-23 20:44:54,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.93 vs. limit=22.5
2023-12-23 20:44:55,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1324640.0, ans=0.09899494936611666
2023-12-23 20:45:02,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1324706.6666666667, ans=0.1
2023-12-23 20:45:03,260 INFO [train.py:886] (3/4) Epoch 42, batch 3300, loss[loss=0.01014, audio_tagging_loss=0.01014, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4955084.26 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:45:18,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1324773.3333333333, ans=0.125
2023-12-23 20:45:23,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1324773.3333333333, ans=0.125
2023-12-23 20:45:23,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=1324773.3333333333, ans=0.02
2023-12-23 20:45:32,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1324840.0, ans=0.0
2023-12-23 20:45:46,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1324973.3333333333, ans=0.1
2023-12-23 20:45:49,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1324973.3333333333, ans=0.0
2023-12-23 20:45:52,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1324973.3333333333, ans=0.125
2023-12-23 20:45:54,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1324973.3333333333, ans=0.125
2023-12-23 20:45:55,686 INFO [train.py:886] (3/4) Epoch 42, batch 3350, loss[loss=0.01211, audio_tagging_loss=0.01211, over 25000.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4958079.64 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:45:58,197 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.28 vs. limit=15.0
2023-12-23 20:45:59,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1325040.0, ans=0.125
2023-12-23 20:46:08,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1325106.6666666667, ans=0.125
2023-12-23 20:46:09,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1325106.6666666667, ans=0.125
2023-12-23 20:46:27,253 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.349e+01 3.683e+01 3.878e+01 4.092e+01 4.733e+01, threshold=7.755e+01, percent-clipped=0.0
2023-12-23 20:46:48,518 INFO [train.py:886] (3/4) Epoch 42, batch 3400, loss[loss=0.01344, audio_tagging_loss=0.01344, over 25000.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4960592.88 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:47:26,802 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0
2023-12-23 20:47:38,511 INFO [train.py:886] (3/4) Epoch 42, batch 3450, loss[loss=0.00838, audio_tagging_loss=0.00838, over 23985.00 frames. ], tot_loss[loss=0.01126, audio_tagging_loss=0.01126, over 4946049.46 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:47:40,860 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.97 vs. limit=6.0
2023-12-23 20:47:49,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1325773.3333333333, ans=0.125
2023-12-23 20:48:00,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1325840.0, ans=0.125
2023-12-23 20:48:09,691 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.370e+01 3.738e+01 3.925e+01 4.101e+01 5.009e+01, threshold=7.849e+01, percent-clipped=0.0
2023-12-23 20:48:31,447 INFO [train.py:886] (3/4) Epoch 42, batch 3500, loss[loss=0.01049, audio_tagging_loss=0.01049, over 24750.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4939332.05 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:48:37,557 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=15.0
2023-12-23 20:48:46,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1326106.6666666667, ans=0.0
2023-12-23 20:48:50,185 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 20:49:17,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1326306.6666666667, ans=0.0
2023-12-23 20:49:19,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1326306.6666666667, ans=0.0
2023-12-23 20:49:22,271 INFO [train.py:886] (3/4) Epoch 42, batch 3550, loss[loss=0.009472, audio_tagging_loss=0.009472, over 24750.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4943757.88 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:49:26,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1326373.3333333333, ans=0.125
2023-12-23 20:49:47,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1326506.6666666667, ans=0.2
2023-12-23 20:49:53,671 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.537e+01 3.732e+01 3.866e+01 4.048e+01 4.845e+01, threshold=7.731e+01, percent-clipped=0.0
2023-12-23 20:50:14,571 INFO [train.py:886] (3/4) Epoch 42, batch 3600, loss[loss=0.00905, audio_tagging_loss=0.00905, over 25000.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4944185.40 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:50:35,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1326840.0, ans=0.05
2023-12-23 20:50:37,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1326840.0, ans=0.0
2023-12-23 20:50:44,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1326840.0, ans=0.125
2023-12-23 20:50:56,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1326973.3333333333, ans=0.1
2023-12-23 20:51:07,219 INFO [train.py:886] (3/4) Epoch 42, batch 3650, loss[loss=0.009355, audio_tagging_loss=0.009355, over 24750.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4947043.59 frames. ], batch size: 99, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:51:17,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.13 vs. limit=22.5
2023-12-23 20:51:22,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1327106.6666666667, ans=0.1
2023-12-23 20:51:29,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1327173.3333333333, ans=0.2
2023-12-23 20:51:38,591 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.336e+01 3.685e+01 3.869e+01 4.096e+01 4.964e+01, threshold=7.739e+01, percent-clipped=0.0
2023-12-23 20:51:43,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1327240.0, ans=0.0
2023-12-23 20:51:50,834 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 20:51:56,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1327306.6666666667, ans=0.1
2023-12-23 20:51:58,198 INFO [train.py:886] (3/4) Epoch 42, batch 3700, loss[loss=0.01117, audio_tagging_loss=0.01117, over 25000.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4955389.45 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:52:14,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1327440.0, ans=0.0
2023-12-23 20:52:17,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1327440.0, ans=0.07
2023-12-23 20:52:36,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1327573.3333333333, ans=0.125
2023-12-23 20:52:44,204 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 20:52:48,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1327640.0, ans=0.05
2023-12-23 20:52:49,220 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.23 vs. limit=22.5
2023-12-23 20:52:50,630 INFO [train.py:886] (3/4) Epoch 42, batch 3750, loss[loss=0.01221, audio_tagging_loss=0.01221, over 21944.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4946020.04 frames. ], batch size: 107, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:52:51,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1327706.6666666667, ans=10.0
2023-12-23 20:52:57,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1327706.6666666667, ans=0.0
2023-12-23 20:52:59,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1327706.6666666667, ans=0.0
2023-12-23 20:53:13,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1327840.0, ans=0.125
2023-12-23 20:53:21,793 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.364e+01 3.847e+01 3.984e+01 4.181e+01 4.844e+01, threshold=7.969e+01, percent-clipped=0.0
2023-12-23 20:53:37,233 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.38 vs. limit=15.0
2023-12-23 20:53:42,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1328040.0, ans=0.0
2023-12-23 20:53:43,716 INFO [train.py:886] (3/4) Epoch 42, batch 3800, loss[loss=0.01402, audio_tagging_loss=0.01402, over 25000.00 frames. ], tot_loss[loss=0.01139, audio_tagging_loss=0.01139, over 4944807.45 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 64.0
2023-12-23 20:53:49,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1328040.0, ans=0.2
2023-12-23 20:53:50,685 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.24 vs. limit=15.0
2023-12-23 20:53:53,725 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.19 vs. limit=10.0
2023-12-23 20:53:55,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1328106.6666666667, ans=0.2
2023-12-23 20:53:58,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1328106.6666666667, ans=0.0
2023-12-23 20:53:59,101 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.02 vs. limit=22.5
2023-12-23 20:54:11,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1328173.3333333333, ans=0.125
2023-12-23 20:54:25,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1328306.6666666667, ans=0.0
2023-12-23 20:54:34,333 INFO [train.py:886] (3/4) Epoch 42, batch 3850, loss[loss=0.01236, audio_tagging_loss=0.01236, over 21398.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4939381.87 frames. ], batch size: 107, lr: 2.55e-03, grad_scale: 64.0
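Many of the scheduled names above belong to Balancer modules (min_positive, max_abs, min_abs, prob): during training they constrain per-channel activation statistics, and prob is the probability that the constraint is applied on a given batch. A much-simplified sketch using a differentiable penalty; icefall's Balancer instead injects gradients directly in the backward pass, so treat this only as the general idea:

    # Sketch of a Balancer-style constraint: with probability `prob`, penalize
    # per-channel statistics that leave their allowed ranges (simplified).
    import random
    import torch

    def balancer_penalty(x: torch.Tensor, min_positive: float = 0.05,
                         max_abs: float = 10.0, prob: float = 0.125,
                         scale: float = 0.01) -> torch.Tensor:
        # x: (..., num_channels); returns a scalar to add to the training loss
        if random.random() > prob:       # constraint applied stochastically
            return x.new_zeros(())
        dims = tuple(range(x.dim() - 1))  # all dims except channels
        # soft, differentiable proxy for "fraction of positive values"
        frac_positive = torch.sigmoid(4.0 * x).mean(dim=dims)
        mean_abs = x.abs().mean(dim=dims)
        penalty = ((min_positive - frac_positive).clamp(min=0.0).sum()
                   + (mean_abs - max_abs).clamp(min=0.0).sum())
        return scale * penalty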
2023-12-23 20:54:38,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1328373.3333333333, ans=0.1
2023-12-23 20:54:55,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1328506.6666666667, ans=0.125
2023-12-23 20:55:00,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1328506.6666666667, ans=0.025
2023-12-23 20:55:00,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.25 vs. limit=22.5
2023-12-23 20:55:05,494 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.259e+01 3.729e+01 3.866e+01 4.038e+01 4.994e+01, threshold=7.731e+01, percent-clipped=0.0
2023-12-23 20:55:21,032 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 20:55:26,612 INFO [train.py:886] (3/4) Epoch 42, batch 3900, loss[loss=0.01165, audio_tagging_loss=0.01165, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4947849.18 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 64.0
2023-12-23 20:55:33,671 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.99 vs. limit=15.0
2023-12-23 20:55:35,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1328706.6666666667, ans=0.125
2023-12-23 20:55:44,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1328773.3333333333, ans=0.0
2023-12-23 20:55:57,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1328906.6666666667, ans=0.2
2023-12-23 20:56:12,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1328973.3333333333, ans=0.1
2023-12-23 20:56:17,562 INFO [train.py:886] (3/4) Epoch 42, batch 3950, loss[loss=0.008264, audio_tagging_loss=0.008264, over 24051.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4953355.35 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 64.0
2023-12-23 20:56:47,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=15.0
2023-12-23 20:56:49,246 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.418e+01 3.779e+01 3.890e+01 4.083e+01 4.663e+01, threshold=7.780e+01, percent-clipped=0.0
2023-12-23 20:57:09,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1329373.3333333333, ans=0.2
2023-12-23 20:57:10,635 INFO [train.py:886] (3/4) Epoch 42, batch 4000, loss[loss=0.01014, audio_tagging_loss=0.01014, over 25000.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4956455.84 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 64.0
], batch size: 100, lr: 2.54e-03, grad_scale: 64.0 2023-12-23 20:57:15,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1329373.3333333333, ans=0.0 2023-12-23 20:57:26,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1329440.0, ans=0.125 2023-12-23 20:57:39,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1329506.6666666667, ans=0.1 2023-12-23 20:57:40,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1329506.6666666667, ans=0.1 2023-12-23 20:58:01,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1329640.0, ans=0.125 2023-12-23 20:58:03,491 INFO [train.py:886] (3/4) Epoch 42, batch 4050, loss[loss=0.009654, audio_tagging_loss=0.009654, over 24750.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4959468.29 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 64.0 2023-12-23 20:58:06,923 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.07 vs. limit=12.0 2023-12-23 20:58:09,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1329706.6666666667, ans=0.125 2023-12-23 20:58:09,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1329706.6666666667, ans=0.07 2023-12-23 20:58:12,490 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.47 vs. limit=22.5 2023-12-23 20:58:22,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1329840.0, ans=0.125 2023-12-23 20:58:29,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1329840.0, ans=0.125 2023-12-23 20:58:35,416 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.417e+01 3.762e+01 3.929e+01 4.087e+01 5.094e+01, threshold=7.857e+01, percent-clipped=0.0 2023-12-23 20:58:37,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1329906.6666666667, ans=0.125 2023-12-23 20:58:38,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1329906.6666666667, ans=0.1 2023-12-23 20:58:39,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1329906.6666666667, ans=0.1 2023-12-23 20:58:47,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1329973.3333333333, ans=0.0 2023-12-23 20:58:50,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1329973.3333333333, ans=0.09899494936611666 2023-12-23 20:58:53,483 INFO [train.py:886] (3/4) Epoch 42, batch 4100, loss[loss=0.01191, audio_tagging_loss=0.01191, over 24750.00 frames. 
], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4951618.32 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 64.0 2023-12-23 20:58:58,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1330040.0, ans=0.0 2023-12-23 20:59:05,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1330106.6666666667, ans=0.035 2023-12-23 20:59:06,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1330106.6666666667, ans=0.125 2023-12-23 20:59:11,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1330106.6666666667, ans=0.125 2023-12-23 20:59:20,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1330173.3333333333, ans=0.125 2023-12-23 20:59:29,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1330240.0, ans=0.0 2023-12-23 20:59:36,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1330306.6666666667, ans=0.0 2023-12-23 20:59:45,178 INFO [train.py:886] (3/4) Epoch 42, batch 4150, loss[loss=0.01163, audio_tagging_loss=0.01163, over 24750.00 frames. ], tot_loss[loss=0.01139, audio_tagging_loss=0.01139, over 4944774.65 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 64.0 2023-12-23 20:59:45,737 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.38 vs. limit=15.0 2023-12-23 20:59:49,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1330373.3333333333, ans=0.2 2023-12-23 21:00:08,723 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.70 vs. limit=15.0 2023-12-23 21:00:16,778 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.428e+01 3.800e+01 3.926e+01 4.112e+01 4.569e+01, threshold=7.851e+01, percent-clipped=0.0 2023-12-23 21:00:23,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1330573.3333333333, ans=0.07 2023-12-23 21:00:27,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1330640.0, ans=0.0 2023-12-23 21:00:36,530 INFO [train.py:886] (3/4) Epoch 42, batch 4200, loss[loss=0.01076, audio_tagging_loss=0.01076, over 25000.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4945809.50 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:00:44,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1330706.6666666667, ans=0.0 2023-12-23 21:00:47,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1330773.3333333333, ans=0.1 2023-12-23 21:00:51,042 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.43 vs. 
limit=10.0 2023-12-23 21:00:54,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1330773.3333333333, ans=0.0 2023-12-23 21:00:59,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1330840.0, ans=0.07 2023-12-23 21:01:05,982 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0 2023-12-23 21:01:14,960 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.30 vs. limit=6.0 2023-12-23 21:01:23,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1330973.3333333333, ans=0.1 2023-12-23 21:01:26,731 INFO [train.py:886] (3/4) Epoch 42, batch 4250, loss[loss=0.01053, audio_tagging_loss=0.01053, over 25000.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4952513.88 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:01:46,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.81 vs. limit=22.5 2023-12-23 21:01:59,510 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.477e+01 3.725e+01 3.912e+01 4.109e+01 5.134e+01, threshold=7.824e+01, percent-clipped=0.0 2023-12-23 21:02:00,698 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 21:02:18,822 INFO [train.py:886] (3/4) Epoch 42, batch 4300, loss[loss=0.01224, audio_tagging_loss=0.01224, over 25000.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4959888.71 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:02:20,055 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 21:02:40,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1331506.6666666667, ans=0.0 2023-12-23 21:02:46,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1331506.6666666667, ans=0.1 2023-12-23 21:02:49,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1331573.3333333333, ans=0.125 2023-12-23 21:03:09,435 INFO [train.py:886] (3/4) Epoch 42, batch 4350, loss[loss=0.00776, audio_tagging_loss=0.00776, over 23961.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4961107.74 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:03:17,974 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.24 vs. 
limit=15.0 2023-12-23 21:03:25,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1331773.3333333333, ans=0.1 2023-12-23 21:03:30,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1331840.0, ans=0.0 2023-12-23 21:03:41,577 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.317e+01 3.795e+01 3.947e+01 4.145e+01 4.884e+01, threshold=7.894e+01, percent-clipped=0.0 2023-12-23 21:03:51,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1331973.3333333333, ans=0.0 2023-12-23 21:04:01,123 INFO [train.py:886] (3/4) Epoch 42, batch 4400, loss[loss=0.01387, audio_tagging_loss=0.01387, over 24750.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4952038.63 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:04:01,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1332040.0, ans=0.0 2023-12-23 21:04:03,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1332040.0, ans=0.1 2023-12-23 21:04:11,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1332106.6666666667, ans=0.125 2023-12-23 21:04:30,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1332240.0, ans=0.0 2023-12-23 21:04:31,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1332240.0, ans=0.0 2023-12-23 21:04:32,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1332240.0, ans=0.0 2023-12-23 21:04:38,341 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=15.0 2023-12-23 21:04:52,971 INFO [train.py:886] (3/4) Epoch 42, batch 4450, loss[loss=0.01059, audio_tagging_loss=0.01059, over 24750.00 frames. ], tot_loss[loss=0.01149, audio_tagging_loss=0.01149, over 4951221.31 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:04:56,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1332373.3333333333, ans=0.2 2023-12-23 21:05:04,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.46 vs. limit=22.5 2023-12-23 21:05:26,400 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.459e+01 3.806e+01 3.979e+01 4.216e+01 4.903e+01, threshold=7.957e+01, percent-clipped=0.0 2023-12-23 21:05:27,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1332573.3333333333, ans=0.125 2023-12-23 21:05:30,725 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.83 vs. limit=12.0 2023-12-23 21:05:43,549 INFO [train.py:886] (3/4) Epoch 42, batch 4500, loss[loss=0.01251, audio_tagging_loss=0.01251, over 25000.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4949023.25 frames. 
], batch size: 100, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:06:00,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1332773.3333333333, ans=0.2 2023-12-23 21:06:02,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1332773.3333333333, ans=0.035 2023-12-23 21:06:14,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1332906.6666666667, ans=0.0 2023-12-23 21:06:30,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1332973.3333333333, ans=0.125 2023-12-23 21:06:36,311 INFO [train.py:886] (3/4) Epoch 42, batch 4550, loss[loss=0.01227, audio_tagging_loss=0.01227, over 25000.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4953254.59 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:06:48,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1333106.6666666667, ans=0.0 2023-12-23 21:07:09,416 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.368e+01 3.739e+01 3.901e+01 4.086e+01 5.054e+01, threshold=7.802e+01, percent-clipped=0.0 2023-12-23 21:07:11,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1333240.0, ans=0.125 2023-12-23 21:07:14,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1333240.0, ans=0.07 2023-12-23 21:07:19,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1333306.6666666667, ans=0.0 2023-12-23 21:07:27,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1333373.3333333333, ans=0.125 2023-12-23 21:07:28,500 INFO [train.py:886] (3/4) Epoch 42, batch 4600, loss[loss=0.01215, audio_tagging_loss=0.01215, over 25000.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4949802.99 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:07:34,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1333373.3333333333, ans=0.0 2023-12-23 21:07:37,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1333373.3333333333, ans=0.1 2023-12-23 21:07:46,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1333440.0, ans=0.125 2023-12-23 21:07:47,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1333440.0, ans=0.035 2023-12-23 21:07:48,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1333440.0, ans=0.0 2023-12-23 21:07:55,515 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.50 vs. 
limit=15.0 2023-12-23 21:07:58,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1333573.3333333333, ans=0.1 2023-12-23 21:07:59,377 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.34 vs. limit=15.0 2023-12-23 21:08:11,156 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0 2023-12-23 21:08:15,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1333640.0, ans=0.125 2023-12-23 21:08:20,865 INFO [train.py:886] (3/4) Epoch 42, batch 4650, loss[loss=0.009226, audio_tagging_loss=0.009226, over 25000.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4947976.81 frames. ], batch size: 100, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:08:26,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1333706.6666666667, ans=22.5 2023-12-23 21:08:30,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1333773.3333333333, ans=0.125 2023-12-23 21:08:42,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1333840.0, ans=0.125 2023-12-23 21:08:54,139 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.336e+01 3.819e+01 3.947e+01 4.083e+01 4.797e+01, threshold=7.893e+01, percent-clipped=0.0 2023-12-23 21:08:54,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1333906.6666666667, ans=0.0 2023-12-23 21:08:59,038 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 21:09:10,775 INFO [train.py:886] (3/4) Epoch 42, batch 4700, loss[loss=0.01249, audio_tagging_loss=0.01249, over 24750.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4949780.89 frames. ], batch size: 99, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:09:11,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1334040.0, ans=0.1 2023-12-23 21:09:24,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1334106.6666666667, ans=0.0 2023-12-23 21:09:33,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1334173.3333333333, ans=0.0 2023-12-23 21:09:34,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1334173.3333333333, ans=0.125 2023-12-23 21:09:40,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1334240.0, ans=0.125 2023-12-23 21:09:58,622 INFO [train.py:886] (3/4) Epoch 42, batch 4750, loss[loss=0.01188, audio_tagging_loss=0.01188, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4946950.85 frames. 
], batch size: 99, lr: 2.54e-03, grad_scale: 32.0 2023-12-23 21:10:06,343 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.56 vs. limit=6.0 2023-12-23 21:10:06,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1334440.0, ans=0.125 2023-12-23 21:10:32,652 INFO [train.py:886] (3/4) Epoch 43, batch 0, loss[loss=0.0275, audio_tagging_loss=0.0275, over 25000.00 frames. ], tot_loss[loss=0.0275, audio_tagging_loss=0.0275, over 25000.00 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 32.0 2023-12-23 21:10:32,653 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 21:10:41,417 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.6055, 3.7037, 3.3729, 3.1167], device='cuda:3') 2023-12-23 21:10:53,523 INFO [train.py:917] (3/4) Epoch 43, validation: loss=0.0346, audio_tagging_loss=0.0346, over 3737520.00 frames. 2023-12-23 21:10:53,524 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 21:10:53,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1334480.0, ans=0.0 2023-12-23 21:11:02,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1334480.0, ans=0.125 2023-12-23 21:11:04,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1334546.6666666667, ans=0.2 2023-12-23 21:11:11,322 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.518e+01 3.885e+01 4.049e+01 4.321e+01 9.986e+01, threshold=8.099e+01, percent-clipped=5.0 2023-12-23 21:11:31,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1334680.0, ans=0.125 2023-12-23 21:11:45,495 INFO [train.py:886] (3/4) Epoch 43, batch 50, loss[loss=0.01713, audio_tagging_loss=0.01713, over 25000.00 frames. ], tot_loss[loss=0.01804, audio_tagging_loss=0.01804, over 1121878.92 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 16.0 2023-12-23 21:12:11,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1334946.6666666667, ans=0.125 2023-12-23 21:12:17,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1335013.3333333333, ans=0.1 2023-12-23 21:12:20,451 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.69 vs. limit=15.0 2023-12-23 21:12:32,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1335080.0, ans=0.125 2023-12-23 21:12:37,575 INFO [train.py:886] (3/4) Epoch 43, batch 100, loss[loss=0.01374, audio_tagging_loss=0.01374, over 25000.00 frames. ], tot_loss[loss=0.01568, audio_tagging_loss=0.01568, over 1971027.75 frames. 
], batch size: 100, lr: 2.51e-03, grad_scale: 16.0 2023-12-23 21:12:54,978 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.840e+01 4.268e+01 4.587e+01 4.998e+01 5.925e+01, threshold=9.173e+01, percent-clipped=0.0 2023-12-23 21:12:57,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1335280.0, ans=0.125 2023-12-23 21:13:04,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.29 vs. limit=15.0 2023-12-23 21:13:23,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1335413.3333333333, ans=0.05 2023-12-23 21:13:24,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=22.5 2023-12-23 21:13:28,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1335480.0, ans=0.125 2023-12-23 21:13:28,989 INFO [train.py:886] (3/4) Epoch 43, batch 150, loss[loss=0.01211, audio_tagging_loss=0.01211, over 25000.00 frames. ], tot_loss[loss=0.01444, audio_tagging_loss=0.01444, over 2632458.63 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 16.0 2023-12-23 21:13:34,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1335480.0, ans=0.125 2023-12-23 21:13:34,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1335480.0, ans=0.1 2023-12-23 21:13:38,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1335546.6666666667, ans=0.1 2023-12-23 21:13:38,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=15.0 2023-12-23 21:13:44,851 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.08 vs. limit=22.5 2023-12-23 21:13:57,946 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.55 vs. limit=15.0 2023-12-23 21:14:00,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1335680.0, ans=0.125 2023-12-23 21:14:01,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1335680.0, ans=10.0 2023-12-23 21:14:22,065 INFO [train.py:886] (3/4) Epoch 43, batch 200, loss[loss=0.01392, audio_tagging_loss=0.01392, over 25000.00 frames. ], tot_loss[loss=0.01358, audio_tagging_loss=0.01358, over 3151762.48 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 16.0 2023-12-23 21:14:22,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1335813.3333333333, ans=0.1 2023-12-23 21:14:29,258 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.57 vs. 
limit=10.0 2023-12-23 21:14:30,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1335880.0, ans=0.1 2023-12-23 21:14:30,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1335880.0, ans=0.0 2023-12-23 21:14:34,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1335880.0, ans=0.0 2023-12-23 21:14:38,284 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.529e+01 3.831e+01 3.990e+01 4.242e+01 5.537e+01, threshold=7.979e+01, percent-clipped=0.0 2023-12-23 21:14:55,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1336013.3333333333, ans=0.0 2023-12-23 21:14:56,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1336013.3333333333, ans=0.125 2023-12-23 21:15:12,778 INFO [train.py:886] (3/4) Epoch 43, batch 250, loss[loss=0.01064, audio_tagging_loss=0.01064, over 25000.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 3555122.39 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 16.0 2023-12-23 21:15:14,519 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.21 vs. limit=8.0 2023-12-23 21:15:15,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1336146.6666666667, ans=0.1 2023-12-23 21:15:18,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1336146.6666666667, ans=0.125 2023-12-23 21:15:28,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1336213.3333333333, ans=0.0 2023-12-23 21:15:33,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1336280.0, ans=0.0 2023-12-23 21:16:05,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1336480.0, ans=22.5 2023-12-23 21:16:06,117 INFO [train.py:886] (3/4) Epoch 43, batch 300, loss[loss=0.01268, audio_tagging_loss=0.01268, over 24750.00 frames. ], tot_loss[loss=0.01267, audio_tagging_loss=0.01267, over 3867349.73 frames. 
], batch size: 99, lr: 2.51e-03, grad_scale: 16.0 2023-12-23 21:16:09,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1336480.0, ans=0.125 2023-12-23 21:16:12,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1336480.0, ans=0.1 2023-12-23 21:16:22,915 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.462e+01 3.757e+01 3.899e+01 4.075e+01 4.664e+01, threshold=7.797e+01, percent-clipped=0.0 2023-12-23 21:16:31,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1336613.3333333333, ans=0.0 2023-12-23 21:16:33,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1336613.3333333333, ans=0.125 2023-12-23 21:16:44,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1336680.0, ans=0.1 2023-12-23 21:16:54,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1336746.6666666667, ans=0.0 2023-12-23 21:16:57,854 INFO [train.py:886] (3/4) Epoch 43, batch 350, loss[loss=0.009532, audio_tagging_loss=0.009532, over 24750.00 frames. ], tot_loss[loss=0.01238, audio_tagging_loss=0.01238, over 4103150.94 frames. ], batch size: 99, lr: 2.51e-03, grad_scale: 16.0 2023-12-23 21:17:01,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1336813.3333333333, ans=0.125 2023-12-23 21:17:04,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1336813.3333333333, ans=0.0 2023-12-23 21:17:13,838 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.919e-02 2023-12-23 21:17:29,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1337013.3333333333, ans=0.125 2023-12-23 21:17:45,345 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.15 vs. limit=15.0 2023-12-23 21:17:49,699 INFO [train.py:886] (3/4) Epoch 43, batch 400, loss[loss=0.01181, audio_tagging_loss=0.01181, over 24750.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 4290104.29 frames. ], batch size: 99, lr: 2.51e-03, grad_scale: 32.0 2023-12-23 21:18:00,575 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.33 vs. limit=15.0 2023-12-23 21:18:08,774 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.477e+01 3.775e+01 3.908e+01 4.060e+01 5.626e+01, threshold=7.816e+01, percent-clipped=0.0 2023-12-23 21:18:26,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1337346.6666666667, ans=0.09899494936611666 2023-12-23 21:18:42,387 INFO [train.py:886] (3/4) Epoch 43, batch 450, loss[loss=0.01173, audio_tagging_loss=0.01173, over 24750.00 frames. ], tot_loss[loss=0.01192, audio_tagging_loss=0.01192, over 4438458.46 frames. 
], batch size: 99, lr: 2.51e-03, grad_scale: 32.0 2023-12-23 21:19:31,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.52 vs. limit=15.0 2023-12-23 21:19:33,094 INFO [train.py:886] (3/4) Epoch 43, batch 500, loss[loss=0.01209, audio_tagging_loss=0.01209, over 25000.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4557269.70 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 32.0 2023-12-23 21:19:41,868 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.73 vs. limit=22.5 2023-12-23 21:19:51,661 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.285e+01 3.753e+01 3.929e+01 4.110e+01 4.615e+01, threshold=7.857e+01, percent-clipped=0.0 2023-12-23 21:19:55,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1337946.6666666667, ans=0.1 2023-12-23 21:19:55,993 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.52 vs. limit=10.0 2023-12-23 21:19:56,610 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 21:19:58,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1337946.6666666667, ans=0.2 2023-12-23 21:20:00,483 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 21:20:24,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.45 vs. limit=6.0 2023-12-23 21:20:25,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1338146.6666666667, ans=0.1 2023-12-23 21:20:25,818 INFO [train.py:886] (3/4) Epoch 43, batch 550, loss[loss=0.0125, audio_tagging_loss=0.0125, over 24750.00 frames. ], tot_loss[loss=0.01155, audio_tagging_loss=0.01155, over 4645005.59 frames. ], batch size: 99, lr: 2.51e-03, grad_scale: 32.0 2023-12-23 21:20:30,120 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.46 vs. limit=15.0 2023-12-23 21:20:33,127 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=15.0 2023-12-23 21:20:34,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1338213.3333333333, ans=0.125 2023-12-23 21:20:38,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1338213.3333333333, ans=0.125 2023-12-23 21:21:01,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1338346.6666666667, ans=0.0 2023-12-23 21:21:05,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1338413.3333333333, ans=0.0 2023-12-23 21:21:16,800 INFO [train.py:886] (3/4) Epoch 43, batch 600, loss[loss=0.01176, audio_tagging_loss=0.01176, over 24750.00 frames. 
], tot_loss[loss=0.01161, audio_tagging_loss=0.01161, over 4712840.52 frames. ], batch size: 99, lr: 2.51e-03, grad_scale: 32.0 2023-12-23 21:21:30,814 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 21:21:33,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1338546.6666666667, ans=0.125 2023-12-23 21:21:34,496 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.403e+01 3.800e+01 3.964e+01 4.144e+01 4.733e+01, threshold=7.928e+01, percent-clipped=0.0 2023-12-23 21:21:38,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1338613.3333333333, ans=0.0 2023-12-23 21:21:43,574 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.79 vs. limit=15.0 2023-12-23 21:21:47,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1338680.0, ans=0.2 2023-12-23 21:21:55,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1338680.0, ans=10.0 2023-12-23 21:21:56,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1338680.0, ans=0.1 2023-12-23 21:21:58,639 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.99 vs. limit=22.5 2023-12-23 21:22:04,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1338746.6666666667, ans=0.0 2023-12-23 21:22:08,607 INFO [train.py:886] (3/4) Epoch 43, batch 650, loss[loss=0.0106, audio_tagging_loss=0.0106, over 24750.00 frames. ], tot_loss[loss=0.01163, audio_tagging_loss=0.01163, over 4763844.29 frames. ], batch size: 99, lr: 2.51e-03, grad_scale: 32.0 2023-12-23 21:22:17,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1338813.3333333333, ans=0.125 2023-12-23 21:22:25,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1338880.0, ans=0.1 2023-12-23 21:22:35,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1338946.6666666667, ans=0.0 2023-12-23 21:22:37,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1338946.6666666667, ans=0.1 2023-12-23 21:22:37,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1338946.6666666667, ans=0.0 2023-12-23 21:22:50,967 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0 2023-12-23 21:22:51,298 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.91 vs. 
limit=8.0 2023-12-23 21:22:58,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1339080.0, ans=0.125 2023-12-23 21:23:01,669 INFO [train.py:886] (3/4) Epoch 43, batch 700, loss[loss=0.01151, audio_tagging_loss=0.01151, over 25000.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4800054.95 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:23:11,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1339213.3333333333, ans=0.125 2023-12-23 21:23:17,910 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.445e+01 3.835e+01 3.964e+01 4.110e+01 4.993e+01, threshold=7.927e+01, percent-clipped=0.0 2023-12-23 21:23:35,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1339346.6666666667, ans=0.125 2023-12-23 21:23:41,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1339413.3333333333, ans=0.125 2023-12-23 21:23:48,693 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.70 vs. limit=15.0 2023-12-23 21:23:52,661 INFO [train.py:886] (3/4) Epoch 43, batch 750, loss[loss=0.01178, audio_tagging_loss=0.01178, over 25000.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4836784.13 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:24:08,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1339546.6666666667, ans=0.0 2023-12-23 21:24:34,494 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.82 vs. limit=12.0 2023-12-23 21:24:40,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.45 vs. limit=15.0 2023-12-23 21:24:41,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1339746.6666666667, ans=0.125 2023-12-23 21:24:45,197 INFO [train.py:886] (3/4) Epoch 43, batch 800, loss[loss=0.009336, audio_tagging_loss=0.009336, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4864568.10 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:24:47,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1339813.3333333333, ans=0.125 2023-12-23 21:24:47,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1339813.3333333333, ans=0.125 2023-12-23 21:24:53,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1339880.0, ans=0.05 2023-12-23 21:24:59,461 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 21:25:02,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.28 vs. 
limit=15.0 2023-12-23 21:25:03,501 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.317e+01 3.756e+01 3.877e+01 4.084e+01 5.332e+01, threshold=7.753e+01, percent-clipped=0.0 2023-12-23 21:25:13,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1339946.6666666667, ans=0.125 2023-12-23 21:25:20,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1340013.3333333333, ans=0.0 2023-12-23 21:25:24,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1340013.3333333333, ans=0.125 2023-12-23 21:25:24,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1340013.3333333333, ans=0.125 2023-12-23 21:25:32,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1340080.0, ans=0.125 2023-12-23 21:25:38,238 INFO [train.py:886] (3/4) Epoch 43, batch 850, loss[loss=0.01366, audio_tagging_loss=0.01366, over 25000.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4887922.85 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:25:39,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1340146.6666666667, ans=0.2 2023-12-23 21:25:40,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1340146.6666666667, ans=0.125 2023-12-23 21:25:47,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1340213.3333333333, ans=0.125 2023-12-23 21:26:01,215 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=15.0 2023-12-23 21:26:03,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=15.0 2023-12-23 21:26:08,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1340346.6666666667, ans=0.05 2023-12-23 21:26:21,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1340413.3333333333, ans=0.125 2023-12-23 21:26:29,271 INFO [train.py:886] (3/4) Epoch 43, batch 900, loss[loss=0.01155, audio_tagging_loss=0.01155, over 24750.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4898683.05 frames. 
], batch size: 99, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:26:36,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1340480.0, ans=0.125 2023-12-23 21:26:45,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1340546.6666666667, ans=0.125 2023-12-23 21:26:47,368 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.377e+01 3.810e+01 3.949e+01 4.133e+01 4.627e+01, threshold=7.898e+01, percent-clipped=0.0 2023-12-23 21:27:04,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1340680.0, ans=0.09899494936611666 2023-12-23 21:27:06,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1340680.0, ans=0.025 2023-12-23 21:27:20,773 INFO [train.py:886] (3/4) Epoch 43, batch 950, loss[loss=0.01048, audio_tagging_loss=0.01048, over 24750.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4905682.97 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:27:34,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1340880.0, ans=0.125 2023-12-23 21:27:55,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1341013.3333333333, ans=0.5 2023-12-23 21:28:13,298 INFO [train.py:886] (3/4) Epoch 43, batch 1000, loss[loss=0.01017, audio_tagging_loss=0.01017, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4912841.37 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:28:14,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1341146.6666666667, ans=0.0 2023-12-23 21:28:30,322 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.266e+01 3.775e+01 3.954e+01 4.117e+01 5.148e+01, threshold=7.909e+01, percent-clipped=0.0 2023-12-23 21:28:34,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1341280.0, ans=0.125 2023-12-23 21:28:34,640 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.14 vs. limit=6.0 2023-12-23 21:28:53,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1341413.3333333333, ans=0.125 2023-12-23 21:29:04,833 INFO [train.py:886] (3/4) Epoch 43, batch 1050, loss[loss=0.01234, audio_tagging_loss=0.01234, over 25000.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4919701.43 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:29:07,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1341480.0, ans=0.0 2023-12-23 21:29:10,919 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.66 vs. 
limit=15.0 2023-12-23 21:29:13,097 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.36 vs. limit=6.0 2023-12-23 21:29:15,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1341546.6666666667, ans=0.2 2023-12-23 21:29:23,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1341546.6666666667, ans=0.125 2023-12-23 21:29:26,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1341613.3333333333, ans=0.125 2023-12-23 21:29:47,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1341746.6666666667, ans=0.0 2023-12-23 21:29:57,268 INFO [train.py:886] (3/4) Epoch 43, batch 1100, loss[loss=0.01086, audio_tagging_loss=0.01086, over 24750.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4932761.99 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:30:02,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1341813.3333333333, ans=0.125 2023-12-23 21:30:10,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1341880.0, ans=0.0 2023-12-23 21:30:11,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1341880.0, ans=0.125 2023-12-23 21:30:14,085 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.243e+01 3.738e+01 3.893e+01 4.064e+01 5.194e+01, threshold=7.786e+01, percent-clipped=0.0 2023-12-23 21:30:18,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1341946.6666666667, ans=0.125 2023-12-23 21:30:18,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1341946.6666666667, ans=0.125 2023-12-23 21:30:25,545 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.60 vs. limit=12.0 2023-12-23 21:30:26,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1341946.6666666667, ans=0.125 2023-12-23 21:30:31,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1342013.3333333333, ans=0.1 2023-12-23 21:30:42,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1342080.0, ans=0.125 2023-12-23 21:30:43,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1342080.0, ans=0.2 2023-12-23 21:30:44,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1342080.0, ans=0.125 2023-12-23 21:30:48,487 INFO [train.py:886] (3/4) Epoch 43, batch 1150, loss[loss=0.01063, audio_tagging_loss=0.01063, over 25000.00 frames. ], tot_loss[loss=0.0112, audio_tagging_loss=0.0112, over 4935831.47 frames. 
], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:31:33,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=1342413.3333333333, ans=10.0 2023-12-23 21:31:37,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1342413.3333333333, ans=0.07 2023-12-23 21:31:40,783 INFO [train.py:886] (3/4) Epoch 43, batch 1200, loss[loss=0.0126, audio_tagging_loss=0.0126, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4946135.05 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:31:45,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1342480.0, ans=0.125 2023-12-23 21:31:59,047 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.434e+01 3.771e+01 3.906e+01 4.055e+01 4.735e+01, threshold=7.811e+01, percent-clipped=0.0 2023-12-23 21:32:05,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1342613.3333333333, ans=0.1 2023-12-23 21:32:14,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.09 vs. limit=22.5 2023-12-23 21:32:30,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1342746.6666666667, ans=0.125 2023-12-23 21:32:33,203 INFO [train.py:886] (3/4) Epoch 43, batch 1250, loss[loss=0.01345, audio_tagging_loss=0.01345, over 25000.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4944201.60 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:33:12,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1343080.0, ans=0.0 2023-12-23 21:33:22,785 INFO [train.py:886] (3/4) Epoch 43, batch 1300, loss[loss=0.01126, audio_tagging_loss=0.01126, over 24750.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4937685.39 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:33:23,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1343146.6666666667, ans=0.125 2023-12-23 21:33:24,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1343146.6666666667, ans=0.125 2023-12-23 21:33:26,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2023-12-23 21:33:29,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1343146.6666666667, ans=0.125 2023-12-23 21:33:41,675 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.269e+01 3.738e+01 3.919e+01 4.127e+01 4.801e+01, threshold=7.839e+01, percent-clipped=0.0 2023-12-23 21:33:49,950 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.57 vs. 
limit=10.0 2023-12-23 21:34:03,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1343346.6666666667, ans=0.5 2023-12-23 21:34:16,074 INFO [train.py:886] (3/4) Epoch 43, batch 1350, loss[loss=0.01339, audio_tagging_loss=0.01339, over 25000.00 frames. ], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4934480.33 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:34:18,648 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=15.0 2023-12-23 21:34:33,195 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 21:34:41,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1343613.3333333333, ans=0.0 2023-12-23 21:34:53,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1343680.0, ans=0.1 2023-12-23 21:34:57,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1343746.6666666667, ans=0.125 2023-12-23 21:35:00,990 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 21:35:03,118 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.90 vs. limit=22.5 2023-12-23 21:35:07,989 INFO [train.py:886] (3/4) Epoch 43, batch 1400, loss[loss=0.009547, audio_tagging_loss=0.009547, over 25000.00 frames. ], tot_loss[loss=0.01138, audio_tagging_loss=0.01138, over 4939462.48 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0 2023-12-23 21:35:23,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1343880.0, ans=0.0 2023-12-23 21:35:25,464 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.254e+01 3.705e+01 3.872e+01 4.110e+01 5.093e+01, threshold=7.744e+01, percent-clipped=0.0 2023-12-23 21:35:35,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1343946.6666666667, ans=0.2 2023-12-23 21:35:39,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1344013.3333333333, ans=0.1 2023-12-23 21:36:00,017 INFO [train.py:886] (3/4) Epoch 43, batch 1450, loss[loss=0.01419, audio_tagging_loss=0.01419, over 25000.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4946757.50 frames. 
2023-12-23 21:36:01,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1344146.6666666667, ans=0.125
2023-12-23 21:36:13,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1344213.3333333333, ans=0.0
2023-12-23 21:36:22,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1344280.0, ans=0.125
2023-12-23 21:36:26,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1344280.0, ans=0.0
2023-12-23 21:36:50,628 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:36:53,274 INFO [train.py:886] (3/4) Epoch 43, batch 1500, loss[loss=0.01013, audio_tagging_loss=0.01013, over 25000.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4952519.82 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:36:54,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.07 vs. limit=22.5
2023-12-23 21:36:55,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1344480.0, ans=0.1
2023-12-23 21:37:00,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1344480.0, ans=0.125
2023-12-23 21:37:09,393 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.371e+01 3.779e+01 3.922e+01 4.118e+01 4.489e+01, threshold=7.843e+01, percent-clipped=0.0
2023-12-23 21:37:38,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1344746.6666666667, ans=22.5
2023-12-23 21:37:42,379 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.43 vs. limit=22.5
2023-12-23 21:37:42,876 INFO [train.py:886] (3/4) Epoch 43, batch 1550, loss[loss=0.01052, audio_tagging_loss=0.01052, over 24750.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4939097.14 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:37:47,346 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.57 vs. limit=15.0
2023-12-23 21:37:55,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.86 vs. limit=22.5
2023-12-23 21:38:00,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1344880.0, ans=0.125
2023-12-23 21:38:02,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1344880.0, ans=0.0
2023-12-23 21:38:18,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1345013.3333333333, ans=0.125
2023-12-23 21:38:21,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1345013.3333333333, ans=0.025
2023-12-23 21:38:36,334 INFO [train.py:886] (3/4) Epoch 43, batch 1600, loss[loss=0.01118, audio_tagging_loss=0.01118, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4931874.67 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:38:45,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1345213.3333333333, ans=10.0
2023-12-23 21:38:52,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1345213.3333333333, ans=0.125
2023-12-23 21:38:53,035 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.409e+01 3.814e+01 3.969e+01 4.149e+01 4.804e+01, threshold=7.938e+01, percent-clipped=0.0
2023-12-23 21:38:53,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1345213.3333333333, ans=0.125
2023-12-23 21:39:28,157 INFO [train.py:886] (3/4) Epoch 43, batch 1650, loss[loss=0.01126, audio_tagging_loss=0.01126, over 24750.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4937618.46 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:39:30,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1345480.0, ans=0.2
2023-12-23 21:40:19,275 INFO [train.py:886] (3/4) Epoch 43, batch 1700, loss[loss=0.008515, audio_tagging_loss=0.008515, over 24107.00 frames. ], tot_loss[loss=0.01126, audio_tagging_loss=0.01126, over 4942686.71 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:40:38,053 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.392e+01 3.780e+01 3.982e+01 4.194e+01 5.083e+01, threshold=7.964e+01, percent-clipped=0.0
2023-12-23 21:40:43,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1345946.6666666667, ans=0.2
2023-12-23 21:41:12,544 INFO [train.py:886] (3/4) Epoch 43, batch 1750, loss[loss=0.01104, audio_tagging_loss=0.01104, over 25000.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4950619.12 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:41:16,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1346146.6666666667, ans=0.125
2023-12-23 21:41:22,725 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.09 vs. limit=15.0
2023-12-23 21:41:24,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1346213.3333333333, ans=0.1
2023-12-23 21:41:33,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1346280.0, ans=0.125
2023-12-23 21:41:35,339 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:41:40,861 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.07 vs. limit=22.5
2023-12-23 21:41:48,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1346346.6666666667, ans=0.125
2023-12-23 21:41:51,992 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0
2023-12-23 21:41:53,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1346413.3333333333, ans=0.2
2023-12-23 21:41:58,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1346413.3333333333, ans=0.125
2023-12-23 21:42:02,805 INFO [train.py:886] (3/4) Epoch 43, batch 1800, loss[loss=0.009398, audio_tagging_loss=0.009398, over 25000.00 frames. ], tot_loss[loss=0.0112, audio_tagging_loss=0.0112, over 4950717.29 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:42:21,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.08 vs. limit=12.0
2023-12-23 21:42:21,658 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.422e+01 3.726e+01 3.927e+01 4.030e+01 4.598e+01, threshold=7.853e+01, percent-clipped=0.0
2023-12-23 21:42:24,288 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0
2023-12-23 21:42:27,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1346613.3333333333, ans=0.5
2023-12-23 21:42:31,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1346613.3333333333, ans=0.1
2023-12-23 21:42:56,174 INFO [train.py:886] (3/4) Epoch 43, batch 1850, loss[loss=0.01086, audio_tagging_loss=0.01086, over 24750.00 frames. ], tot_loss[loss=0.01123, audio_tagging_loss=0.01123, over 4950703.87 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:42:58,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1346813.3333333333, ans=0.2
2023-12-23 21:43:04,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1346880.0, ans=0.0
2023-12-23 21:43:07,142 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.85 vs. limit=15.0
2023-12-23 21:43:07,842 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.20 vs. limit=10.0
2023-12-23 21:43:08,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1346880.0, ans=0.125
2023-12-23 21:43:11,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1346880.0, ans=0.0
2023-12-23 21:43:11,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1346880.0, ans=0.0
2023-12-23 21:43:32,764 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0
2023-12-23 21:43:48,404 INFO [train.py:886] (3/4) Epoch 43, batch 1900, loss[loss=0.01017, audio_tagging_loss=0.01017, over 24750.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4943235.46 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:43:52,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1347146.6666666667, ans=0.0
2023-12-23 21:43:54,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1347146.6666666667, ans=0.125
2023-12-23 21:44:05,352 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.418e+01 3.802e+01 3.982e+01 4.158e+01 4.820e+01, threshold=7.964e+01, percent-clipped=0.0
2023-12-23 21:44:06,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1347213.3333333333, ans=0.2
2023-12-23 21:44:13,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1347280.0, ans=0.125
2023-12-23 21:44:34,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1347413.3333333333, ans=0.0
2023-12-23 21:44:39,938 INFO [train.py:886] (3/4) Epoch 43, batch 1950, loss[loss=0.01105, audio_tagging_loss=0.01105, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4945169.34 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:44:46,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1347480.0, ans=0.0
2023-12-23 21:44:58,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1347546.6666666667, ans=0.125
2023-12-23 21:45:01,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1347613.3333333333, ans=0.125
2023-12-23 21:45:03,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1347613.3333333333, ans=0.125
2023-12-23 21:45:10,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1347680.0, ans=0.125
2023-12-23 21:45:16,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1347680.0, ans=0.125
2023-12-23 21:45:19,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1347680.0, ans=0.125
2023-12-23 21:45:27,523 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.61 vs. limit=12.0
2023-12-23 21:45:32,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1347813.3333333333, ans=0.125
2023-12-23 21:45:32,963 INFO [train.py:886] (3/4) Epoch 43, batch 2000, loss[loss=0.0118, audio_tagging_loss=0.0118, over 24750.00 frames. ], tot_loss[loss=0.01123, audio_tagging_loss=0.01123, over 4939055.29 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 32.0
2023-12-23 21:45:50,013 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.411e+01 3.774e+01 3.907e+01 4.123e+01 6.126e+01, threshold=7.815e+01, percent-clipped=0.0
2023-12-23 21:46:04,525 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.43 vs. limit=22.5
2023-12-23 21:46:13,262 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:46:17,343 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.44 vs. limit=10.0
2023-12-23 21:46:25,177 INFO [train.py:886] (3/4) Epoch 43, batch 2050, loss[loss=0.01064, audio_tagging_loss=0.01064, over 25000.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4942296.31 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 64.0
2023-12-23 21:46:43,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=1348213.3333333333, ans=0.02
2023-12-23 21:46:43,341 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.71 vs. limit=15.0
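
The scaling.py:213 lines are periodic samples of ScheduledFloat hyperparameters: skip rates, balancer probabilities, dropout values and similar knobs that are piecewise-linear functions of batch_count rather than constants, with ans being the value currently in effect. A minimal sketch of such a schedule, assuming linear interpolation between (batch_count, value) breakpoints with the endpoint values held constant outside the range; this is illustrative, not the exact scaling.py implementation:

class PiecewiseLinearSchedule:
    """Sketch of a ScheduledFloat-style value: y = f(batch_count)."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs, e.g. (0.0, 0.3), (4000.0, 0.1)
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# Hypothetical example: a dropout that anneals from 0.3 to 0.1 over the
# first 4000 batches and then stays flat; at batch_count ~ 1.35e6 it would
# report ans=0.1, matching the long-since-converged dropout_p lines above.
dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (4000.0, 0.1))
assert abs(dropout_p.value(1347480.0) - 0.1) < 1e-9
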
2023-12-23 21:46:55,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1348346.6666666667, ans=0.0
2023-12-23 21:47:15,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1348413.3333333333, ans=0.0
2023-12-23 21:47:17,043 INFO [train.py:886] (3/4) Epoch 43, batch 2100, loss[loss=0.01041, audio_tagging_loss=0.01041, over 25000.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4948407.41 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 64.0
2023-12-23 21:47:27,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.75 vs. limit=15.0
2023-12-23 21:47:36,272 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.344e+01 3.711e+01 3.880e+01 4.032e+01 4.676e+01, threshold=7.761e+01, percent-clipped=0.0
2023-12-23 21:47:38,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1348613.3333333333, ans=0.0
2023-12-23 21:47:43,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1348613.3333333333, ans=0.1
2023-12-23 21:47:54,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1348680.0, ans=0.1
2023-12-23 21:48:10,530 INFO [train.py:886] (3/4) Epoch 43, batch 2150, loss[loss=0.01473, audio_tagging_loss=0.01473, over 24750.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4948145.20 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 64.0
2023-12-23 21:48:21,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1348880.0, ans=0.2
2023-12-23 21:48:29,698 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:48:45,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1349013.3333333333, ans=0.0
2023-12-23 21:48:47,069 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:48:51,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1349080.0, ans=0.0
2023-12-23 21:48:52,125 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.08 vs. limit=15.0
2023-12-23 21:48:52,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1349080.0, ans=0.95
2023-12-23 21:49:01,978 INFO [train.py:886] (3/4) Epoch 43, batch 2200, loss[loss=0.01108, audio_tagging_loss=0.01108, over 24750.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4944521.84 frames. ], batch size: 99, lr: 2.50e-03, grad_scale: 64.0
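
grad_scale in the train.py:886 lines is the dynamic loss scale of mixed-precision (fp16) training, not a model parameter: it doubles once gradients have stayed finite for a growth interval (32.0 becomes 64.0 at the batch 2050 entry above) and is halved whenever an overflow is detected, which is why it later drops back to 32.0. The standard PyTorch pattern looks roughly like this; a generic torch.cuda.amp sketch, not the exact loop in train.py:

import torch
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(model, optimizer, criterion, features, targets):
    optimizer.zero_grad()
    with autocast():                    # run the forward pass in fp16 where safe
        loss = criterion(model(features), targets)
    scaler.scale(loss).backward()       # backward on the scaled loss
    scaler.step(optimizer)              # unscales grads; skips the step on inf/nan
    scaler.update()                     # grow or shrink the scale as needed
    return loss.detach(), scaler.get_scale()  # get_scale() is the logged scale
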
2023-12-23 21:49:03,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1349146.6666666667, ans=0.0
2023-12-23 21:49:20,203 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.408e+01 3.810e+01 3.972e+01 4.173e+01 4.722e+01, threshold=7.943e+01, percent-clipped=0.0
2023-12-23 21:49:21,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1349213.3333333333, ans=0.125
2023-12-23 21:49:21,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1349213.3333333333, ans=0.125
2023-12-23 21:49:29,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0
2023-12-23 21:49:54,811 INFO [train.py:886] (3/4) Epoch 43, batch 2250, loss[loss=0.009345, audio_tagging_loss=0.009345, over 23987.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4944578.20 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 64.0
2023-12-23 21:49:56,364 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.51 vs. limit=15.0
2023-12-23 21:50:00,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1349480.0, ans=0.125
2023-12-23 21:50:01,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1349480.0, ans=0.125
2023-12-23 21:50:14,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1349546.6666666667, ans=0.2
2023-12-23 21:50:20,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1349613.3333333333, ans=0.0
2023-12-23 21:50:28,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1349680.0, ans=0.035
2023-12-23 21:50:42,496 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.17 vs. limit=12.0
2023-12-23 21:50:48,222 INFO [train.py:886] (3/4) Epoch 43, batch 2300, loss[loss=0.01142, audio_tagging_loss=0.01142, over 25000.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4945047.01 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:50:59,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1349880.0, ans=0.0
2023-12-23 21:51:04,570 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.400e+01 3.719e+01 3.860e+01 4.088e+01 4.787e+01, threshold=7.721e+01, percent-clipped=0.0
2023-12-23 21:51:07,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1349946.6666666667, ans=0.0
2023-12-23 21:51:23,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1350013.3333333333, ans=0.0
2023-12-23 21:51:38,599 INFO [train.py:886] (3/4) Epoch 43, batch 2350, loss[loss=0.0108, audio_tagging_loss=0.0108, over 25000.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4945920.17 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:52:03,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1350280.0, ans=0.0
2023-12-23 21:52:31,275 INFO [train.py:886] (3/4) Epoch 43, batch 2400, loss[loss=0.009371, audio_tagging_loss=0.009371, over 25000.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4951829.32 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:52:32,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1350480.0, ans=0.5
2023-12-23 21:52:38,237 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0
2023-12-23 21:52:47,480 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.368e+01 3.759e+01 3.931e+01 4.096e+01 4.502e+01, threshold=7.861e+01, percent-clipped=0.0
2023-12-23 21:53:08,310 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0
2023-12-23 21:53:12,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1350746.6666666667, ans=0.125
2023-12-23 21:53:12,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1350746.6666666667, ans=0.125
2023-12-23 21:53:19,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1350746.6666666667, ans=0.125
2023-12-23 21:53:22,057 INFO [train.py:886] (3/4) Epoch 43, batch 2450, loss[loss=0.01221, audio_tagging_loss=0.01221, over 25000.00 frames. ], tot_loss[loss=0.01117, audio_tagging_loss=0.01117, over 4954254.79 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:53:42,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1350880.0, ans=0.125
2023-12-23 21:54:02,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1351013.3333333333, ans=0.0
2023-12-23 21:54:14,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1351146.6666666667, ans=0.1
2023-12-23 21:54:14,890 INFO [train.py:886] (3/4) Epoch 43, batch 2500, loss[loss=0.01237, audio_tagging_loss=0.01237, over 24750.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4953560.07 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:54:26,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1351213.3333333333, ans=0.0
2023-12-23 21:54:29,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1351213.3333333333, ans=0.125
2023-12-23 21:54:32,405 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.451e+01 3.815e+01 4.045e+01 4.230e+01 4.864e+01, threshold=8.091e+01, percent-clipped=0.0
2023-12-23 21:54:39,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1351280.0, ans=0.0
2023-12-23 21:54:50,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1351346.6666666667, ans=0.125
2023-12-23 21:54:58,121 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.17 vs. limit=15.0
2023-12-23 21:55:03,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1351413.3333333333, ans=0.125
2023-12-23 21:55:07,174 INFO [train.py:886] (3/4) Epoch 43, batch 2550, loss[loss=0.01055, audio_tagging_loss=0.01055, over 24750.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4952193.71 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:55:15,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1351480.0, ans=0.1
2023-12-23 21:55:21,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1351546.6666666667, ans=0.125
2023-12-23 21:55:28,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1351613.3333333333, ans=0.125
2023-12-23 21:55:32,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.36 vs. limit=5.0
2023-12-23 21:55:33,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1351613.3333333333, ans=0.125
2023-12-23 21:55:50,855 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:55:53,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1351746.6666666667, ans=0.1
2023-12-23 21:55:54,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1351746.6666666667, ans=0.125
2023-12-23 21:55:57,343 INFO [train.py:886] (3/4) Epoch 43, batch 2600, loss[loss=0.01003, audio_tagging_loss=0.01003, over 25000.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4956218.46 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:56:16,371 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.494e+01 3.793e+01 3.971e+01 4.202e+01 4.613e+01, threshold=7.943e+01, percent-clipped=0.0
2023-12-23 21:56:18,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1351946.6666666667, ans=0.125
2023-12-23 21:56:21,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1351946.6666666667, ans=0.0
2023-12-23 21:56:27,460 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 21:56:31,564 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0
2023-12-23 21:56:40,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1352080.0, ans=0.125
2023-12-23 21:56:45,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1352080.0, ans=0.0
2023-12-23 21:56:50,196 INFO [train.py:886] (3/4) Epoch 43, batch 2650, loss[loss=0.01312, audio_tagging_loss=0.01312, over 25000.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4960562.04 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:56:52,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1352146.6666666667, ans=0.1
2023-12-23 21:56:52,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1352146.6666666667, ans=0.125
2023-12-23 21:56:53,654 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.05 vs. limit=12.0
2023-12-23 21:57:02,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1352213.3333333333, ans=0.0
2023-12-23 21:57:18,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1352280.0, ans=0.125
2023-12-23 21:57:24,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1352346.6666666667, ans=0.125
2023-12-23 21:57:26,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1352346.6666666667, ans=0.0
2023-12-23 21:57:30,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1352413.3333333333, ans=0.125
2023-12-23 21:57:30,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1352413.3333333333, ans=0.07
2023-12-23 21:57:31,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1352413.3333333333, ans=15.0
2023-12-23 21:57:41,542 INFO [train.py:886] (3/4) Epoch 43, batch 2700, loss[loss=0.009167, audio_tagging_loss=0.009167, over 24072.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4961901.21 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:57:58,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1352546.6666666667, ans=0.125
2023-12-23 21:57:59,802 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.334e+01 3.701e+01 3.871e+01 4.080e+01 4.871e+01, threshold=7.742e+01, percent-clipped=0.0
2023-12-23 21:58:02,232 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.46 vs. limit=15.0
2023-12-23 21:58:12,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1352680.0, ans=10.0
2023-12-23 21:58:15,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1352680.0, ans=0.125
2023-12-23 21:58:34,119 INFO [train.py:886] (3/4) Epoch 43, batch 2750, loss[loss=0.01293, audio_tagging_loss=0.01293, over 24750.00 frames. ], tot_loss[loss=0.01115, audio_tagging_loss=0.01115, over 4966661.79 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:58:56,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1352946.6666666667, ans=0.2
2023-12-23 21:58:59,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1352946.6666666667, ans=0.0
2023-12-23 21:59:03,014 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.06 vs. limit=12.0
2023-12-23 21:59:15,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1353080.0, ans=0.1
2023-12-23 21:59:19,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1353080.0, ans=0.125
2023-12-23 21:59:23,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1353080.0, ans=0.125
2023-12-23 21:59:26,444 INFO [train.py:886] (3/4) Epoch 43, batch 2800, loss[loss=0.01302, audio_tagging_loss=0.01302, over 24953.00 frames. ], tot_loss[loss=0.01116, audio_tagging_loss=0.01116, over 4963992.91 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 21:59:31,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1353146.6666666667, ans=0.125
2023-12-23 21:59:38,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.90 vs. limit=10.0
2023-12-23 21:59:43,152 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.443e+01 3.835e+01 3.982e+01 4.160e+01 5.061e+01, threshold=7.963e+01, percent-clipped=0.0
2023-12-23 21:59:43,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1353213.3333333333, ans=0.125
2023-12-23 21:59:57,295 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=22.5
2023-12-23 22:00:07,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1353413.3333333333, ans=0.0
2023-12-23 22:00:10,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1353413.3333333333, ans=0.2
2023-12-23 22:00:11,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1353413.3333333333, ans=0.2
2023-12-23 22:00:18,019 INFO [train.py:886] (3/4) Epoch 43, batch 2850, loss[loss=0.009414, audio_tagging_loss=0.009414, over 24750.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4956340.84 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 22:00:30,345 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.04 vs. limit=22.5
2023-12-23 22:00:46,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.54 vs. limit=22.5
2023-12-23 22:00:49,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1353680.0, ans=0.2
2023-12-23 22:01:10,321 INFO [train.py:886] (3/4) Epoch 43, batch 2900, loss[loss=0.009879, audio_tagging_loss=0.009879, over 25000.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4952119.35 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 22:01:22,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1353880.0, ans=0.0
2023-12-23 22:01:28,116 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.361e+01 3.812e+01 3.915e+01 4.095e+01 4.854e+01, threshold=7.829e+01, percent-clipped=0.0
2023-12-23 22:01:32,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1353946.6666666667, ans=0.125
2023-12-23 22:01:40,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1354013.3333333333, ans=0.2
2023-12-23 22:01:44,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1354013.3333333333, ans=0.2
2023-12-23 22:01:50,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1354080.0, ans=0.125
2023-12-23 22:02:02,228 INFO [train.py:886] (3/4) Epoch 43, batch 2950, loss[loss=0.009589, audio_tagging_loss=0.009589, over 25000.00 frames. ], tot_loss[loss=0.01116, audio_tagging_loss=0.01116, over 4950620.97 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 22:02:06,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1354146.6666666667, ans=0.125
2023-12-23 22:02:06,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1354146.6666666667, ans=0.125
2023-12-23 22:02:19,089 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.91 vs. limit=15.0
2023-12-23 22:02:22,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1354280.0, ans=0.0
2023-12-23 22:02:22,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1354280.0, ans=0.0
2023-12-23 22:02:42,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1354346.6666666667, ans=0.125
2023-12-23 22:02:42,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1354413.3333333333, ans=0.125
2023-12-23 22:02:43,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1354413.3333333333, ans=0.015
2023-12-23 22:02:43,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1354413.3333333333, ans=0.2
2023-12-23 22:02:50,642 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0
2023-12-23 22:02:53,849 INFO [train.py:886] (3/4) Epoch 43, batch 3000, loss[loss=0.008752, audio_tagging_loss=0.008752, over 24750.00 frames. ], tot_loss[loss=0.01113, audio_tagging_loss=0.01113, over 4952174.03 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 22:02:53,850 INFO [train.py:909] (3/4) Computing validation loss
2023-12-23 22:03:03,735 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7361, 5.9156, 5.3015, 5.6654], device='cuda:3')
2023-12-23 22:03:04,194 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.2742, 2.0694, 3.3253, 2.5476, 3.8946, 2.7320, 1.9671, 2.2251], device='cuda:3')
2023-12-23 22:03:15,291 INFO [train.py:917] (3/4) Epoch 43, validation: loss=0.03559, audio_tagging_loss=0.03559, over 3737520.00 frames.
2023-12-23 22:03:15,292 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-23 22:03:26,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1354546.6666666667, ans=0.0
2023-12-23 22:03:31,978 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.311e+01 3.762e+01 3.904e+01 4.055e+01 4.746e+01, threshold=7.807e+01, percent-clipped=0.0
2023-12-23 22:03:53,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.38 vs. limit=12.0
2023-12-23 22:04:01,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1354746.6666666667, ans=0.0
2023-12-23 22:04:01,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1354746.6666666667, ans=0.2
2023-12-23 22:04:05,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.55 vs. limit=15.0
2023-12-23 22:04:07,152 INFO [train.py:886] (3/4) Epoch 43, batch 3050, loss[loss=0.01329, audio_tagging_loss=0.01329, over 25000.00 frames. ], tot_loss[loss=0.01115, audio_tagging_loss=0.01115, over 4955463.90 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
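
During the validation pass above (train.py:909), zipformer.py:1858 dumps attn_weights_entropy, the average entropy of each attention head's weight distribution: values near zero mean a head attends to a single frame, larger values mean diffuse attention, with one number per head (four for the 4-head tensor and eight for the 8-head tensor shown above). A sketch of that diagnostic, assuming rows of attn_weights sum to one; the name and the exact reduction are illustrative:

import torch

def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    """attn_weights: (num_heads, batch, tgt_len, src_len), rows summing to 1.
    Returns per-head entropy in nats, averaged over batch and positions."""
    eps = 1.0e-20  # guard against log(0) for exactly-zero weights
    entropy = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
    return entropy.mean(dim=(1, 2))  # -> (num_heads,)

# For scale: a head spread uniformly over 600 frames would score
# log(600) ~= 6.4 nats, so the 2-6 nat values above are moderately focused.
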
2023-12-23 22:04:13,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1354813.3333333333, ans=0.035
2023-12-23 22:04:14,173 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.51 vs. limit=15.0
2023-12-23 22:04:14,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1354813.3333333333, ans=0.125
2023-12-23 22:04:20,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1354880.0, ans=0.0
2023-12-23 22:04:26,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1354946.6666666667, ans=0.1
2023-12-23 22:04:26,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1354946.6666666667, ans=0.1
2023-12-23 22:04:36,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1355013.3333333333, ans=0.0
2023-12-23 22:04:41,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1355013.3333333333, ans=0.125
2023-12-23 22:04:58,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1355146.6666666667, ans=0.0
2023-12-23 22:04:58,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1355146.6666666667, ans=0.0
2023-12-23 22:04:58,883 INFO [train.py:886] (3/4) Epoch 43, batch 3100, loss[loss=0.009612, audio_tagging_loss=0.009612, over 24750.00 frames. ], tot_loss[loss=0.01115, audio_tagging_loss=0.01115, over 4951715.97 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 22:05:16,530 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.53 vs. limit=22.5
2023-12-23 22:05:17,029 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.431e+01 3.827e+01 3.980e+01 4.191e+01 5.132e+01, threshold=7.960e+01, percent-clipped=0.0
2023-12-23 22:05:21,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1355280.0, ans=0.125
2023-12-23 22:05:23,263 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.91 vs. limit=10.0
2023-12-23 22:05:33,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1355346.6666666667, ans=0.09899494936611666
2023-12-23 22:05:51,040 INFO [train.py:886] (3/4) Epoch 43, batch 3150, loss[loss=0.013, audio_tagging_loss=0.013, over 22531.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4947504.88 frames. ], batch size: 107, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 22:05:57,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0
2023-12-23 22:06:24,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1355680.0, ans=0.0
2023-12-23 22:06:35,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1355746.6666666667, ans=0.125
2023-12-23 22:06:42,743 INFO [train.py:886] (3/4) Epoch 43, batch 3200, loss[loss=0.0117, audio_tagging_loss=0.0117, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4946709.11 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 22:07:00,367 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.493e+01 3.795e+01 4.020e+01 4.191e+01 4.610e+01, threshold=8.041e+01, percent-clipped=0.0
2023-12-23 22:07:04,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1355946.6666666667, ans=0.125
2023-12-23 22:07:04,698 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.88 vs. limit=15.0
2023-12-23 22:07:07,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1355946.6666666667, ans=0.0
2023-12-23 22:07:08,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1355946.6666666667, ans=0.125
2023-12-23 22:07:24,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1356080.0, ans=10.0
2023-12-23 22:07:31,310 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0
2023-12-23 22:07:34,562 INFO [train.py:886] (3/4) Epoch 43, batch 3250, loss[loss=0.01067, audio_tagging_loss=0.01067, over 25000.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4948406.03 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 22:08:02,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1356280.0, ans=0.2
2023-12-23 22:08:10,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1356346.6666666667, ans=0.125
2023-12-23 22:08:17,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1356413.3333333333, ans=0.125
2023-12-23 22:08:20,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0
2023-12-23 22:08:27,701 INFO [train.py:886] (3/4) Epoch 43, batch 3300, loss[loss=0.01105, audio_tagging_loss=0.01105, over 24750.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4956171.90 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 22:08:30,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1356480.0, ans=0.1
2023-12-23 22:08:43,757 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.382e+01 3.775e+01 3.931e+01 4.135e+01 5.251e+01, threshold=7.863e+01, percent-clipped=0.0
2023-12-23 22:08:57,208 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.38 vs. limit=15.0
2023-12-23 22:09:00,589 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0
2023-12-23 22:09:01,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1356680.0, ans=0.1
2023-12-23 22:09:03,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1356680.0, ans=0.0
2023-12-23 22:09:12,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.78 vs. limit=15.0
2023-12-23 22:09:15,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1356746.6666666667, ans=0.09899494936611666
2023-12-23 22:09:17,299 INFO [train.py:886] (3/4) Epoch 43, batch 3350, loss[loss=0.01219, audio_tagging_loss=0.01219, over 25000.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4955971.65 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 22:09:35,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1356880.0, ans=0.125
2023-12-23 22:09:36,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1356880.0, ans=0.125
2023-12-23 22:09:41,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1356946.6666666667, ans=0.125
2023-12-23 22:09:47,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.18 vs. limit=15.0
2023-12-23 22:10:01,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1357080.0, ans=0.0
2023-12-23 22:10:05,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1357080.0, ans=0.1
2023-12-23 22:10:08,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1357080.0, ans=0.1
2023-12-23 22:10:10,552 INFO [train.py:886] (3/4) Epoch 43, batch 3400, loss[loss=0.009472, audio_tagging_loss=0.009472, over 25000.00 frames. ], tot_loss[loss=0.01108, audio_tagging_loss=0.01108, over 4960763.68 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 22:10:18,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1357146.6666666667, ans=0.1
2023-12-23 22:10:27,283 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.417e+01 3.821e+01 3.927e+01 4.185e+01 4.649e+01, threshold=7.854e+01, percent-clipped=0.0
2023-12-23 22:10:38,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1357280.0, ans=0.125
2023-12-23 22:10:47,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1357346.6666666667, ans=0.0
2023-12-23 22:11:02,480 INFO [train.py:886] (3/4) Epoch 43, batch 3450, loss[loss=0.01286, audio_tagging_loss=0.01286, over 24750.00 frames. ], tot_loss[loss=0.01116, audio_tagging_loss=0.01116, over 4959049.55 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 64.0
2023-12-23 22:11:34,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1357680.0, ans=0.1
2023-12-23 22:11:41,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1357680.0, ans=0.0
2023-12-23 22:11:43,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1357746.6666666667, ans=0.2
2023-12-23 22:11:54,027 INFO [train.py:886] (3/4) Epoch 43, batch 3500, loss[loss=0.009467, audio_tagging_loss=0.009467, over 24750.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4954967.53 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 32.0
2023-12-23 22:12:14,013 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.495e+01 3.870e+01 4.030e+01 4.254e+01 6.630e+01, threshold=8.060e+01, percent-clipped=0.0
2023-12-23 22:12:18,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1357946.6666666667, ans=0.0
2023-12-23 22:12:47,453 INFO [train.py:886] (3/4) Epoch 43, batch 3550, loss[loss=0.009481, audio_tagging_loss=0.009481, over 25000.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4951364.16 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 32.0
2023-12-23 22:12:50,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.90 vs. limit=10.0
2023-12-23 22:13:03,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1358213.3333333333, ans=0.2
2023-12-23 22:13:19,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1358346.6666666667, ans=0.125
2023-12-23 22:13:24,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1358346.6666666667, ans=0.125
2023-12-23 22:13:26,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1358346.6666666667, ans=0.2
2023-12-23 22:13:26,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1358346.6666666667, ans=0.95
2023-12-23 22:13:34,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1358413.3333333333, ans=0.07
2023-12-23 22:13:37,097 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.70 vs. limit=22.5
2023-12-23 22:13:38,453 INFO [train.py:886] (3/4) Epoch 43, batch 3600, loss[loss=0.008442, audio_tagging_loss=0.008442, over 25000.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4953018.96 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 32.0
2023-12-23 22:13:45,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1358480.0, ans=0.5
2023-12-23 22:13:47,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1358480.0, ans=0.0
2023-12-23 22:13:55,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1358546.6666666667, ans=0.0
2023-12-23 22:13:57,054 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.362e+01 3.773e+01 3.900e+01 4.130e+01 5.213e+01, threshold=7.800e+01, percent-clipped=0.0
2023-12-23 22:14:05,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1358613.3333333333, ans=0.0
2023-12-23 22:14:30,409 INFO [train.py:886] (3/4) Epoch 43, batch 3650, loss[loss=0.01089, audio_tagging_loss=0.01089, over 25000.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4955999.59 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 32.0
2023-12-23 22:14:33,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1358813.3333333333, ans=0.2
2023-12-23 22:14:52,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1358946.6666666667, ans=0.95
2023-12-23 22:14:54,730 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-23 22:15:04,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1359013.3333333333, ans=0.015
2023-12-23 22:15:22,572 INFO [train.py:886] (3/4) Epoch 43, batch 3700, loss[loss=0.01198, audio_tagging_loss=0.01198, over 24750.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4960730.51 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 32.0
2023-12-23 22:15:29,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1359146.6666666667, ans=0.125
2023-12-23 22:15:31,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1359146.6666666667, ans=0.125
2023-12-23 22:15:36,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1359213.3333333333, ans=0.0
2023-12-23 22:15:41,169 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.439e+01 3.800e+01 4.029e+01 4.222e+01 4.954e+01, threshold=8.057e+01, percent-clipped=0.0
2023-12-23 22:15:51,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1359280.0, ans=0.2
2023-12-23 22:15:52,907 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.64 vs. limit=15.0
2023-12-23 22:15:56,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1359346.6666666667, ans=0.125
2023-12-23 22:16:09,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1359413.3333333333, ans=0.0
2023-12-23 22:16:14,313 INFO [train.py:886] (3/4) Epoch 43, batch 3750, loss[loss=0.01151, audio_tagging_loss=0.01151, over 24750.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4953201.08 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 32.0
2023-12-23 22:16:20,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1359480.0, ans=0.07
2023-12-23 22:16:21,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1359480.0, ans=0.2
2023-12-23 22:16:29,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1359546.6666666667, ans=0.1
2023-12-23 22:17:07,199 INFO [train.py:886] (3/4) Epoch 43, batch 3800, loss[loss=0.01132, audio_tagging_loss=0.01132, over 25000.00 frames. ], tot_loss[loss=0.01126, audio_tagging_loss=0.01126, over 4945452.45 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 32.0
2023-12-23 22:17:07,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1359813.3333333333, ans=0.125
2023-12-23 22:17:12,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1359813.3333333333, ans=0.125
2023-12-23 22:17:14,536 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.61 vs. limit=15.0
2023-12-23 22:17:21,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1359880.0, ans=0.0
2023-12-23 22:17:24,807 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.435e+01 3.811e+01 3.969e+01 4.137e+01 5.499e+01, threshold=7.937e+01, percent-clipped=0.0
2023-12-23 22:17:32,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1359946.6666666667, ans=0.1
2023-12-23 22:17:59,943 INFO [train.py:886] (3/4) Epoch 43, batch 3850, loss[loss=0.01118, audio_tagging_loss=0.01118, over 25000.00 frames. ], tot_loss[loss=0.01123, audio_tagging_loss=0.01123, over 4948293.40 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 32.0
2023-12-23 22:18:01,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1360146.6666666667, ans=0.0
2023-12-23 22:18:07,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1360146.6666666667, ans=0.0
2023-12-23 22:18:08,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1360146.6666666667, ans=10.0
2023-12-23 22:18:17,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1360213.3333333333, ans=0.125
2023-12-23 22:18:25,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1360280.0, ans=0.125
2023-12-23 22:18:27,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0
2023-12-23 22:18:31,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1360346.6666666667, ans=0.1
2023-12-23 22:18:52,589 INFO [train.py:886] (3/4) Epoch 43, batch 3900, loss[loss=0.01128, audio_tagging_loss=0.01128, over 24750.00 frames. ], tot_loss[loss=0.0112, audio_tagging_loss=0.0112, over 4946460.59 frames. ], batch size: 99, lr: 2.49e-03, grad_scale: 32.0
2023-12-23 22:19:11,866 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.297e+01 3.810e+01 3.949e+01 4.143e+01 4.595e+01, threshold=7.898e+01, percent-clipped=0.0
2023-12-23 22:19:40,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1360746.6666666667, ans=0.2
2023-12-23 22:19:45,216 INFO [train.py:886] (3/4) Epoch 43, batch 3950, loss[loss=0.01062, audio_tagging_loss=0.01062, over 25000.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4953555.07 frames.
], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:19:48,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1360813.3333333333, ans=0.125 2023-12-23 22:19:54,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1360880.0, ans=0.0 2023-12-23 22:19:55,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1360880.0, ans=0.0 2023-12-23 22:20:08,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1360946.6666666667, ans=0.125 2023-12-23 22:20:36,400 INFO [train.py:886] (3/4) Epoch 43, batch 4000, loss[loss=0.01061, audio_tagging_loss=0.01061, over 25000.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4956499.31 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:20:55,669 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.315e+01 3.773e+01 3.926e+01 4.187e+01 4.858e+01, threshold=7.853e+01, percent-clipped=0.0 2023-12-23 22:20:55,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1361213.3333333333, ans=0.125 2023-12-23 22:20:58,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1361280.0, ans=0.125 2023-12-23 22:21:08,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1361346.6666666667, ans=0.2 2023-12-23 22:21:10,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1361346.6666666667, ans=0.125 2023-12-23 22:21:15,049 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.63 vs. limit=15.0 2023-12-23 22:21:28,996 INFO [train.py:886] (3/4) Epoch 43, batch 4050, loss[loss=0.01188, audio_tagging_loss=0.01188, over 24750.00 frames. ], tot_loss[loss=0.01113, audio_tagging_loss=0.01113, over 4954767.74 frames. ], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:21:29,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1361480.0, ans=0.04949747468305833 2023-12-23 22:21:36,808 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.19 vs. limit=6.0 2023-12-23 22:21:37,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1361480.0, ans=0.1 2023-12-23 22:21:39,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1361546.6666666667, ans=0.125 2023-12-23 22:21:47,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1361546.6666666667, ans=0.2 2023-12-23 22:22:00,632 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.86 vs. 
limit=22.5 2023-12-23 22:22:02,443 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.94 vs. limit=15.0 2023-12-23 22:22:03,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1361680.0, ans=0.125 2023-12-23 22:22:05,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1361680.0, ans=0.125 2023-12-23 22:22:07,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1361680.0, ans=0.2 2023-12-23 22:22:21,377 INFO [train.py:886] (3/4) Epoch 43, batch 4100, loss[loss=0.01294, audio_tagging_loss=0.01294, over 24750.00 frames. ], tot_loss[loss=0.01127, audio_tagging_loss=0.01127, over 4948167.65 frames. ], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:22:22,667 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=15.0 2023-12-23 22:22:23,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1361813.3333333333, ans=0.125 2023-12-23 22:22:23,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1361813.3333333333, ans=0.125 2023-12-23 22:22:26,237 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.58 vs. limit=15.0 2023-12-23 22:22:26,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1361813.3333333333, ans=0.125 2023-12-23 22:22:29,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1361813.3333333333, ans=0.1 2023-12-23 22:22:34,498 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:22:39,749 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.438e+01 3.848e+01 3.982e+01 4.277e+01 4.961e+01, threshold=7.964e+01, percent-clipped=0.0 2023-12-23 22:23:04,917 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.37 vs. limit=15.0 2023-12-23 22:23:12,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1362146.6666666667, ans=0.2 2023-12-23 22:23:12,804 INFO [train.py:886] (3/4) Epoch 43, batch 4150, loss[loss=0.01141, audio_tagging_loss=0.01141, over 25000.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4950278.93 frames. 
], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:23:18,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1362146.6666666667, ans=0.125 2023-12-23 22:23:32,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1362280.0, ans=0.125 2023-12-23 22:23:44,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1362346.6666666667, ans=0.1 2023-12-23 22:23:48,099 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:23:54,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1362413.3333333333, ans=0.0 2023-12-23 22:24:05,212 INFO [train.py:886] (3/4) Epoch 43, batch 4200, loss[loss=0.009963, audio_tagging_loss=0.009963, over 25000.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4950103.66 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:24:05,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1362480.0, ans=0.0 2023-12-23 22:24:07,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1362480.0, ans=0.0 2023-12-23 22:24:23,809 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.368e+01 3.761e+01 3.920e+01 4.090e+01 4.676e+01, threshold=7.840e+01, percent-clipped=0.0 2023-12-23 22:24:33,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1362613.3333333333, ans=0.1 2023-12-23 22:24:34,308 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.93 vs. limit=10.0 2023-12-23 22:24:44,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1362680.0, ans=0.2 2023-12-23 22:24:45,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.21 vs. limit=15.0 2023-12-23 22:24:57,389 INFO [train.py:886] (3/4) Epoch 43, batch 4250, loss[loss=0.01257, audio_tagging_loss=0.01257, over 25000.00 frames. ], tot_loss[loss=0.01117, audio_tagging_loss=0.01117, over 4947005.00 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:25:04,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1362813.3333333333, ans=0.125 2023-12-23 22:25:06,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1362813.3333333333, ans=0.1 2023-12-23 22:25:10,561 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0 2023-12-23 22:25:12,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1362880.0, ans=0.0 2023-12-23 22:25:49,272 INFO [train.py:886] (3/4) Epoch 43, batch 4300, loss[loss=0.01195, audio_tagging_loss=0.01195, over 25000.00 frames. 
], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4953852.05 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:25:52,303 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:25:57,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1363146.6666666667, ans=0.125 2023-12-23 22:25:58,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1363213.3333333333, ans=0.125 2023-12-23 22:26:05,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1363213.3333333333, ans=0.125 2023-12-23 22:26:08,482 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.459e+01 3.750e+01 3.975e+01 4.135e+01 4.734e+01, threshold=7.950e+01, percent-clipped=0.0 2023-12-23 22:26:29,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1363413.3333333333, ans=0.125 2023-12-23 22:26:30,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1363413.3333333333, ans=0.0 2023-12-23 22:26:36,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1363413.3333333333, ans=0.125 2023-12-23 22:26:37,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1363413.3333333333, ans=0.0 2023-12-23 22:26:41,276 INFO [train.py:886] (3/4) Epoch 43, batch 4350, loss[loss=0.008711, audio_tagging_loss=0.008711, over 25000.00 frames. ], tot_loss[loss=0.0112, audio_tagging_loss=0.0112, over 4954476.83 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:26:42,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1363480.0, ans=0.2 2023-12-23 22:26:43,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1363480.0, ans=0.125 2023-12-23 22:27:04,205 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.79 vs. limit=10.0 2023-12-23 22:27:17,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1363680.0, ans=0.125 2023-12-23 22:27:29,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1363746.6666666667, ans=0.015 2023-12-23 22:27:32,608 INFO [train.py:886] (3/4) Epoch 43, batch 4400, loss[loss=0.01176, audio_tagging_loss=0.01176, over 24750.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4948800.01 frames. 
], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:27:52,464 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.368e+01 3.866e+01 4.022e+01 4.177e+01 4.881e+01, threshold=8.045e+01, percent-clipped=0.0 2023-12-23 22:27:57,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1363946.6666666667, ans=0.125 2023-12-23 22:28:01,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1363946.6666666667, ans=0.0 2023-12-23 22:28:11,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1364013.3333333333, ans=0.125 2023-12-23 22:28:19,832 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.12 vs. limit=15.0 2023-12-23 22:28:24,221 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.89 vs. limit=15.0 2023-12-23 22:28:25,657 INFO [train.py:886] (3/4) Epoch 43, batch 4450, loss[loss=0.01159, audio_tagging_loss=0.01159, over 24750.00 frames. ], tot_loss[loss=0.01115, audio_tagging_loss=0.01115, over 4944128.20 frames. ], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:28:25,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1364146.6666666667, ans=0.05 2023-12-23 22:28:35,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1364213.3333333333, ans=0.2 2023-12-23 22:28:39,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1364213.3333333333, ans=0.125 2023-12-23 22:28:40,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.76 vs. limit=10.0 2023-12-23 22:28:40,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1364213.3333333333, ans=0.125 2023-12-23 22:28:45,760 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2023-12-23 22:28:53,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1364280.0, ans=0.125 2023-12-23 22:28:53,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1364280.0, ans=0.125 2023-12-23 22:28:59,451 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2023-12-23 22:29:14,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1364413.3333333333, ans=0.125 2023-12-23 22:29:17,709 INFO [train.py:886] (3/4) Epoch 43, batch 4500, loss[loss=0.01107, audio_tagging_loss=0.01107, over 25000.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4945334.30 frames. 
], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:29:36,250 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.279e+01 3.827e+01 3.976e+01 4.152e+01 9.618e+01, threshold=7.952e+01, percent-clipped=1.0 2023-12-23 22:29:36,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1364546.6666666667, ans=0.125 2023-12-23 22:29:48,110 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.47 vs. limit=15.0 2023-12-23 22:29:49,090 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2023-12-23 22:29:56,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1364680.0, ans=0.125 2023-12-23 22:30:01,444 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.46 vs. limit=10.0 2023-12-23 22:30:04,121 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.82 vs. limit=15.0 2023-12-23 22:30:09,422 INFO [train.py:886] (3/4) Epoch 43, batch 4550, loss[loss=0.01031, audio_tagging_loss=0.01031, over 25000.00 frames. ], tot_loss[loss=0.01105, audio_tagging_loss=0.01105, over 4948734.28 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:30:11,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1364813.3333333333, ans=0.125 2023-12-23 22:30:27,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.42 vs. limit=22.5 2023-12-23 22:30:32,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1364946.6666666667, ans=0.0 2023-12-23 22:30:34,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1364946.6666666667, ans=0.0 2023-12-23 22:30:43,654 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.49 vs. limit=22.5 2023-12-23 22:30:45,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1365013.3333333333, ans=0.125 2023-12-23 22:30:52,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1365080.0, ans=0.09899494936611666 2023-12-23 22:31:02,236 INFO [train.py:886] (3/4) Epoch 43, batch 4600, loss[loss=0.007805, audio_tagging_loss=0.007805, over 24026.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4949399.41 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:31:02,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1365146.6666666667, ans=0.0 2023-12-23 22:31:18,866 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.67 vs. 
limit=15.0 2023-12-23 22:31:19,362 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.512e+01 3.855e+01 4.026e+01 4.238e+01 4.840e+01, threshold=8.052e+01, percent-clipped=0.0 2023-12-23 22:31:32,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1365346.6666666667, ans=0.125 2023-12-23 22:31:41,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1365413.3333333333, ans=0.125 2023-12-23 22:31:42,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1365413.3333333333, ans=0.0 2023-12-23 22:31:51,967 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.42 vs. limit=22.5 2023-12-23 22:31:52,328 INFO [train.py:886] (3/4) Epoch 43, batch 4650, loss[loss=0.009894, audio_tagging_loss=0.009894, over 24750.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4953542.65 frames. ], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:32:09,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1365546.6666666667, ans=0.0 2023-12-23 22:32:34,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1365746.6666666667, ans=0.0 2023-12-23 22:32:43,784 INFO [train.py:886] (3/4) Epoch 43, batch 4700, loss[loss=0.01224, audio_tagging_loss=0.01224, over 24750.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4955477.34 frames. ], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:32:58,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1365880.0, ans=0.1 2023-12-23 22:32:59,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1365880.0, ans=0.0 2023-12-23 22:33:00,263 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.428e+01 3.819e+01 4.036e+01 4.197e+01 4.891e+01, threshold=8.073e+01, percent-clipped=0.0 2023-12-23 22:33:12,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1366013.3333333333, ans=0.07 2023-12-23 22:33:15,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1366013.3333333333, ans=0.0 2023-12-23 22:33:16,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1366013.3333333333, ans=0.0 2023-12-23 22:33:29,792 INFO [train.py:886] (3/4) Epoch 43, batch 4750, loss[loss=0.01245, audio_tagging_loss=0.01245, over 24750.00 frames. ], tot_loss[loss=0.01135, audio_tagging_loss=0.01135, over 4952114.87 frames. 
], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2023-12-23 22:33:29,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1366146.6666666667, ans=0.125 2023-12-23 22:33:39,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1366213.3333333333, ans=0.2 2023-12-23 22:33:41,606 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:34:06,058 INFO [train.py:886] (3/4) Epoch 44, batch 0, loss[loss=0.02612, audio_tagging_loss=0.02612, over 21892.00 frames. ], tot_loss[loss=0.02612, audio_tagging_loss=0.02612, over 21892.00 frames. ], batch size: 107, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:34:06,058 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 22:34:27,393 INFO [train.py:917] (3/4) Epoch 44, validation: loss=0.03574, audio_tagging_loss=0.03574, over 3737520.00 frames. 2023-12-23 22:34:27,393 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 22:34:33,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1366253.3333333333, ans=0.0 2023-12-23 22:34:42,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1366320.0, ans=0.125 2023-12-23 22:34:47,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.47 vs. limit=10.0 2023-12-23 22:34:55,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1366386.6666666667, ans=0.125 2023-12-23 22:35:10,586 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.89 vs. limit=22.5 2023-12-23 22:35:14,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1366520.0, ans=0.125 2023-12-23 22:35:17,581 INFO [train.py:886] (3/4) Epoch 44, batch 50, loss[loss=0.01311, audio_tagging_loss=0.01311, over 25000.00 frames. ], tot_loss[loss=0.01767, audio_tagging_loss=0.01767, over 1115789.43 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:35:17,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1366586.6666666667, ans=0.2 2023-12-23 22:35:20,386 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.606e+01 4.065e+01 4.610e+01 5.536e+01 1.097e+02, threshold=9.221e+01, percent-clipped=8.0 2023-12-23 22:35:21,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1366586.6666666667, ans=0.0 2023-12-23 22:35:33,926 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.40 vs. limit=15.0 2023-12-23 22:35:38,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.54 vs. 
limit=15.0 2023-12-23 22:35:39,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1366720.0, ans=0.125 2023-12-23 22:35:47,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1366786.6666666667, ans=0.125 2023-12-23 22:35:52,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1366786.6666666667, ans=0.1 2023-12-23 22:35:53,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1366786.6666666667, ans=0.125 2023-12-23 22:35:54,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1366786.6666666667, ans=0.1 2023-12-23 22:35:58,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0 2023-12-23 22:36:06,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1366853.3333333333, ans=0.0 2023-12-23 22:36:08,384 INFO [train.py:886] (3/4) Epoch 44, batch 100, loss[loss=0.01343, audio_tagging_loss=0.01343, over 25000.00 frames. ], tot_loss[loss=0.0154, audio_tagging_loss=0.0154, over 1966960.44 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:36:15,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1366920.0, ans=0.09899494936611666 2023-12-23 22:36:21,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1366986.6666666667, ans=0.0 2023-12-23 22:36:22,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1366986.6666666667, ans=0.2 2023-12-23 22:36:36,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1367053.3333333333, ans=0.125 2023-12-23 22:36:40,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1367120.0, ans=0.125 2023-12-23 22:36:49,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1367186.6666666667, ans=0.0 2023-12-23 22:36:51,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1367186.6666666667, ans=0.125 2023-12-23 22:36:54,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1367186.6666666667, ans=0.125 2023-12-23 22:36:59,351 INFO [train.py:886] (3/4) Epoch 44, batch 150, loss[loss=0.01195, audio_tagging_loss=0.01195, over 25000.00 frames. ], tot_loss[loss=0.01401, audio_tagging_loss=0.01401, over 2632900.37 frames. 
], batch size: 100, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:37:02,155 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.808e+01 4.071e+01 4.284e+01 4.499e+01 5.493e+01, threshold=8.567e+01, percent-clipped=0.0 2023-12-23 22:37:08,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1367320.0, ans=0.0 2023-12-23 22:37:10,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1367320.0, ans=0.125 2023-12-23 22:37:22,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1367386.6666666667, ans=0.0 2023-12-23 22:37:39,689 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.98 vs. limit=15.0 2023-12-23 22:37:51,614 INFO [train.py:886] (3/4) Epoch 44, batch 200, loss[loss=0.01075, audio_tagging_loss=0.01075, over 25000.00 frames. ], tot_loss[loss=0.0133, audio_tagging_loss=0.0133, over 3142903.49 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:37:52,976 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.72 vs. limit=22.5 2023-12-23 22:38:00,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.46 vs. limit=15.0 2023-12-23 22:38:07,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1367653.3333333333, ans=0.125 2023-12-23 22:38:21,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1367786.6666666667, ans=0.125 2023-12-23 22:38:31,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1367853.3333333333, ans=0.1 2023-12-23 22:38:42,577 INFO [train.py:886] (3/4) Epoch 44, batch 250, loss[loss=0.01247, audio_tagging_loss=0.01247, over 24750.00 frames. ], tot_loss[loss=0.01278, audio_tagging_loss=0.01278, over 3542546.98 frames. ], batch size: 99, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:38:45,353 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.532e+01 3.886e+01 4.052e+01 4.204e+01 5.117e+01, threshold=8.104e+01, percent-clipped=0.0 2023-12-23 22:39:07,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1368053.3333333333, ans=0.125 2023-12-23 22:39:23,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1368186.6666666667, ans=0.125 2023-12-23 22:39:34,328 INFO [train.py:886] (3/4) Epoch 44, batch 300, loss[loss=0.01568, audio_tagging_loss=0.01568, over 24950.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 3854420.72 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:39:42,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1368253.3333333333, ans=0.1 2023-12-23 22:39:56,574 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.00 vs. 
limit=12.0 2023-12-23 22:39:59,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1368386.6666666667, ans=0.125 2023-12-23 22:40:01,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1368386.6666666667, ans=0.025 2023-12-23 22:40:26,285 INFO [train.py:886] (3/4) Epoch 44, batch 350, loss[loss=0.009608, audio_tagging_loss=0.009608, over 24750.00 frames. ], tot_loss[loss=0.01228, audio_tagging_loss=0.01228, over 4087385.98 frames. ], batch size: 99, lr: 2.45e-03, grad_scale: 16.0 2023-12-23 22:40:29,093 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.502e+01 3.809e+01 3.953e+01 4.148e+01 4.528e+01, threshold=7.906e+01, percent-clipped=0.0 2023-12-23 22:40:37,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1368653.3333333333, ans=0.125 2023-12-23 22:40:39,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1368653.3333333333, ans=0.125 2023-12-23 22:40:41,958 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.80 vs. limit=15.0 2023-12-23 22:40:42,399 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:40:49,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1368720.0, ans=0.05 2023-12-23 22:40:50,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1368720.0, ans=0.125 2023-12-23 22:41:16,718 INFO [train.py:886] (3/4) Epoch 44, batch 400, loss[loss=0.01281, audio_tagging_loss=0.01281, over 24750.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4279011.78 frames. ], batch size: 99, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:41:17,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1368920.0, ans=0.125 2023-12-23 22:41:28,254 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.14 vs. limit=12.0 2023-12-23 22:41:28,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1368986.6666666667, ans=0.125 2023-12-23 22:41:42,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1369053.3333333333, ans=0.125 2023-12-23 22:41:49,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1369120.0, ans=0.125 2023-12-23 22:42:08,314 INFO [train.py:886] (3/4) Epoch 44, batch 450, loss[loss=0.01155, audio_tagging_loss=0.01155, over 24750.00 frames. ], tot_loss[loss=0.01168, audio_tagging_loss=0.01168, over 4427022.53 frames. 
], batch size: 99, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:42:10,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1369253.3333333333, ans=0.125 2023-12-23 22:42:11,765 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.414e+01 3.763e+01 3.912e+01 4.071e+01 4.674e+01, threshold=7.824e+01, percent-clipped=0.0 2023-12-23 22:42:14,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1369253.3333333333, ans=0.125 2023-12-23 22:42:20,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1369320.0, ans=0.0 2023-12-23 22:42:20,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1369320.0, ans=0.2 2023-12-23 22:42:29,133 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.23 vs. limit=15.0 2023-12-23 22:42:32,465 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2023-12-23 22:42:44,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.37 vs. limit=22.5 2023-12-23 22:42:48,445 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.70 vs. limit=15.0 2023-12-23 22:42:59,943 INFO [train.py:886] (3/4) Epoch 44, batch 500, loss[loss=0.01118, audio_tagging_loss=0.01118, over 25000.00 frames. ], tot_loss[loss=0.01159, audio_tagging_loss=0.01159, over 4547662.58 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:43:27,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1369720.0, ans=0.125 2023-12-23 22:43:38,385 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.09 vs. limit=15.0 2023-12-23 22:43:41,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1369853.3333333333, ans=0.1 2023-12-23 22:43:46,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1369853.3333333333, ans=0.0 2023-12-23 22:43:48,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1369853.3333333333, ans=0.05 2023-12-23 22:43:51,612 INFO [train.py:886] (3/4) Epoch 44, batch 550, loss[loss=0.01048, audio_tagging_loss=0.01048, over 25000.00 frames. ], tot_loss[loss=0.01154, audio_tagging_loss=0.01154, over 4640624.05 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:43:54,464 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.575e+01 3.800e+01 3.977e+01 4.148e+01 4.797e+01, threshold=7.954e+01, percent-clipped=0.0 2023-12-23 22:43:54,927 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.03 vs. 
limit=15.0 2023-12-23 22:44:14,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1370053.3333333333, ans=0.125 2023-12-23 22:44:43,221 INFO [train.py:886] (3/4) Epoch 44, batch 600, loss[loss=0.01483, audio_tagging_loss=0.01483, over 24944.00 frames. ], tot_loss[loss=0.01158, audio_tagging_loss=0.01158, over 4706834.84 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:44:43,680 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2023-12-23 22:44:45,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1370253.3333333333, ans=0.125 2023-12-23 22:45:08,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1370386.6666666667, ans=0.07 2023-12-23 22:45:25,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1370520.0, ans=0.0 2023-12-23 22:45:27,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1370520.0, ans=0.125 2023-12-23 22:45:29,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.93 vs. limit=22.5 2023-12-23 22:45:34,274 INFO [train.py:886] (3/4) Epoch 44, batch 650, loss[loss=0.01135, audio_tagging_loss=0.01135, over 25000.00 frames. ], tot_loss[loss=0.01152, audio_tagging_loss=0.01152, over 4756974.34 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:45:37,791 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.451e+01 3.789e+01 3.956e+01 4.142e+01 5.204e+01, threshold=7.912e+01, percent-clipped=0.0 2023-12-23 22:45:43,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1370586.6666666667, ans=0.125 2023-12-23 22:45:46,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1370653.3333333333, ans=0.2 2023-12-23 22:45:52,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1370653.3333333333, ans=0.2 2023-12-23 22:46:01,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1370720.0, ans=0.125 2023-12-23 22:46:04,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1370720.0, ans=15.0 2023-12-23 22:46:26,825 INFO [train.py:886] (3/4) Epoch 44, batch 700, loss[loss=0.01076, audio_tagging_loss=0.01076, over 24750.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4795356.23 frames. ], batch size: 99, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:46:51,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1371053.3333333333, ans=0.0 2023-12-23 22:47:02,799 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.05 vs. 
limit=12.0 2023-12-23 22:47:19,228 INFO [train.py:886] (3/4) Epoch 44, batch 750, loss[loss=0.01227, audio_tagging_loss=0.01227, over 25000.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4833762.30 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:47:21,988 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.438e+01 3.757e+01 3.902e+01 4.113e+01 5.017e+01, threshold=7.805e+01, percent-clipped=0.0 2023-12-23 22:47:22,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1371253.3333333333, ans=0.0 2023-12-23 22:47:25,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1371253.3333333333, ans=0.125 2023-12-23 22:47:58,586 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:48:04,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1371520.0, ans=0.1 2023-12-23 22:48:09,721 INFO [train.py:886] (3/4) Epoch 44, batch 800, loss[loss=0.009332, audio_tagging_loss=0.009332, over 25000.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4858969.30 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:48:11,997 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.87 vs. limit=22.5 2023-12-23 22:48:24,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1371653.3333333333, ans=0.125 2023-12-23 22:48:26,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1371653.3333333333, ans=0.125 2023-12-23 22:48:32,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1371720.0, ans=0.125 2023-12-23 22:48:37,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1371720.0, ans=0.2 2023-12-23 22:48:44,216 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:49:02,934 INFO [train.py:886] (3/4) Epoch 44, batch 850, loss[loss=0.01128, audio_tagging_loss=0.01128, over 24750.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4882475.43 frames. ], batch size: 99, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:49:04,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.44 vs. limit=12.0 2023-12-23 22:49:05,729 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.453e+01 3.773e+01 3.934e+01 4.147e+01 6.054e+01, threshold=7.868e+01, percent-clipped=0.0 2023-12-23 22:49:24,011 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.59 vs. 
limit=12.0 2023-12-23 22:49:32,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1372053.3333333333, ans=0.0 2023-12-23 22:49:51,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1372186.6666666667, ans=0.125 2023-12-23 22:49:53,205 INFO [train.py:886] (3/4) Epoch 44, batch 900, loss[loss=0.01229, audio_tagging_loss=0.01229, over 24750.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4895334.19 frames. ], batch size: 99, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:49:58,051 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.04 vs. limit=12.0 2023-12-23 22:50:25,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1372453.3333333333, ans=0.125 2023-12-23 22:50:30,159 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.34 vs. limit=22.5 2023-12-23 22:50:35,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1372520.0, ans=0.0 2023-12-23 22:50:45,612 INFO [train.py:886] (3/4) Epoch 44, batch 950, loss[loss=0.01488, audio_tagging_loss=0.01488, over 24750.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4898002.01 frames. ], batch size: 99, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:50:48,461 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.389e+01 3.853e+01 3.991e+01 4.175e+01 5.097e+01, threshold=7.983e+01, percent-clipped=0.0 2023-12-23 22:51:08,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1372720.0, ans=0.0 2023-12-23 22:51:09,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1372720.0, ans=0.125 2023-12-23 22:51:13,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1372720.0, ans=0.125 2023-12-23 22:51:15,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1372786.6666666667, ans=0.1 2023-12-23 22:51:27,751 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:51:36,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1372853.3333333333, ans=0.125 2023-12-23 22:51:36,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1372853.3333333333, ans=0.125 2023-12-23 22:51:36,825 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.33 vs. limit=15.0 2023-12-23 22:51:38,150 INFO [train.py:886] (3/4) Epoch 44, batch 1000, loss[loss=0.01193, audio_tagging_loss=0.01193, over 22205.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4902185.37 frames. 
], batch size: 107, lr: 2.45e-03, grad_scale: 32.0 2023-12-23 22:51:50,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1372986.6666666667, ans=0.125 2023-12-23 22:52:01,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1373053.3333333333, ans=0.125 2023-12-23 22:52:09,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1373120.0, ans=0.015 2023-12-23 22:52:25,280 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.45 vs. limit=15.0 2023-12-23 22:52:27,692 INFO [train.py:886] (3/4) Epoch 44, batch 1050, loss[loss=0.01166, audio_tagging_loss=0.01166, over 24750.00 frames. ], tot_loss[loss=0.01117, audio_tagging_loss=0.01117, over 4915659.26 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:52:28,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.87 vs. limit=15.0 2023-12-23 22:52:31,161 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.448e+01 3.809e+01 4.004e+01 4.174e+01 4.765e+01, threshold=8.009e+01, percent-clipped=0.0 2023-12-23 22:52:33,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1373253.3333333333, ans=0.125 2023-12-23 22:52:36,521 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 22:52:38,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1373253.3333333333, ans=0.09899494936611666 2023-12-23 22:52:42,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1373320.0, ans=0.0 2023-12-23 22:52:43,411 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.29 vs. limit=10.0 2023-12-23 22:52:49,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1373386.6666666667, ans=0.0 2023-12-23 22:53:01,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1373453.3333333333, ans=0.125 2023-12-23 22:53:21,035 INFO [train.py:886] (3/4) Epoch 44, batch 1100, loss[loss=0.01166, audio_tagging_loss=0.01166, over 25000.00 frames. ], tot_loss[loss=0.01111, audio_tagging_loss=0.01111, over 4926518.03 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:53:26,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1373586.6666666667, ans=0.05 2023-12-23 22:53:42,437 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=12.0 2023-12-23 22:53:56,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1373786.6666666667, ans=0.125 2023-12-23 22:54:12,606 INFO [train.py:886] (3/4) Epoch 44, batch 1150, loss[loss=0.01366, audio_tagging_loss=0.01366, over 25000.00 frames. 
], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4939928.94 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:54:16,268 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.466e+01 3.752e+01 3.932e+01 4.115e+01 4.811e+01, threshold=7.864e+01, percent-clipped=0.0 2023-12-23 22:54:24,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1373986.6666666667, ans=0.2 2023-12-23 22:54:25,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1373986.6666666667, ans=0.1 2023-12-23 22:54:40,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1374053.3333333333, ans=0.0 2023-12-23 22:54:40,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1374053.3333333333, ans=0.125 2023-12-23 22:55:04,712 INFO [train.py:886] (3/4) Epoch 44, batch 1200, loss[loss=0.01373, audio_tagging_loss=0.01373, over 25000.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4946352.60 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:55:05,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.66 vs. limit=22.5 2023-12-23 22:55:05,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1374253.3333333333, ans=0.125 2023-12-23 22:55:09,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1374253.3333333333, ans=0.0 2023-12-23 22:55:35,935 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0 2023-12-23 22:55:55,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1374586.6666666667, ans=0.125 2023-12-23 22:55:56,739 INFO [train.py:886] (3/4) Epoch 44, batch 1250, loss[loss=0.01309, audio_tagging_loss=0.01309, over 24750.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4941940.70 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:55:57,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1374586.6666666667, ans=0.05 2023-12-23 22:56:00,326 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.517e+01 3.822e+01 4.031e+01 4.210e+01 4.983e+01, threshold=8.061e+01, percent-clipped=0.0 2023-12-23 22:56:18,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1374720.0, ans=10.0 2023-12-23 22:56:33,582 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.67 vs. limit=15.0 2023-12-23 22:56:47,067 INFO [train.py:886] (3/4) Epoch 44, batch 1300, loss[loss=0.01344, audio_tagging_loss=0.01344, over 24750.00 frames. ], tot_loss[loss=0.01137, audio_tagging_loss=0.01137, over 4941734.05 frames. 
], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:57:16,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1375053.3333333333, ans=0.1 2023-12-23 22:57:16,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1375120.0, ans=0.125 2023-12-23 22:57:17,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1375120.0, ans=0.07 2023-12-23 22:57:24,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1375120.0, ans=0.2 2023-12-23 22:57:39,988 INFO [train.py:886] (3/4) Epoch 44, batch 1350, loss[loss=0.01208, audio_tagging_loss=0.01208, over 25000.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4945048.75 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:57:42,818 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.513e+01 3.826e+01 3.975e+01 4.146e+01 4.619e+01, threshold=7.951e+01, percent-clipped=0.0 2023-12-23 22:57:43,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1375253.3333333333, ans=0.0 2023-12-23 22:57:52,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1375320.0, ans=0.0 2023-12-23 22:57:55,361 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.10 vs. limit=10.0 2023-12-23 22:58:32,853 INFO [train.py:886] (3/4) Epoch 44, batch 1400, loss[loss=0.01055, audio_tagging_loss=0.01055, over 24048.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4953699.66 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:58:36,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1375586.6666666667, ans=0.125 2023-12-23 22:58:39,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0 2023-12-23 22:58:52,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1375720.0, ans=0.125 2023-12-23 22:58:59,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1375720.0, ans=0.2 2023-12-23 22:59:00,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1375720.0, ans=0.125 2023-12-23 22:59:02,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1375720.0, ans=0.0 2023-12-23 22:59:23,961 INFO [train.py:886] (3/4) Epoch 44, batch 1450, loss[loss=0.01318, audio_tagging_loss=0.01318, over 25000.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4957017.12 frames. 
], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 22:59:25,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1375920.0, ans=0.125 2023-12-23 22:59:26,774 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.506e+01 3.727e+01 3.932e+01 4.156e+01 5.029e+01, threshold=7.864e+01, percent-clipped=0.0 2023-12-23 22:59:31,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1375920.0, ans=0.125 2023-12-23 22:59:48,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1376053.3333333333, ans=0.125 2023-12-23 22:59:52,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1376053.3333333333, ans=0.09899494936611666 2023-12-23 23:00:04,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1376120.0, ans=0.125 2023-12-23 23:00:10,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1376186.6666666667, ans=0.0 2023-12-23 23:00:16,696 INFO [train.py:886] (3/4) Epoch 44, batch 1500, loss[loss=0.01134, audio_tagging_loss=0.01134, over 25000.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4964000.39 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:00:19,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=12.0 2023-12-23 23:00:24,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1376253.3333333333, ans=0.125 2023-12-23 23:00:37,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1376386.6666666667, ans=0.125 2023-12-23 23:00:38,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1376386.6666666667, ans=0.125 2023-12-23 23:00:40,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1376386.6666666667, ans=0.0 2023-12-23 23:00:45,870 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.47 vs. limit=10.0 2023-12-23 23:00:46,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1376386.6666666667, ans=0.0 2023-12-23 23:00:48,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1376453.3333333333, ans=0.1 2023-12-23 23:00:51,587 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.41 vs. limit=6.0 2023-12-23 23:00:57,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1376520.0, ans=0.2 2023-12-23 23:00:58,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.89 vs. 
limit=15.0 2023-12-23 23:01:09,436 INFO [train.py:886] (3/4) Epoch 44, batch 1550, loss[loss=0.00909, audio_tagging_loss=0.00909, over 24750.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4960239.93 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:01:13,027 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.562e+01 3.917e+01 4.065e+01 4.220e+01 5.107e+01, threshold=8.130e+01, percent-clipped=0.0 2023-12-23 23:01:16,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1376586.6666666667, ans=0.125 2023-12-23 23:01:19,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1376653.3333333333, ans=0.015 2023-12-23 23:01:32,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1376720.0, ans=0.125 2023-12-23 23:01:35,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1376720.0, ans=0.0 2023-12-23 23:01:40,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1376786.6666666667, ans=0.125 2023-12-23 23:01:55,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1376853.3333333333, ans=0.0 2023-12-23 23:02:00,083 INFO [train.py:886] (3/4) Epoch 44, batch 1600, loss[loss=0.0114, audio_tagging_loss=0.0114, over 25000.00 frames. ], tot_loss[loss=0.0112, audio_tagging_loss=0.0112, over 4950137.28 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:02:00,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1376920.0, ans=0.2 2023-12-23 23:02:20,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1377053.3333333333, ans=0.125 2023-12-23 23:02:22,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1377053.3333333333, ans=0.125 2023-12-23 23:02:25,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1377053.3333333333, ans=0.125 2023-12-23 23:02:26,936 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.78 vs. limit=15.0 2023-12-23 23:02:34,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1377120.0, ans=0.1 2023-12-23 23:02:45,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1377186.6666666667, ans=0.025 2023-12-23 23:02:52,356 INFO [train.py:886] (3/4) Epoch 44, batch 1650, loss[loss=0.01015, audio_tagging_loss=0.01015, over 24750.00 frames. ], tot_loss[loss=0.01116, audio_tagging_loss=0.01116, over 4950986.71 frames. 
], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:02:55,191 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.228e+01 3.859e+01 4.003e+01 4.218e+01 7.648e+01, threshold=8.006e+01, percent-clipped=0.0 2023-12-23 23:03:04,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1377320.0, ans=0.05 2023-12-23 23:03:08,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1377320.0, ans=0.1 2023-12-23 23:03:14,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1377386.6666666667, ans=0.125 2023-12-23 23:03:22,451 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.97 vs. limit=10.0 2023-12-23 23:03:42,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1377586.6666666667, ans=0.0 2023-12-23 23:03:43,277 INFO [train.py:886] (3/4) Epoch 44, batch 1700, loss[loss=0.01117, audio_tagging_loss=0.01117, over 25000.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4950413.62 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:03:45,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1377586.6666666667, ans=0.125 2023-12-23 23:03:55,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1377653.3333333333, ans=0.125 2023-12-23 23:04:00,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1377653.3333333333, ans=0.125 2023-12-23 23:04:07,748 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.01 vs. limit=22.5 2023-12-23 23:04:17,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1377786.6666666667, ans=0.09899494936611666 2023-12-23 23:04:22,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1377786.6666666667, ans=10.0 2023-12-23 23:04:35,380 INFO [train.py:886] (3/4) Epoch 44, batch 1750, loss[loss=0.00976, audio_tagging_loss=0.00976, over 25000.00 frames. ], tot_loss[loss=0.011, audio_tagging_loss=0.011, over 4956293.08 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:04:38,157 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.480e+01 3.774e+01 3.975e+01 4.116e+01 4.755e+01, threshold=7.949e+01, percent-clipped=0.0 2023-12-23 23:04:47,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1377986.6666666667, ans=0.125 2023-12-23 23:05:27,871 INFO [train.py:886] (3/4) Epoch 44, batch 1800, loss[loss=0.01196, audio_tagging_loss=0.01196, over 25000.00 frames. ], tot_loss[loss=0.011, audio_tagging_loss=0.011, over 4958765.73 frames. 
], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:05:42,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1378320.0, ans=0.1 2023-12-23 23:05:49,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=1378386.6666666667, ans=12.0 2023-12-23 23:05:56,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1378386.6666666667, ans=0.125 2023-12-23 23:06:00,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1378453.3333333333, ans=0.125 2023-12-23 23:06:04,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1378453.3333333333, ans=0.125 2023-12-23 23:06:04,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1378453.3333333333, ans=0.1 2023-12-23 23:06:19,002 INFO [train.py:886] (3/4) Epoch 44, batch 1850, loss[loss=0.01325, audio_tagging_loss=0.01325, over 24952.00 frames. ], tot_loss[loss=0.01113, audio_tagging_loss=0.01113, over 4958442.80 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:06:20,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1378586.6666666667, ans=0.125 2023-12-23 23:06:21,832 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.258e+01 3.867e+01 4.025e+01 4.218e+01 4.619e+01, threshold=8.051e+01, percent-clipped=0.0 2023-12-23 23:06:23,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1378586.6666666667, ans=0.125 2023-12-23 23:06:45,429 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.31 vs. limit=15.0 2023-12-23 23:06:50,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1378786.6666666667, ans=0.125 2023-12-23 23:07:04,738 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0 2023-12-23 23:07:10,739 INFO [train.py:886] (3/4) Epoch 44, batch 1900, loss[loss=0.013, audio_tagging_loss=0.013, over 24750.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4949688.72 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:07:12,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1378920.0, ans=0.2 2023-12-23 23:07:28,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1378986.6666666667, ans=0.1 2023-12-23 23:07:31,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.26 vs. 
limit=15.0 2023-12-23 23:07:50,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1379186.6666666667, ans=0.125 2023-12-23 23:07:51,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1379186.6666666667, ans=0.0 2023-12-23 23:07:51,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0 2023-12-23 23:08:02,078 INFO [train.py:886] (3/4) Epoch 44, batch 1950, loss[loss=0.01145, audio_tagging_loss=0.01145, over 24750.00 frames. ], tot_loss[loss=0.01124, audio_tagging_loss=0.01124, over 4943513.30 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:08:05,532 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.480e+01 3.823e+01 4.045e+01 4.197e+01 4.926e+01, threshold=8.090e+01, percent-clipped=0.0 2023-12-23 23:08:08,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1379253.3333333333, ans=0.125 2023-12-23 23:08:25,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=22.5 2023-12-23 23:08:25,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1379386.6666666667, ans=0.125 2023-12-23 23:08:27,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.02 vs. limit=10.0 2023-12-23 23:08:44,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1379520.0, ans=0.125 2023-12-23 23:08:50,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1379520.0, ans=0.125 2023-12-23 23:08:53,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1379586.6666666667, ans=0.0 2023-12-23 23:08:54,352 INFO [train.py:886] (3/4) Epoch 44, batch 2000, loss[loss=0.01207, audio_tagging_loss=0.01207, over 25000.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4947515.57 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 32.0 2023-12-23 23:08:57,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.04 vs. limit=12.0 2023-12-23 23:09:06,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1379653.3333333333, ans=0.07 2023-12-23 23:09:38,556 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:09:42,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1379853.3333333333, ans=0.025 2023-12-23 23:09:46,488 INFO [train.py:886] (3/4) Epoch 44, batch 2050, loss[loss=0.01221, audio_tagging_loss=0.01221, over 25000.00 frames. ], tot_loss[loss=0.0112, audio_tagging_loss=0.0112, over 4948269.77 frames. 
], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:09:49,334 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.403e+01 3.844e+01 3.984e+01 4.167e+01 5.134e+01, threshold=7.969e+01, percent-clipped=0.0 2023-12-23 23:10:12,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1380053.3333333333, ans=0.1 2023-12-23 23:10:12,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1380053.3333333333, ans=0.125 2023-12-23 23:10:13,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1380053.3333333333, ans=0.125 2023-12-23 23:10:22,111 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.52 vs. limit=15.0 2023-12-23 23:10:26,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1380186.6666666667, ans=0.125 2023-12-23 23:10:33,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1380186.6666666667, ans=0.125 2023-12-23 23:10:35,733 INFO [train.py:886] (3/4) Epoch 44, batch 2100, loss[loss=0.01076, audio_tagging_loss=0.01076, over 25000.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4955274.62 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:10:44,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=1380253.3333333333, ans=10.0 2023-12-23 23:10:54,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1380320.0, ans=0.125 2023-12-23 23:11:00,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1380386.6666666667, ans=0.2 2023-12-23 23:11:23,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1380520.0, ans=0.125 2023-12-23 23:11:28,053 INFO [train.py:886] (3/4) Epoch 44, batch 2150, loss[loss=0.01136, audio_tagging_loss=0.01136, over 25000.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4960147.40 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:11:30,900 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.445e+01 3.811e+01 3.971e+01 4.151e+01 4.761e+01, threshold=7.942e+01, percent-clipped=0.0 2023-12-23 23:11:55,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1380720.0, ans=0.2 2023-12-23 23:11:58,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1380786.6666666667, ans=0.0 2023-12-23 23:12:00,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1380786.6666666667, ans=0.0 2023-12-23 23:12:09,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1380853.3333333333, ans=0.125 2023-12-23 23:12:18,073 INFO [train.py:886] (3/4) Epoch 44, batch 2200, loss[loss=0.01248, audio_tagging_loss=0.01248, over 24750.00 frames. 
], tot_loss[loss=0.01133, audio_tagging_loss=0.01133, over 4958050.95 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:12:18,372 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:12:19,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1380920.0, ans=0.0 2023-12-23 23:12:22,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1380920.0, ans=0.125 2023-12-23 23:12:24,967 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=15.0 2023-12-23 23:12:40,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1381053.3333333333, ans=0.0 2023-12-23 23:12:52,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1381120.0, ans=0.125 2023-12-23 23:13:01,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1381186.6666666667, ans=0.0 2023-12-23 23:13:09,595 INFO [train.py:886] (3/4) Epoch 44, batch 2250, loss[loss=0.01144, audio_tagging_loss=0.01144, over 24750.00 frames. ], tot_loss[loss=0.01136, audio_tagging_loss=0.01136, over 4956371.74 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:13:09,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1381253.3333333333, ans=0.1 2023-12-23 23:13:12,389 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.439e+01 3.814e+01 4.052e+01 4.266e+01 4.733e+01, threshold=8.103e+01, percent-clipped=0.0 2023-12-23 23:13:12,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1381253.3333333333, ans=0.1 2023-12-23 23:13:27,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1381320.0, ans=0.1 2023-12-23 23:13:37,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1381386.6666666667, ans=0.0 2023-12-23 23:13:41,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1381453.3333333333, ans=0.125 2023-12-23 23:13:45,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1381453.3333333333, ans=0.125 2023-12-23 23:13:54,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=1381520.0, ans=0.1 2023-12-23 23:13:56,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1381520.0, ans=0.0 2023-12-23 23:14:02,104 INFO [train.py:886] (3/4) Epoch 44, batch 2300, loss[loss=0.01084, audio_tagging_loss=0.01084, over 24750.00 frames. ], tot_loss[loss=0.01129, audio_tagging_loss=0.01129, over 4951711.44 frames. 
], batch size: 99, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:14:06,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1381586.6666666667, ans=10.0 2023-12-23 23:14:08,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1381586.6666666667, ans=0.0 2023-12-23 23:14:13,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1381653.3333333333, ans=0.125 2023-12-23 23:14:17,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1381653.3333333333, ans=0.0 2023-12-23 23:14:35,805 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.98 vs. limit=22.5 2023-12-23 23:14:40,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1381786.6666666667, ans=0.125 2023-12-23 23:14:53,647 INFO [train.py:886] (3/4) Epoch 44, batch 2350, loss[loss=0.01193, audio_tagging_loss=0.01193, over 25000.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4951983.75 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:14:57,190 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.396e+01 3.788e+01 3.954e+01 4.114e+01 5.032e+01, threshold=7.908e+01, percent-clipped=0.0 2023-12-23 23:15:08,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1381986.6666666667, ans=0.125 2023-12-23 23:15:11,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1381986.6666666667, ans=0.125 2023-12-23 23:15:14,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1382053.3333333333, ans=15.0 2023-12-23 23:15:17,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0 2023-12-23 23:15:18,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1382053.3333333333, ans=0.0 2023-12-23 23:15:21,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1382053.3333333333, ans=0.125 2023-12-23 23:15:31,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1382120.0, ans=0.2 2023-12-23 23:15:41,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1382186.6666666667, ans=0.125 2023-12-23 23:15:46,384 INFO [train.py:886] (3/4) Epoch 44, batch 2400, loss[loss=0.01238, audio_tagging_loss=0.01238, over 25000.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4955004.81 frames. 
], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:15:49,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1382253.3333333333, ans=0.1 2023-12-23 23:15:53,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1382253.3333333333, ans=0.125 2023-12-23 23:15:54,681 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.78 vs. limit=22.5 2023-12-23 23:16:00,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1382320.0, ans=0.125 2023-12-23 23:16:07,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0 2023-12-23 23:16:24,968 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0 2023-12-23 23:16:25,959 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.60 vs. limit=22.5 2023-12-23 23:16:38,127 INFO [train.py:886] (3/4) Epoch 44, batch 2450, loss[loss=0.01071, audio_tagging_loss=0.01071, over 24750.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4958625.84 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:16:41,714 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.351e+01 3.765e+01 3.945e+01 4.099e+01 8.510e+01, threshold=7.890e+01, percent-clipped=1.0 2023-12-23 23:16:45,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1382586.6666666667, ans=0.0 2023-12-23 23:16:57,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1382653.3333333333, ans=0.2 2023-12-23 23:17:02,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1382720.0, ans=0.125 2023-12-23 23:17:05,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1382720.0, ans=0.2 2023-12-23 23:17:18,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1382786.6666666667, ans=0.0 2023-12-23 23:17:30,031 INFO [train.py:886] (3/4) Epoch 44, batch 2500, loss[loss=0.01183, audio_tagging_loss=0.01183, over 24750.00 frames. ], tot_loss[loss=0.01126, audio_tagging_loss=0.01126, over 4954439.88 frames. ], batch size: 99, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:17:32,359 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.69 vs. limit=22.5 2023-12-23 23:18:19,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1383186.6666666667, ans=0.0 2023-12-23 23:18:22,366 INFO [train.py:886] (3/4) Epoch 44, batch 2550, loss[loss=0.01185, audio_tagging_loss=0.01185, over 24750.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4951413.08 frames. 
], batch size: 99, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:18:25,167 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.531e+01 3.915e+01 4.071e+01 4.211e+01 5.058e+01, threshold=8.142e+01, percent-clipped=0.0 2023-12-23 23:18:28,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1383253.3333333333, ans=0.125 2023-12-23 23:18:32,116 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=15.0 2023-12-23 23:19:08,407 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.16 vs. limit=22.5 2023-12-23 23:19:10,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1383520.0, ans=0.125 2023-12-23 23:19:15,026 INFO [train.py:886] (3/4) Epoch 44, batch 2600, loss[loss=0.01188, audio_tagging_loss=0.01188, over 25000.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4951241.24 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:19:23,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1383586.6666666667, ans=0.0 2023-12-23 23:19:25,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1383653.3333333333, ans=0.125 2023-12-23 23:19:26,662 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.65 vs. limit=15.0 2023-12-23 23:19:28,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1383653.3333333333, ans=0.0 2023-12-23 23:19:49,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1383786.6666666667, ans=0.1 2023-12-23 23:19:55,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1383853.3333333333, ans=0.125 2023-12-23 23:20:01,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1383853.3333333333, ans=0.1 2023-12-23 23:20:01,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1383853.3333333333, ans=0.1 2023-12-23 23:20:03,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1383853.3333333333, ans=0.0 2023-12-23 23:20:05,556 INFO [train.py:886] (3/4) Epoch 44, batch 2650, loss[loss=0.0106, audio_tagging_loss=0.0106, over 25000.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4952937.25 frames. 
], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:20:09,130 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.375e+01 3.827e+01 4.015e+01 4.224e+01 5.047e+01, threshold=8.029e+01, percent-clipped=0.0 2023-12-23 23:20:15,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1383986.6666666667, ans=0.1 2023-12-23 23:20:39,749 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.03 vs. limit=12.0 2023-12-23 23:20:42,279 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=15.0 2023-12-23 23:20:59,297 INFO [train.py:886] (3/4) Epoch 44, batch 2700, loss[loss=0.01224, audio_tagging_loss=0.01224, over 25000.00 frames. ], tot_loss[loss=0.01111, audio_tagging_loss=0.01111, over 4955118.11 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 64.0 2023-12-23 23:21:02,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1384253.3333333333, ans=0.0 2023-12-23 23:21:11,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1384320.0, ans=0.125 2023-12-23 23:21:15,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1384320.0, ans=0.125 2023-12-23 23:21:27,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.39 vs. limit=5.0 2023-12-23 23:21:43,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1384520.0, ans=0.125 2023-12-23 23:21:44,332 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:21:50,601 INFO [train.py:886] (3/4) Epoch 44, batch 2750, loss[loss=0.0104, audio_tagging_loss=0.0104, over 25000.00 frames. ], tot_loss[loss=0.01111, audio_tagging_loss=0.01111, over 4948591.34 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 64.0 2023-12-23 23:21:53,386 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.502e+01 3.801e+01 3.957e+01 4.121e+01 4.471e+01, threshold=7.914e+01, percent-clipped=0.0 2023-12-23 23:21:59,211 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.37 vs. limit=22.5 2023-12-23 23:22:01,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1384653.3333333333, ans=0.125 2023-12-23 23:22:02,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1384653.3333333333, ans=0.125 2023-12-23 23:22:20,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1384786.6666666667, ans=0.2 2023-12-23 23:22:20,439 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. 
limit=6.0 2023-12-23 23:22:29,810 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2023-12-23 23:22:42,922 INFO [train.py:886] (3/4) Epoch 44, batch 2800, loss[loss=0.009305, audio_tagging_loss=0.009305, over 24048.00 frames. ], tot_loss[loss=0.01111, audio_tagging_loss=0.01111, over 4947369.99 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 64.0 2023-12-23 23:22:44,087 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:22:50,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1384920.0, ans=0.0 2023-12-23 23:22:51,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1384986.6666666667, ans=0.1 2023-12-23 23:22:59,063 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.24 vs. limit=6.0 2023-12-23 23:23:03,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1385053.3333333333, ans=0.1 2023-12-23 23:23:12,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1385053.3333333333, ans=0.0 2023-12-23 23:23:35,048 INFO [train.py:886] (3/4) Epoch 44, batch 2850, loss[loss=0.01146, audio_tagging_loss=0.01146, over 24750.00 frames. ], tot_loss[loss=0.01115, audio_tagging_loss=0.01115, over 4943655.13 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 64.0 2023-12-23 23:23:36,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1385253.3333333333, ans=0.1 2023-12-23 23:23:37,927 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.556e+01 3.856e+01 3.997e+01 4.163e+01 4.618e+01, threshold=7.995e+01, percent-clipped=0.0 2023-12-23 23:23:45,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1385320.0, ans=0.0 2023-12-23 23:23:54,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1385386.6666666667, ans=0.07 2023-12-23 23:23:55,918 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:24:06,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1385453.3333333333, ans=0.1 2023-12-23 23:24:23,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1385520.0, ans=0.125 2023-12-23 23:24:26,133 INFO [train.py:886] (3/4) Epoch 44, batch 2900, loss[loss=0.01098, audio_tagging_loss=0.01098, over 25000.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4939378.22 frames. 
], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:24:46,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1385653.3333333333, ans=0.0 2023-12-23 23:24:48,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1385720.0, ans=0.1 2023-12-23 23:24:53,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1385720.0, ans=0.0 2023-12-23 23:25:18,704 INFO [train.py:886] (3/4) Epoch 44, batch 2950, loss[loss=0.01026, audio_tagging_loss=0.01026, over 25000.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4943182.44 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:25:22,473 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.528e+01 3.797e+01 3.995e+01 4.163e+01 7.263e+01, threshold=7.990e+01, percent-clipped=0.0 2023-12-23 23:25:22,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1385920.0, ans=0.125 2023-12-23 23:25:24,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1385920.0, ans=0.125 2023-12-23 23:25:43,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1386053.3333333333, ans=0.2 2023-12-23 23:25:48,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1386053.3333333333, ans=0.125 2023-12-23 23:26:10,141 INFO [train.py:886] (3/4) Epoch 44, batch 3000, loss[loss=0.008892, audio_tagging_loss=0.008892, over 25000.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4948810.77 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:26:10,141 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 23:26:32,312 INFO [train.py:917] (3/4) Epoch 44, validation: loss=0.03602, audio_tagging_loss=0.03602, over 3737520.00 frames. 2023-12-23 23:26:32,313 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 23:26:48,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1386320.0, ans=0.125 2023-12-23 23:26:48,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1386320.0, ans=0.125 2023-12-23 23:26:56,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1386386.6666666667, ans=0.1 2023-12-23 23:26:58,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1386386.6666666667, ans=0.0 2023-12-23 23:27:20,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1386520.0, ans=0.125 2023-12-23 23:27:23,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1386586.6666666667, ans=0.0 2023-12-23 23:27:24,344 INFO [train.py:886] (3/4) Epoch 44, batch 3050, loss[loss=0.01247, audio_tagging_loss=0.01247, over 25000.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4954507.27 frames. 
], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:27:28,171 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.485e+01 3.888e+01 4.021e+01 4.196e+01 4.723e+01, threshold=8.042e+01, percent-clipped=0.0 2023-12-23 23:27:30,545 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.47 vs. limit=10.0 2023-12-23 23:27:55,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1386720.0, ans=0.025 2023-12-23 23:28:15,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.93 vs. limit=10.0 2023-12-23 23:28:16,906 INFO [train.py:886] (3/4) Epoch 44, batch 3100, loss[loss=0.01148, audio_tagging_loss=0.01148, over 24750.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4955427.10 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:28:34,215 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0 2023-12-23 23:28:46,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1387053.3333333333, ans=0.125 2023-12-23 23:29:00,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1387186.6666666667, ans=0.125 2023-12-23 23:29:08,877 INFO [train.py:886] (3/4) Epoch 44, batch 3150, loss[loss=0.01337, audio_tagging_loss=0.01337, over 24750.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4953122.85 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:29:12,625 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.491e+01 3.896e+01 4.085e+01 4.176e+01 5.009e+01, threshold=8.169e+01, percent-clipped=0.0 2023-12-23 23:29:19,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1387320.0, ans=0.125 2023-12-23 23:29:32,248 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.98 vs. limit=22.5 2023-12-23 23:29:36,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1387386.6666666667, ans=0.125 2023-12-23 23:29:48,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1387453.3333333333, ans=0.0 2023-12-23 23:30:01,873 INFO [train.py:886] (3/4) Epoch 44, batch 3200, loss[loss=0.01096, audio_tagging_loss=0.01096, over 24750.00 frames. ], tot_loss[loss=0.01113, audio_tagging_loss=0.01113, over 4952217.07 frames. 
], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:30:10,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1387653.3333333333, ans=0.035 2023-12-23 23:30:19,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1387653.3333333333, ans=0.125 2023-12-23 23:30:36,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1387786.6666666667, ans=0.0 2023-12-23 23:30:49,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1387853.3333333333, ans=0.125 2023-12-23 23:30:53,190 INFO [train.py:886] (3/4) Epoch 44, batch 3250, loss[loss=0.01296, audio_tagging_loss=0.01296, over 25000.00 frames. ], tot_loss[loss=0.01105, audio_tagging_loss=0.01105, over 4950194.03 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:30:57,015 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.426e+01 3.832e+01 3.942e+01 4.111e+01 4.749e+01, threshold=7.885e+01, percent-clipped=0.0 2023-12-23 23:31:06,832 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.82 vs. limit=15.0 2023-12-23 23:31:31,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1388120.0, ans=0.125 2023-12-23 23:31:35,469 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.00 vs. limit=12.0 2023-12-23 23:31:44,691 INFO [train.py:886] (3/4) Epoch 44, batch 3300, loss[loss=0.01114, audio_tagging_loss=0.01114, over 24916.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4950562.63 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:31:56,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.07 vs. limit=22.5 2023-12-23 23:32:13,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1388386.6666666667, ans=0.2 2023-12-23 23:32:14,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1388453.3333333333, ans=0.035 2023-12-23 23:32:22,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1388453.3333333333, ans=0.125 2023-12-23 23:32:29,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1388520.0, ans=0.125 2023-12-23 23:32:30,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1388520.0, ans=0.125 2023-12-23 23:32:35,940 INFO [train.py:886] (3/4) Epoch 44, batch 3350, loss[loss=0.01088, audio_tagging_loss=0.01088, over 25000.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4951949.75 frames. 
], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:32:39,734 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.350e+01 3.780e+01 3.970e+01 4.173e+01 4.809e+01, threshold=7.941e+01, percent-clipped=0.0 2023-12-23 23:33:16,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1388786.6666666667, ans=0.0 2023-12-23 23:33:25,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1388853.3333333333, ans=0.1 2023-12-23 23:33:28,105 INFO [train.py:886] (3/4) Epoch 44, batch 3400, loss[loss=0.01527, audio_tagging_loss=0.01527, over 25000.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4949469.01 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:33:34,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1388920.0, ans=0.125 2023-12-23 23:33:40,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1388986.6666666667, ans=0.1 2023-12-23 23:33:44,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1388986.6666666667, ans=10.0 2023-12-23 23:33:52,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1389053.3333333333, ans=0.1 2023-12-23 23:33:53,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1389053.3333333333, ans=0.0 2023-12-23 23:33:55,055 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.59 vs. limit=22.5 2023-12-23 23:34:17,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1389186.6666666667, ans=0.125 2023-12-23 23:34:20,506 INFO [train.py:886] (3/4) Epoch 44, batch 3450, loss[loss=0.01083, audio_tagging_loss=0.01083, over 24750.00 frames. ], tot_loss[loss=0.01108, audio_tagging_loss=0.01108, over 4946416.93 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:34:24,937 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.641e+01 3.949e+01 4.077e+01 4.251e+01 4.756e+01, threshold=8.154e+01, percent-clipped=0.0 2023-12-23 23:34:27,239 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.83 vs. 
limit=12.0 2023-12-23 23:34:38,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1389320.0, ans=0.025 2023-12-23 23:34:50,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1389386.6666666667, ans=0.125 2023-12-23 23:34:54,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1389453.3333333333, ans=10.0 2023-12-23 23:35:03,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1389520.0, ans=0.0 2023-12-23 23:35:04,575 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.41 vs. limit=6.0 2023-12-23 23:35:12,400 INFO [train.py:886] (3/4) Epoch 44, batch 3500, loss[loss=0.01317, audio_tagging_loss=0.01317, over 25000.00 frames. ], tot_loss[loss=0.01116, audio_tagging_loss=0.01116, over 4942418.07 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:35:47,312 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:35:47,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.40 vs. limit=6.0 2023-12-23 23:35:55,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1389853.3333333333, ans=0.125 2023-12-23 23:36:04,655 INFO [train.py:886] (3/4) Epoch 44, batch 3550, loss[loss=0.008729, audio_tagging_loss=0.008729, over 22205.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4941472.91 frames. ], batch size: 107, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:36:08,435 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.423e+01 3.841e+01 4.039e+01 4.182e+01 4.677e+01, threshold=8.078e+01, percent-clipped=0.0 2023-12-23 23:36:39,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1390120.0, ans=0.1 2023-12-23 23:36:40,330 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.85 vs. limit=22.5 2023-12-23 23:36:51,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1390186.6666666667, ans=0.125 2023-12-23 23:36:55,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1390253.3333333333, ans=0.125 2023-12-23 23:36:56,420 INFO [train.py:886] (3/4) Epoch 44, batch 3600, loss[loss=0.01357, audio_tagging_loss=0.01357, over 25000.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4939457.83 frames. 
], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:36:58,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1390253.3333333333, ans=0.2 2023-12-23 23:37:04,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1390253.3333333333, ans=0.2 2023-12-23 23:37:08,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1390320.0, ans=0.125 2023-12-23 23:37:35,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1390453.3333333333, ans=0.125 2023-12-23 23:37:36,992 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.83 vs. limit=15.0 2023-12-23 23:37:37,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1390520.0, ans=0.125 2023-12-23 23:37:48,516 INFO [train.py:886] (3/4) Epoch 44, batch 3650, loss[loss=0.009899, audio_tagging_loss=0.009899, over 25000.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4949180.24 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:37:51,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1390586.6666666667, ans=0.125 2023-12-23 23:37:52,936 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.491e+01 3.828e+01 3.967e+01 4.135e+01 4.611e+01, threshold=7.934e+01, percent-clipped=0.0 2023-12-23 23:37:55,125 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2023-12-23 23:38:10,248 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.32 vs. limit=22.5 2023-12-23 23:38:16,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1390720.0, ans=0.125 2023-12-23 23:38:28,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1390786.6666666667, ans=0.125 2023-12-23 23:38:33,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1390853.3333333333, ans=0.125 2023-12-23 23:38:41,037 INFO [train.py:886] (3/4) Epoch 44, batch 3700, loss[loss=0.01023, audio_tagging_loss=0.01023, over 25000.00 frames. ], tot_loss[loss=0.01105, audio_tagging_loss=0.01105, over 4954710.53 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:39:02,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1391053.3333333333, ans=0.0 2023-12-23 23:39:04,671 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.23 vs. limit=12.0 2023-12-23 23:39:08,453 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.35 vs. 
limit=15.0 2023-12-23 23:39:12,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1391120.0, ans=0.0 2023-12-23 23:39:22,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1391186.6666666667, ans=0.125 2023-12-23 23:39:32,658 INFO [train.py:886] (3/4) Epoch 44, batch 3750, loss[loss=0.01044, audio_tagging_loss=0.01044, over 24750.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4950773.11 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:39:37,094 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.417e+01 3.871e+01 4.041e+01 4.218e+01 6.039e+01, threshold=8.081e+01, percent-clipped=0.0 2023-12-23 23:39:54,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1391386.6666666667, ans=0.125 2023-12-23 23:39:57,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1391386.6666666667, ans=0.0 2023-12-23 23:40:08,755 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.75 vs. limit=12.0 2023-12-23 23:40:12,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.65 vs. limit=22.5 2023-12-23 23:40:20,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1391520.0, ans=0.125 2023-12-23 23:40:24,075 INFO [train.py:886] (3/4) Epoch 44, batch 3800, loss[loss=0.009729, audio_tagging_loss=0.009729, over 24750.00 frames. ], tot_loss[loss=0.01111, audio_tagging_loss=0.01111, over 4946601.06 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:40:26,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1391586.6666666667, ans=0.0 2023-12-23 23:40:38,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.42 vs. limit=15.0 2023-12-23 23:40:40,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=1391653.3333333333, ans=15.0 2023-12-23 23:40:43,420 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.66 vs. limit=15.0 2023-12-23 23:40:45,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1391720.0, ans=0.2 2023-12-23 23:41:04,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1391786.6666666667, ans=0.125 2023-12-23 23:41:17,268 INFO [train.py:886] (3/4) Epoch 44, batch 3850, loss[loss=0.008765, audio_tagging_loss=0.008765, over 24750.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4943093.39 frames. 
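The periodic WARNINGs from optim.py report the spread of recent gradient norms (the five numbers appear to be min, 25%, median, 75% and max) together with the active clipping threshold. Throughout this section the threshold equals Clipping_scale times the logged median, e.g. 2.0 x 4.041e+01 = 8.082e+01 against the reported 8.081e+01 in the 23:39:37 entry above, with the small difference being rounding. A minimal sketch of that bookkeeping, using a hypothetical helper rather than the actual optim.py code:

import numpy as np

def clipping_stats(recent_grad_norms, clipping_scale=2.0):
    # Quartile summary of a window of recent gradient norms; the clipping
    # threshold is taken as clipping_scale times the median, matching the
    # "grad-norm quartiles ... threshold=..." WARNING lines in this log.
    q = np.quantile(recent_grad_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
    threshold = clipping_scale * q[2]
    percent_clipped = 100.0 * np.mean(np.asarray(recent_grad_norms) > threshold)
    return q, threshold, percent_clipped

norms = np.random.default_rng(0).normal(40.0, 3.0, size=128).clip(min=1.0)
q, thr, pc = clipping_stats(norms)
print(thr, pc)  # threshold ~ 2x median; percent-clipped is 0.0 in a calm window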
], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:41:18,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1391920.0, ans=0.125 2023-12-23 23:41:21,126 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.492e+01 3.872e+01 4.027e+01 4.207e+01 4.777e+01, threshold=8.053e+01, percent-clipped=0.0 2023-12-23 23:41:35,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1391986.6666666667, ans=0.05 2023-12-23 23:41:50,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1392120.0, ans=0.125 2023-12-23 23:42:01,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1392186.6666666667, ans=0.0 2023-12-23 23:42:06,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1392186.6666666667, ans=0.1 2023-12-23 23:42:09,404 INFO [train.py:886] (3/4) Epoch 44, batch 3900, loss[loss=0.01146, audio_tagging_loss=0.01146, over 24750.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4950707.48 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:42:16,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1392253.3333333333, ans=15.0 2023-12-23 23:42:50,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1392520.0, ans=0.125 2023-12-23 23:43:01,303 INFO [train.py:886] (3/4) Epoch 44, batch 3950, loss[loss=0.009873, audio_tagging_loss=0.009873, over 25000.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4952731.38 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:43:01,807 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.51 vs. limit=15.0 2023-12-23 23:43:04,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1392586.6666666667, ans=0.2 2023-12-23 23:43:05,141 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.463e+01 3.857e+01 4.007e+01 4.216e+01 4.773e+01, threshold=8.014e+01, percent-clipped=0.0 2023-12-23 23:43:18,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1392653.3333333333, ans=0.1 2023-12-23 23:43:38,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1392786.6666666667, ans=0.125 2023-12-23 23:43:41,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.45 vs. limit=22.5 2023-12-23 23:43:46,849 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. 
limit=15.0 2023-12-23 23:43:47,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1392853.3333333333, ans=0.125 2023-12-23 23:43:53,639 INFO [train.py:886] (3/4) Epoch 44, batch 4000, loss[loss=0.009757, audio_tagging_loss=0.009757, over 25000.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4957157.73 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:44:05,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1392986.6666666667, ans=0.125 2023-12-23 23:44:09,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1392986.6666666667, ans=0.025 2023-12-23 23:44:12,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1393053.3333333333, ans=0.125 2023-12-23 23:44:18,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1393053.3333333333, ans=0.2 2023-12-23 23:44:44,866 INFO [train.py:886] (3/4) Epoch 44, batch 4050, loss[loss=0.01262, audio_tagging_loss=0.01262, over 24750.00 frames. ], tot_loss[loss=0.011, audio_tagging_loss=0.011, over 4961235.93 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:44:48,648 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.503e+01 3.797e+01 4.007e+01 4.202e+01 4.983e+01, threshold=8.013e+01, percent-clipped=0.0 2023-12-23 23:44:58,623 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.72 vs. limit=15.0 2023-12-23 23:45:01,349 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.38 vs. limit=15.0 2023-12-23 23:45:03,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1393320.0, ans=0.0 2023-12-23 23:45:05,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1393386.6666666667, ans=0.2 2023-12-23 23:45:18,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.36 vs. limit=15.0 2023-12-23 23:45:37,345 INFO [train.py:886] (3/4) Epoch 44, batch 4100, loss[loss=0.01005, audio_tagging_loss=0.01005, over 24022.00 frames. ], tot_loss[loss=0.01105, audio_tagging_loss=0.01105, over 4951425.12 frames. 
], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:45:43,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1393586.6666666667, ans=0.125 2023-12-23 23:45:46,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1393653.3333333333, ans=0.125 2023-12-23 23:45:54,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1393653.3333333333, ans=0.05 2023-12-23 23:46:17,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1393853.3333333333, ans=0.07 2023-12-23 23:46:21,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1393853.3333333333, ans=0.125 2023-12-23 23:46:22,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=1393853.3333333333, ans=15.0 2023-12-23 23:46:27,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1393920.0, ans=0.125 2023-12-23 23:46:29,086 INFO [train.py:886] (3/4) Epoch 44, batch 4150, loss[loss=0.01103, audio_tagging_loss=0.01103, over 25000.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4947689.08 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:46:33,522 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.380e+01 3.908e+01 4.057e+01 4.233e+01 4.763e+01, threshold=8.114e+01, percent-clipped=0.0 2023-12-23 23:46:34,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1393920.0, ans=10.0 2023-12-23 23:46:34,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0 2023-12-23 23:46:39,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1393986.6666666667, ans=0.125 2023-12-23 23:46:41,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1393986.6666666667, ans=0.0 2023-12-23 23:46:41,512 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.16 vs. limit=22.5 2023-12-23 23:47:19,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1394186.6666666667, ans=0.0 2023-12-23 23:47:20,978 INFO [train.py:886] (3/4) Epoch 44, batch 4200, loss[loss=0.01429, audio_tagging_loss=0.01429, over 25000.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4951787.21 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:47:23,451 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.94 vs. 
limit=12.0 2023-12-23 23:47:25,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1394253.3333333333, ans=0.125 2023-12-23 23:47:35,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1394320.0, ans=0.125 2023-12-23 23:47:42,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1394386.6666666667, ans=0.09899494936611666 2023-12-23 23:47:42,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1394386.6666666667, ans=15.0 2023-12-23 23:47:43,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1394386.6666666667, ans=0.0 2023-12-23 23:47:54,130 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:47:58,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1394453.3333333333, ans=0.125 2023-12-23 23:48:13,012 INFO [train.py:886] (3/4) Epoch 44, batch 4250, loss[loss=0.01056, audio_tagging_loss=0.01056, over 24750.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4951769.67 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:48:17,459 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.360e+01 3.794e+01 3.945e+01 4.179e+01 4.749e+01, threshold=7.890e+01, percent-clipped=0.0 2023-12-23 23:48:21,620 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:48:23,559 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=22.5 2023-12-23 23:48:24,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1394653.3333333333, ans=0.0 2023-12-23 23:48:48,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1394786.6666666667, ans=0.05 2023-12-23 23:49:04,133 INFO [train.py:886] (3/4) Epoch 44, batch 4300, loss[loss=0.01076, audio_tagging_loss=0.01076, over 25000.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4950825.09 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:49:18,424 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.33 vs. limit=15.0 2023-12-23 23:49:23,834 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.47 vs. limit=22.5 2023-12-23 23:49:37,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1395120.0, ans=0.2 2023-12-23 23:49:57,521 INFO [train.py:886] (3/4) Epoch 44, batch 4350, loss[loss=0.01043, audio_tagging_loss=0.01043, over 25000.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4951159.73 frames. 
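The "Whitening: ... metric=X vs. limit=Y" entries track a whiteness statistic of a module's output covariance against a scheduled limit. One standard way to define such a statistic, assumed here for illustration and not taken verbatim from scaling.py, is the mean squared eigenvalue of the channel covariance divided by the squared mean eigenvalue: it equals 1.0 for perfectly white features and grows as energy concentrates in a few directions.

import torch

def whiteness_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels). Illustrative whiteness statistic:
    # mean(eig^2) / mean(eig)^2 of the channel covariance, which is 1.0
    # when the covariance is a multiple of the identity.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

white = torch.randn(10000, 256)
print(whiteness_metric(white))                                  # ~1.0
print(whiteness_metric(white * torch.linspace(0.1, 3.0, 256)))  # >> 1.0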
], batch size: 100, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:49:57,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1395253.3333333333, ans=0.125 2023-12-23 23:50:01,300 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.531e+01 3.887e+01 4.029e+01 4.199e+01 5.257e+01, threshold=8.057e+01, percent-clipped=0.0 2023-12-23 23:50:41,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.39 vs. limit=22.5 2023-12-23 23:50:44,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1395520.0, ans=0.125 2023-12-23 23:50:49,091 INFO [train.py:886] (3/4) Epoch 44, batch 4400, loss[loss=0.01007, audio_tagging_loss=0.01007, over 24750.00 frames. ], tot_loss[loss=0.01105, audio_tagging_loss=0.01105, over 4947166.86 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0 2023-12-23 23:50:51,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1395586.6666666667, ans=0.125 2023-12-23 23:50:57,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1395586.6666666667, ans=0.0 2023-12-23 23:50:59,478 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.33 vs. limit=15.0 2023-12-23 23:51:12,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1395720.0, ans=0.0 2023-12-23 23:51:14,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1395720.0, ans=0.125 2023-12-23 23:51:19,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1395786.6666666667, ans=0.125 2023-12-23 23:51:22,025 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.76 vs. limit=22.5 2023-12-23 23:51:36,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1395853.3333333333, ans=0.0 2023-12-23 23:51:40,098 INFO [train.py:886] (3/4) Epoch 44, batch 4450, loss[loss=0.01141, audio_tagging_loss=0.01141, over 24750.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4940564.35 frames. ], batch size: 99, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:51:43,922 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.489e+01 3.885e+01 4.023e+01 4.248e+01 5.191e+01, threshold=8.046e+01, percent-clipped=0.0 2023-12-23 23:51:48,049 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.07 vs. 
limit=15.0 2023-12-23 23:51:53,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1395986.6666666667, ans=0.125 2023-12-23 23:51:53,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1395986.6666666667, ans=0.125 2023-12-23 23:51:59,667 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=15.0 2023-12-23 23:52:22,527 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.92 vs. limit=10.0 2023-12-23 23:52:33,703 INFO [train.py:886] (3/4) Epoch 44, batch 4500, loss[loss=0.01112, audio_tagging_loss=0.01112, over 25000.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4945424.14 frames. ], batch size: 100, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:52:51,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1396320.0, ans=0.125 2023-12-23 23:52:52,951 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.50 vs. limit=12.0 2023-12-23 23:52:57,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1396386.6666666667, ans=0.0 2023-12-23 23:53:10,412 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.51 vs. limit=22.5 2023-12-23 23:53:15,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1396520.0, ans=0.125 2023-12-23 23:53:19,780 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.47 vs. limit=22.5 2023-12-23 23:53:24,791 INFO [train.py:886] (3/4) Epoch 44, batch 4550, loss[loss=0.01249, audio_tagging_loss=0.01249, over 25000.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4944348.53 frames. ], batch size: 100, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:53:27,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1396586.6666666667, ans=0.0 2023-12-23 23:53:28,509 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.460e+01 3.833e+01 3.993e+01 4.205e+01 5.726e+01, threshold=7.986e+01, percent-clipped=0.0 2023-12-23 23:53:34,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1396653.3333333333, ans=0.1 2023-12-23 23:53:46,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1396720.0, ans=0.0 2023-12-23 23:53:53,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1396720.0, ans=0.125 2023-12-23 23:53:58,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1396786.6666666667, ans=0.0 2023-12-23 23:54:17,174 INFO [train.py:886] (3/4) Epoch 44, batch 4600, loss[loss=0.009555, audio_tagging_loss=0.009555, over 25000.00 frames. 
], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4942451.38 frames. ], batch size: 100, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:54:41,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1397053.3333333333, ans=0.04949747468305833 2023-12-23 23:54:41,654 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.69 vs. limit=22.5 2023-12-23 23:54:52,827 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.03 vs. limit=15.0 2023-12-23 23:54:59,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1397186.6666666667, ans=0.125 2023-12-23 23:55:00,291 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0 2023-12-23 23:55:08,879 INFO [train.py:886] (3/4) Epoch 44, batch 4650, loss[loss=0.01025, audio_tagging_loss=0.01025, over 25000.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4950253.06 frames. ], batch size: 100, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:55:13,397 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.459e+01 3.838e+01 4.030e+01 4.199e+01 4.777e+01, threshold=8.060e+01, percent-clipped=0.0 2023-12-23 23:55:30,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1397386.6666666667, ans=0.0 2023-12-23 23:55:31,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1397386.6666666667, ans=0.125 2023-12-23 23:55:44,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1397453.3333333333, ans=0.125 2023-12-23 23:55:52,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1397520.0, ans=0.09899494936611666 2023-12-23 23:56:00,026 INFO [train.py:886] (3/4) Epoch 44, batch 4700, loss[loss=0.009688, audio_tagging_loss=0.009688, over 25000.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4945697.86 frames. ], batch size: 100, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:56:08,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1397653.3333333333, ans=0.125 2023-12-23 23:56:09,465 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. 
limit=15.0 2023-12-23 23:56:13,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1397653.3333333333, ans=0.0 2023-12-23 23:56:27,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1397786.6666666667, ans=0.125 2023-12-23 23:56:38,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1397853.3333333333, ans=0.04949747468305833 2023-12-23 23:56:44,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1397853.3333333333, ans=0.125 2023-12-23 23:56:44,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1397853.3333333333, ans=0.0 2023-12-23 23:56:46,657 INFO [train.py:886] (3/4) Epoch 44, batch 4750, loss[loss=0.01253, audio_tagging_loss=0.01253, over 24750.00 frames. ], tot_loss[loss=0.01123, audio_tagging_loss=0.01123, over 4947846.44 frames. ], batch size: 99, lr: 2.42e-03, grad_scale: 32.0 2023-12-23 23:56:46,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1397920.0, ans=0.0 2023-12-23 23:56:50,271 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.598e+01 3.838e+01 4.057e+01 4.238e+01 5.270e+01, threshold=8.115e+01, percent-clipped=0.0 2023-12-23 23:56:58,998 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.22 vs. limit=15.0 2023-12-23 23:57:22,197 INFO [train.py:886] (3/4) Epoch 45, batch 0, loss[loss=0.02676, audio_tagging_loss=0.02676, over 25000.00 frames. ], tot_loss[loss=0.02676, audio_tagging_loss=0.02676, over 25000.00 frames. ], batch size: 100, lr: 2.40e-03, grad_scale: 32.0 2023-12-23 23:57:22,197 INFO [train.py:909] (3/4) Computing validation loss 2023-12-23 23:57:43,157 INFO [train.py:917] (3/4) Epoch 45, validation: loss=0.03554, audio_tagging_loss=0.03554, over 3737520.00 frames. 2023-12-23 23:57:43,158 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-23 23:57:50,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1398026.6666666667, ans=0.125 2023-12-23 23:57:50,628 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=12.0 2023-12-23 23:58:08,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1398160.0, ans=0.125 2023-12-23 23:58:13,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1398226.6666666667, ans=0.1 2023-12-23 23:58:16,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1398226.6666666667, ans=0.1 2023-12-23 23:58:28,165 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-23 23:58:33,616 INFO [train.py:886] (3/4) Epoch 45, batch 50, loss[loss=0.01763, audio_tagging_loss=0.01763, over 25000.00 frames. ], tot_loss[loss=0.01771, audio_tagging_loss=0.01771, over 1113375.24 frames. 
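Note how the frame count inside tot_loss[...] restarts at each epoch boundary: roughly 1.11e6 frames at epoch 45, batch 50 (above), then about 1.97e6 at batch 100 and 2.62e6 at batch 150 (below), saturating near 4.95e6 deep into an epoch. Those figures are consistent with tot_loss being an exponentially decayed running sum rather than a plain epoch average: with ~25000 frames per batch and a per-batch decay of 1 - 1/200, the saturated frame count is 25000 x 200 = 5.0e6. The decay constant in this sketch is inferred from the logged saturation value, not read from the training code.

def running_tot_loss(batches, decay=1.0 - 1.0 / 200.0):
    # Sketch: exponentially decayed running sums of frames and of
    # frame-weighted loss; the reported tot_loss is their ratio.
    tot_frames, tot_weighted = 0.0, 0.0
    for frames, loss in batches:
        tot_frames = decay * tot_frames + frames
        tot_weighted = decay * tot_weighted + frames * loss
        yield tot_weighted / tot_frames, tot_frames

batches = [(25000.0, 0.011)] * 150
for i, (tot, frames) in enumerate(running_tot_loss(batches)):
    if i in (49, 99, 149):
        # batch 50 -> ~1.11e6 frames, batch 100 -> ~1.97e6, batch 150 -> ~2.64e6,
        # matching the epoch-45 figures in this log to within batch-size jitter.
        print(f"batch {i + 1}: tot_loss={tot:.5f} over {frames:.0f} frames")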
], batch size: 100, lr: 2.40e-03, grad_scale: 16.0 2023-12-23 23:58:41,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1398360.0, ans=0.2 2023-12-23 23:58:50,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1398426.6666666667, ans=0.125 2023-12-23 23:58:59,737 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.92 vs. limit=22.5 2023-12-23 23:59:03,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1398493.3333333333, ans=0.2 2023-12-23 23:59:10,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1398560.0, ans=0.125 2023-12-23 23:59:14,330 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.685e+01 4.411e+01 4.844e+01 5.631e+01 1.112e+02, threshold=9.688e+01, percent-clipped=7.0 2023-12-23 23:59:21,810 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=15.0 2023-12-23 23:59:24,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.25 vs. limit=6.0 2023-12-23 23:59:26,359 INFO [train.py:886] (3/4) Epoch 45, batch 100, loss[loss=0.01193, audio_tagging_loss=0.01193, over 25000.00 frames. ], tot_loss[loss=0.01528, audio_tagging_loss=0.01528, over 1972186.78 frames. ], batch size: 100, lr: 2.40e-03, grad_scale: 16.0 2023-12-23 23:59:31,457 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2023-12-23 23:59:40,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1398760.0, ans=0.125 2023-12-23 23:59:48,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1398826.6666666667, ans=0.125 2023-12-24 00:00:09,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1398960.0, ans=0.0 2023-12-24 00:00:18,181 INFO [train.py:886] (3/4) Epoch 45, batch 150, loss[loss=0.009306, audio_tagging_loss=0.009306, over 25000.00 frames. ], tot_loss[loss=0.01402, audio_tagging_loss=0.01402, over 2624720.30 frames. 
], batch size: 100, lr: 2.39e-03, grad_scale: 16.0 2023-12-24 00:00:21,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1399026.6666666667, ans=0.2 2023-12-24 00:00:57,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1399226.6666666667, ans=0.125 2023-12-24 00:00:59,191 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.571e+01 3.963e+01 4.110e+01 4.348e+01 5.500e+01, threshold=8.220e+01, percent-clipped=0.0 2023-12-24 00:01:04,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1399293.3333333333, ans=0.05 2023-12-24 00:01:05,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1399293.3333333333, ans=0.125 2023-12-24 00:01:09,651 INFO [train.py:886] (3/4) Epoch 45, batch 200, loss[loss=0.01184, audio_tagging_loss=0.01184, over 25000.00 frames. ], tot_loss[loss=0.01322, audio_tagging_loss=0.01322, over 3144939.46 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 16.0 2023-12-24 00:01:14,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1399360.0, ans=0.1 2023-12-24 00:01:35,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1399493.3333333333, ans=0.125 2023-12-24 00:01:35,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1399493.3333333333, ans=0.125 2023-12-24 00:01:39,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1399493.3333333333, ans=0.125 2023-12-24 00:01:44,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1399560.0, ans=0.0 2023-12-24 00:01:54,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1399626.6666666667, ans=0.0 2023-12-24 00:02:02,097 INFO [train.py:886] (3/4) Epoch 45, batch 250, loss[loss=0.01097, audio_tagging_loss=0.01097, over 24029.00 frames. ], tot_loss[loss=0.01266, audio_tagging_loss=0.01266, over 3549831.68 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 16.0 2023-12-24 00:02:03,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1399693.3333333333, ans=0.07 2023-12-24 00:02:12,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1399760.0, ans=0.125 2023-12-24 00:02:21,950 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2023-12-24 00:02:25,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1399826.6666666667, ans=0.125 2023-12-24 00:02:42,535 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.460e+01 3.868e+01 4.040e+01 4.212e+01 5.003e+01, threshold=8.081e+01, percent-clipped=0.0 2023-12-24 00:02:52,656 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.81 vs. 
limit=10.0 2023-12-24 00:02:53,881 INFO [train.py:886] (3/4) Epoch 45, batch 300, loss[loss=0.01268, audio_tagging_loss=0.01268, over 24750.00 frames. ], tot_loss[loss=0.01232, audio_tagging_loss=0.01232, over 3861386.87 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 16.0 2023-12-24 00:03:01,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1400026.6666666667, ans=0.125 2023-12-24 00:03:04,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1400093.3333333333, ans=0.0 2023-12-24 00:03:09,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1400093.3333333333, ans=0.0 2023-12-24 00:03:10,602 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.65 vs. limit=10.0 2023-12-24 00:03:17,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1400160.0, ans=0.0 2023-12-24 00:03:46,174 INFO [train.py:886] (3/4) Epoch 45, batch 350, loss[loss=0.01175, audio_tagging_loss=0.01175, over 24750.00 frames. ], tot_loss[loss=0.01209, audio_tagging_loss=0.01209, over 4097379.98 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 16.0 2023-12-24 00:03:50,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1400360.0, ans=0.125 2023-12-24 00:03:52,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1400360.0, ans=0.125 2023-12-24 00:04:07,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1400493.3333333333, ans=0.125 2023-12-24 00:04:13,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1400493.3333333333, ans=0.2 2023-12-24 00:04:21,002 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=15.0 2023-12-24 00:04:21,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1400560.0, ans=0.0 2023-12-24 00:04:25,970 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.453e+01 3.862e+01 4.002e+01 4.189e+01 4.449e+01, threshold=8.005e+01, percent-clipped=0.0 2023-12-24 00:04:30,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1400626.6666666667, ans=0.125 2023-12-24 00:04:37,802 INFO [train.py:886] (3/4) Epoch 45, batch 400, loss[loss=0.01061, audio_tagging_loss=0.01061, over 25000.00 frames. ], tot_loss[loss=0.01186, audio_tagging_loss=0.01186, over 4284786.34 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:04:49,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.69 vs. limit=10.0 2023-12-24 00:04:59,599 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.16 vs. 
limit=15.0 2023-12-24 00:05:25,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1400960.0, ans=0.1 2023-12-24 00:05:28,529 INFO [train.py:886] (3/4) Epoch 45, batch 450, loss[loss=0.01079, audio_tagging_loss=0.01079, over 25000.00 frames. ], tot_loss[loss=0.01167, audio_tagging_loss=0.01167, over 4435475.51 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:05:48,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1401093.3333333333, ans=0.125 2023-12-24 00:06:08,958 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.455e+01 3.831e+01 3.994e+01 4.191e+01 6.478e+01, threshold=7.987e+01, percent-clipped=0.0 2023-12-24 00:06:12,028 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.99 vs. limit=12.0 2023-12-24 00:06:13,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1401293.3333333333, ans=0.125 2023-12-24 00:06:21,061 INFO [train.py:886] (3/4) Epoch 45, batch 500, loss[loss=0.007627, audio_tagging_loss=0.007627, over 22030.00 frames. ], tot_loss[loss=0.01142, audio_tagging_loss=0.01142, over 4547232.08 frames. ], batch size: 107, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:06:42,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1401493.3333333333, ans=0.125 2023-12-24 00:06:55,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1401560.0, ans=0.125 2023-12-24 00:07:00,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1401626.6666666667, ans=0.2 2023-12-24 00:07:10,608 INFO [train.py:886] (3/4) Epoch 45, batch 550, loss[loss=0.01522, audio_tagging_loss=0.01522, over 25000.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4635776.82 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:07:11,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1401693.3333333333, ans=0.125 2023-12-24 00:07:13,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1401693.3333333333, ans=0.125 2023-12-24 00:07:19,285 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.14 vs. 
limit=6.0 2023-12-24 00:07:20,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1401760.0, ans=0.125 2023-12-24 00:07:27,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1401760.0, ans=0.0 2023-12-24 00:07:30,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1401826.6666666667, ans=0.2 2023-12-24 00:07:31,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1401826.6666666667, ans=10.0 2023-12-24 00:07:50,485 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.468e+01 3.862e+01 4.012e+01 4.255e+01 6.507e+01, threshold=8.024e+01, percent-clipped=0.0 2023-12-24 00:07:53,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1401960.0, ans=0.2 2023-12-24 00:08:01,733 INFO [train.py:886] (3/4) Epoch 45, batch 600, loss[loss=0.0107, audio_tagging_loss=0.0107, over 24750.00 frames. ], tot_loss[loss=0.01148, audio_tagging_loss=0.01148, over 4708327.05 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:08:05,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1402026.6666666667, ans=0.0 2023-12-24 00:08:09,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1402026.6666666667, ans=0.0 2023-12-24 00:08:14,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1402093.3333333333, ans=0.125 2023-12-24 00:08:30,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1402160.0, ans=0.125 2023-12-24 00:08:34,271 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:08:46,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1402293.3333333333, ans=0.0 2023-12-24 00:08:54,248 INFO [train.py:886] (3/4) Epoch 45, batch 650, loss[loss=0.01189, audio_tagging_loss=0.01189, over 24750.00 frames. ], tot_loss[loss=0.01144, audio_tagging_loss=0.01144, over 4756659.65 frames. 
], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:08:54,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1402360.0, ans=0.2 2023-12-24 00:09:00,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1402360.0, ans=0.125 2023-12-24 00:09:05,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1402426.6666666667, ans=0.125 2023-12-24 00:09:15,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1402493.3333333333, ans=0.125 2023-12-24 00:09:18,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1402493.3333333333, ans=0.125 2023-12-24 00:09:23,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1402560.0, ans=0.125 2023-12-24 00:09:34,590 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.523e+01 3.819e+01 4.030e+01 4.235e+01 4.740e+01, threshold=8.060e+01, percent-clipped=0.0 2023-12-24 00:09:45,208 INFO [train.py:886] (3/4) Epoch 45, batch 700, loss[loss=0.01004, audio_tagging_loss=0.01004, over 25000.00 frames. ], tot_loss[loss=0.01131, audio_tagging_loss=0.01131, over 4795919.33 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:10:01,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1402760.0, ans=0.1 2023-12-24 00:10:08,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1402826.6666666667, ans=0.0 2023-12-24 00:10:11,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1402826.6666666667, ans=0.125 2023-12-24 00:10:14,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1402826.6666666667, ans=0.125 2023-12-24 00:10:17,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1402893.3333333333, ans=0.125 2023-12-24 00:10:19,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1402893.3333333333, ans=0.0 2023-12-24 00:10:25,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1402960.0, ans=0.1 2023-12-24 00:10:26,398 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.95 vs. limit=10.0 2023-12-24 00:10:37,461 INFO [train.py:886] (3/4) Epoch 45, batch 750, loss[loss=0.01249, audio_tagging_loss=0.01249, over 24750.00 frames. ], tot_loss[loss=0.01122, audio_tagging_loss=0.01122, over 4831545.39 frames. 
], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:11:13,063 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:11:15,687 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.495e+01 3.861e+01 4.093e+01 4.221e+01 4.879e+01, threshold=8.187e+01, percent-clipped=0.0 2023-12-24 00:11:18,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1403293.3333333333, ans=0.125 2023-12-24 00:11:18,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1403293.3333333333, ans=0.125 2023-12-24 00:11:21,929 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.78 vs. limit=15.0 2023-12-24 00:11:26,758 INFO [train.py:886] (3/4) Epoch 45, batch 800, loss[loss=0.008774, audio_tagging_loss=0.008774, over 24023.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4858004.90 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:11:29,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1403360.0, ans=0.04949747468305833 2023-12-24 00:11:35,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1403360.0, ans=0.125 2023-12-24 00:11:46,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1403493.3333333333, ans=0.1 2023-12-24 00:11:46,806 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.98 vs. limit=15.0 2023-12-24 00:11:53,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1403493.3333333333, ans=0.0 2023-12-24 00:11:59,911 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=15.0 2023-12-24 00:12:11,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1403626.6666666667, ans=0.125 2023-12-24 00:12:18,618 INFO [train.py:886] (3/4) Epoch 45, batch 850, loss[loss=0.01293, audio_tagging_loss=0.01293, over 25000.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4880938.06 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:12:22,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1403693.3333333333, ans=0.125 2023-12-24 00:12:35,798 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0 2023-12-24 00:12:38,541 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.78 vs. 
limit=10.0 2023-12-24 00:12:38,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1403826.6666666667, ans=0.125 2023-12-24 00:12:39,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1403826.6666666667, ans=0.125 2023-12-24 00:12:43,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1403826.6666666667, ans=0.1 2023-12-24 00:12:50,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1403893.3333333333, ans=0.0 2023-12-24 00:12:52,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1403893.3333333333, ans=0.0 2023-12-24 00:12:58,728 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.415e+01 3.861e+01 4.035e+01 4.222e+01 4.797e+01, threshold=8.070e+01, percent-clipped=0.0 2023-12-24 00:13:00,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1403960.0, ans=0.0 2023-12-24 00:13:01,243 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.27 vs. limit=22.5 2023-12-24 00:13:11,515 INFO [train.py:886] (3/4) Epoch 45, batch 900, loss[loss=0.01207, audio_tagging_loss=0.01207, over 25000.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4901826.53 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:14:02,248 INFO [train.py:886] (3/4) Epoch 45, batch 950, loss[loss=0.01134, audio_tagging_loss=0.01134, over 25000.00 frames. ], tot_loss[loss=0.01117, audio_tagging_loss=0.01117, over 4904555.37 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:14:07,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1404360.0, ans=0.125 2023-12-24 00:14:18,172 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.23 vs. limit=10.0 2023-12-24 00:14:21,035 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0 2023-12-24 00:14:22,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1404493.3333333333, ans=0.125 2023-12-24 00:14:43,643 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.636e+01 3.902e+01 4.041e+01 4.266e+01 4.782e+01, threshold=8.081e+01, percent-clipped=0.0 2023-12-24 00:14:44,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=1404626.6666666667, ans=10.0 2023-12-24 00:14:47,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1404626.6666666667, ans=0.0 2023-12-24 00:14:54,680 INFO [train.py:886] (3/4) Epoch 45, batch 1000, loss[loss=0.008842, audio_tagging_loss=0.008842, over 25000.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4912700.13 frames. 
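Because the train.py:886 entries follow a fixed format, loss and learning-rate curves can be scraped straight out of a dump like this one; the regex below is written against the exact wording of the entries above, with DOTALL so that a match can span the wrapped lines of this dump. The helper name is ours, not part of icefall.

import re

TRAIN_RE = re.compile(
    r"Epoch (\d+), batch (\d+),.*?"
    r"tot_loss\[loss=([0-9.eE+-]+),.*?lr: ([0-9.eE+-]+)",
    re.DOTALL,
)

def parse_train_entries(text):
    # Yields (epoch, batch, tot_loss, lr) tuples from train.py:886 entries.
    for m in TRAIN_RE.finditer(text):
        yield int(m.group(1)), int(m.group(2)), float(m.group(3)), float(m.group(4))

sample = ("Epoch 45, batch 1000, loss[loss=0.008842, audio_tagging_loss=0.008842, "
          "over 25000.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, "
          "over 4912700.13 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0")
print(list(parse_train_entries(sample)))  # -> [(45, 1000, 0.01114, 0.00239)]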
], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:14:55,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1404693.3333333333, ans=0.0 2023-12-24 00:15:30,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1404893.3333333333, ans=0.125 2023-12-24 00:15:37,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1404960.0, ans=0.125 2023-12-24 00:15:45,805 INFO [train.py:886] (3/4) Epoch 45, batch 1050, loss[loss=0.01123, audio_tagging_loss=0.01123, over 25000.00 frames. ], tot_loss[loss=0.01108, audio_tagging_loss=0.01108, over 4922036.62 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:15:55,703 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.57 vs. limit=15.0 2023-12-24 00:16:08,731 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.03 vs. limit=10.0 2023-12-24 00:16:11,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1405160.0, ans=0.125 2023-12-24 00:16:19,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1405226.6666666667, ans=0.125 2023-12-24 00:16:24,811 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.564e+01 3.844e+01 4.005e+01 4.224e+01 5.202e+01, threshold=8.010e+01, percent-clipped=0.0 2023-12-24 00:16:30,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1405293.3333333333, ans=0.125 2023-12-24 00:16:36,161 INFO [train.py:886] (3/4) Epoch 45, batch 1100, loss[loss=0.01071, audio_tagging_loss=0.01071, over 24750.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4930938.85 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:16:37,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1405360.0, ans=0.07 2023-12-24 00:16:52,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1405426.6666666667, ans=0.0 2023-12-24 00:17:03,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1405493.3333333333, ans=0.125 2023-12-24 00:17:05,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1405560.0, ans=0.0 2023-12-24 00:17:07,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1405560.0, ans=0.0 2023-12-24 00:17:11,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1405560.0, ans=0.125 2023-12-24 00:17:21,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1405626.6666666667, ans=0.125 2023-12-24 00:17:27,001 INFO [train.py:886] (3/4) Epoch 45, batch 1150, loss[loss=0.009672, audio_tagging_loss=0.009672, over 25000.00 frames. 
], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4932979.60 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:17:32,404 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.10 vs. limit=6.0 2023-12-24 00:17:34,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1405693.3333333333, ans=0.125 2023-12-24 00:17:36,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1405760.0, ans=0.2 2023-12-24 00:17:37,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1405760.0, ans=0.125 2023-12-24 00:17:57,223 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.96 vs. limit=15.0 2023-12-24 00:17:58,912 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.31 vs. limit=6.0 2023-12-24 00:18:05,975 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.457e+01 3.770e+01 3.983e+01 4.144e+01 4.747e+01, threshold=7.965e+01, percent-clipped=0.0 2023-12-24 00:18:06,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=1405960.0, ans=0.5 2023-12-24 00:18:17,353 INFO [train.py:886] (3/4) Epoch 45, batch 1200, loss[loss=0.01122, audio_tagging_loss=0.01122, over 25000.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4946668.98 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:18:21,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1406026.6666666667, ans=0.125 2023-12-24 00:18:24,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1406026.6666666667, ans=0.1 2023-12-24 00:18:55,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1406226.6666666667, ans=0.0 2023-12-24 00:18:56,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1406226.6666666667, ans=0.125 2023-12-24 00:19:09,375 INFO [train.py:886] (3/4) Epoch 45, batch 1250, loss[loss=0.01108, audio_tagging_loss=0.01108, over 24750.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4941960.04 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:19:19,871 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0 2023-12-24 00:19:36,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.87 vs. 
limit=22.5 2023-12-24 00:19:49,851 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.464e+01 3.882e+01 4.077e+01 4.282e+01 6.825e+01, threshold=8.153e+01, percent-clipped=0.0 2023-12-24 00:19:55,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1406626.6666666667, ans=0.125 2023-12-24 00:19:58,782 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.40 vs. limit=6.0 2023-12-24 00:20:00,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1406626.6666666667, ans=0.2 2023-12-24 00:20:02,546 INFO [train.py:886] (3/4) Epoch 45, batch 1300, loss[loss=0.009911, audio_tagging_loss=0.009911, over 24750.00 frames. ], tot_loss[loss=0.01128, audio_tagging_loss=0.01128, over 4939040.33 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:20:53,368 INFO [train.py:886] (3/4) Epoch 45, batch 1350, loss[loss=0.009742, audio_tagging_loss=0.009742, over 24750.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4941849.78 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:20:53,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1407026.6666666667, ans=0.125 2023-12-24 00:21:13,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1407160.0, ans=0.125 2023-12-24 00:21:21,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1407160.0, ans=0.125 2023-12-24 00:21:34,829 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.481e+01 3.816e+01 3.963e+01 4.132e+01 5.053e+01, threshold=7.926e+01, percent-clipped=0.0 2023-12-24 00:21:45,959 INFO [train.py:886] (3/4) Epoch 45, batch 1400, loss[loss=0.01028, audio_tagging_loss=0.01028, over 24040.00 frames. ], tot_loss[loss=0.01105, audio_tagging_loss=0.01105, over 4942642.26 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:21:53,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1407360.0, ans=0.125 2023-12-24 00:22:00,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1407426.6666666667, ans=0.125 2023-12-24 00:22:21,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1407560.0, ans=0.125 2023-12-24 00:22:38,203 INFO [train.py:886] (3/4) Epoch 45, batch 1450, loss[loss=0.01117, audio_tagging_loss=0.01117, over 25000.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4950398.05 frames. 
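
[annotation] Each "ScheduledFloat: name=..., batch_count=..., ans=..." line reports the current value of a hyperparameter scheduled against the global batch count (dropout rates, skip rates, balancer probabilities, and so on). A self-contained sketch of such a schedule follows; the class name and breakpoints are illustrative, and the real scaling.py implementation may interpolate differently.

import bisect

class ScheduledFloatSketch:
    """A float-valued hyperparameter interpolated piecewise-linearly
    between (batch_count, value) breakpoints."""

    def __init__(self, *points):
        self.points = sorted(points)   # e.g. (0.0, 0.3), (4000.0, 0.0)
        self.batch_count = 0.0         # advanced by the training loop

    def value(self) -> float:
        xs = [x for x, _ in self.points]
        i = bisect.bisect_right(xs, self.batch_count)
        if i == 0:
            return self.points[0][1]
        if i == len(self.points):
            return self.points[-1][1]
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        return y0 + (self.batch_count - x0) / (x1 - x0) * (y1 - y0)

# illustrative use: a skip rate decaying to zero over 4000 batches,
# which would be logged as "ans=<value()>"
skip_rate = ScheduledFloatSketch((0.0, 0.3), (4000.0, 0.0))
skip_rate.batch_count = 2000.0
print(skip_rate.value())   # 0.15
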
], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:22:53,474 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:23:06,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1407826.6666666667, ans=0.0 2023-12-24 00:23:10,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1407893.3333333333, ans=0.125 2023-12-24 00:23:16,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1407893.3333333333, ans=0.0 2023-12-24 00:23:18,723 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.497e+01 3.851e+01 3.995e+01 4.151e+01 4.657e+01, threshold=7.989e+01, percent-clipped=0.0 2023-12-24 00:23:29,319 INFO [train.py:886] (3/4) Epoch 45, batch 1500, loss[loss=0.009566, audio_tagging_loss=0.009566, over 25000.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4958357.72 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:23:29,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=1408026.6666666667, ans=0.2 2023-12-24 00:23:34,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1408026.6666666667, ans=0.125 2023-12-24 00:23:48,624 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.06 vs. limit=15.0 2023-12-24 00:24:22,036 INFO [train.py:886] (3/4) Epoch 45, batch 1550, loss[loss=0.009929, audio_tagging_loss=0.009929, over 24089.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4957730.99 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:24:23,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1408360.0, ans=0.125 2023-12-24 00:24:30,265 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.72 vs. limit=15.0 2023-12-24 00:24:40,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.28 vs. limit=6.0 2023-12-24 00:24:40,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1408493.3333333333, ans=0.1 2023-12-24 00:24:50,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=1408493.3333333333, ans=15.0 2023-12-24 00:25:01,864 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.438e+01 3.911e+01 4.058e+01 4.249e+01 4.989e+01, threshold=8.116e+01, percent-clipped=0.0 2023-12-24 00:25:13,074 INFO [train.py:886] (3/4) Epoch 45, batch 1600, loss[loss=0.008947, audio_tagging_loss=0.008947, over 24750.00 frames. ], tot_loss[loss=0.01116, audio_tagging_loss=0.01116, over 4955765.22 frames. 
], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:25:25,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1408760.0, ans=0.125 2023-12-24 00:25:43,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1408893.3333333333, ans=0.125 2023-12-24 00:25:52,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1408893.3333333333, ans=0.125 2023-12-24 00:25:53,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1408893.3333333333, ans=0.0 2023-12-24 00:25:57,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1408960.0, ans=0.0 2023-12-24 00:25:57,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1408960.0, ans=0.04949747468305833 2023-12-24 00:26:05,274 INFO [train.py:886] (3/4) Epoch 45, batch 1650, loss[loss=0.00955, audio_tagging_loss=0.00955, over 24750.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4954232.30 frames. ], batch size: 99, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:26:11,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1409026.6666666667, ans=0.2 2023-12-24 00:26:15,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1409093.3333333333, ans=0.0 2023-12-24 00:26:32,350 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=22.5 2023-12-24 00:26:45,326 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.532e+01 3.821e+01 4.016e+01 4.279e+01 4.895e+01, threshold=8.031e+01, percent-clipped=0.0 2023-12-24 00:26:58,115 INFO [train.py:886] (3/4) Epoch 45, batch 1700, loss[loss=0.01186, audio_tagging_loss=0.01186, over 21798.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4951426.18 frames. ], batch size: 107, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:27:00,231 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.77 vs. limit=15.0 2023-12-24 00:27:03,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1409360.0, ans=0.125 2023-12-24 00:27:03,955 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.21 vs. limit=15.0 2023-12-24 00:27:25,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1409493.3333333333, ans=0.125 2023-12-24 00:27:49,109 INFO [train.py:886] (3/4) Epoch 45, batch 1750, loss[loss=0.01098, audio_tagging_loss=0.01098, over 25000.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4951189.83 frames. 
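
[annotation] The "Whitening: name=..., num_groups=..., metric=X vs. limit=Y" lines compare a per-module anisotropy statistic of the activations against a limit above which a whitening penalty would engage. One plausible such statistic, equal to 1.0 for a perfectly isotropic (white) covariance and growing as the eigenvalue spectrum becomes uneven, is sketched below; this is a hedged reconstruction, not the code at scaling.py:1022.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1, eps: float = 1e-4):
    # x: (num_frames, num_channels); channels are split into num_groups
    n, c = x.shape
    g = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)  # (G, n, d)
    cov = g.transpose(1, 2) @ g / n                                # (G, d, d)
    d = cov.shape[-1]
    mean_eig = cov.diagonal(dim1=-2, dim2=-1).mean(-1)    # trace(C)/d
    mean_eig_sq = (cov * cov).sum(dim=(-2, -1)) / d       # trace(C^2)/d
    # mean(eig^2) / mean(eig)^2 == 1 iff all eigenvalues are equal
    return (mean_eig_sq / (mean_eig ** 2 + eps)).mean()
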
], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:27:49,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1409693.3333333333, ans=0.1 2023-12-24 00:27:50,570 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.40 vs. limit=15.0 2023-12-24 00:27:58,687 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.30 vs. limit=15.0 2023-12-24 00:28:02,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1409760.0, ans=0.2 2023-12-24 00:28:08,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1409760.0, ans=0.125 2023-12-24 00:28:11,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1409826.6666666667, ans=0.0 2023-12-24 00:28:29,628 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.438e+01 3.800e+01 3.993e+01 4.173e+01 4.854e+01, threshold=7.987e+01, percent-clipped=0.0 2023-12-24 00:28:30,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1409960.0, ans=0.125 2023-12-24 00:28:31,779 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0 2023-12-24 00:28:42,190 INFO [train.py:886] (3/4) Epoch 45, batch 1800, loss[loss=0.01467, audio_tagging_loss=0.01467, over 25000.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4956972.56 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:28:47,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1410026.6666666667, ans=0.125 2023-12-24 00:28:49,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1410026.6666666667, ans=0.0 2023-12-24 00:28:49,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1410026.6666666667, ans=0.09899494936611666 2023-12-24 00:28:53,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1410093.3333333333, ans=0.125 2023-12-24 00:28:55,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1410093.3333333333, ans=0.0 2023-12-24 00:28:58,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1410093.3333333333, ans=0.125 2023-12-24 00:29:07,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1410160.0, ans=0.0 2023-12-24 00:29:23,168 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:29:32,416 INFO [train.py:886] (3/4) Epoch 45, batch 1850, loss[loss=0.01219, audio_tagging_loss=0.01219, over 24946.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4958863.09 frames. 
], batch size: 100, lr: 2.39e-03, grad_scale: 32.0 2023-12-24 00:29:58,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=12.0 2023-12-24 00:30:14,389 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.493e+01 3.903e+01 4.080e+01 4.257e+01 5.183e+01, threshold=8.160e+01, percent-clipped=0.0 2023-12-24 00:30:22,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1410626.6666666667, ans=0.125 2023-12-24 00:30:24,984 INFO [train.py:886] (3/4) Epoch 45, batch 1900, loss[loss=0.01204, audio_tagging_loss=0.01204, over 24750.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4957915.97 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 32.0 2023-12-24 00:30:38,290 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0 2023-12-24 00:30:47,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1410826.6666666667, ans=0.125 2023-12-24 00:30:55,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1410893.3333333333, ans=0.125 2023-12-24 00:30:57,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1410893.3333333333, ans=0.1 2023-12-24 00:31:00,283 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.69 vs. limit=15.0 2023-12-24 00:31:16,972 INFO [train.py:886] (3/4) Epoch 45, batch 1950, loss[loss=0.008735, audio_tagging_loss=0.008735, over 25000.00 frames. ], tot_loss[loss=0.01117, audio_tagging_loss=0.01117, over 4950428.77 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 32.0 2023-12-24 00:31:40,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1411160.0, ans=0.0 2023-12-24 00:31:56,106 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.534e+01 3.819e+01 3.949e+01 4.162e+01 4.750e+01, threshold=7.898e+01, percent-clipped=0.0 2023-12-24 00:32:05,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1411293.3333333333, ans=0.035 2023-12-24 00:32:06,738 INFO [train.py:886] (3/4) Epoch 45, batch 2000, loss[loss=0.009607, audio_tagging_loss=0.009607, over 25000.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4949118.66 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 32.0 2023-12-24 00:32:32,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1411493.3333333333, ans=0.0 2023-12-24 00:32:59,695 INFO [train.py:886] (3/4) Epoch 45, batch 2050, loss[loss=0.00931, audio_tagging_loss=0.00931, over 25000.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4948949.76 frames. 
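
[annotation] Note the grad_scale field stepping from 32.0 to 64.0 in the surrounding entries (batch 2000 above, batch 2050 just below): with use_fp16=True the loss scale doubles after a run of overflow-free steps. The stock PyTorch scaler reproduces this behaviour; whether icefall manages the scale exactly this way is an assumption, and the model, batch, and hyperparameters below are stand-ins.

import torch

model = torch.nn.Linear(80, 527).cuda()                  # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0, growth_factor=2.0, growth_interval=2000)

features = torch.randn(8, 80, device="cuda")             # stand-in batch
with torch.cuda.amp.autocast():
    loss = model(features).pow(2).mean()                 # stand-in loss
scaler.scale(loss).backward()   # backward on the scaled loss
scaler.step(optimizer)          # unscales grads, skips the step on inf/nan
scaler.update()                 # doubles the scale after growth_interval
                                # consecutive clean steps: 32.0 -> 64.0
optimizer.zero_grad()
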
], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:33:06,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1411693.3333333333, ans=0.125 2023-12-24 00:33:28,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1411826.6666666667, ans=0.95 2023-12-24 00:33:29,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1411826.6666666667, ans=0.0 2023-12-24 00:33:39,578 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.466e+01 3.826e+01 3.966e+01 4.166e+01 5.288e+01, threshold=7.932e+01, percent-clipped=0.0 2023-12-24 00:33:40,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=15.0 2023-12-24 00:33:51,011 INFO [train.py:886] (3/4) Epoch 45, batch 2100, loss[loss=0.01026, audio_tagging_loss=0.01026, over 25000.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4957300.51 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:33:54,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1412026.6666666667, ans=0.2 2023-12-24 00:33:56,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1412026.6666666667, ans=0.125 2023-12-24 00:34:11,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1412160.0, ans=0.125 2023-12-24 00:34:16,562 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.86 vs. limit=15.0 2023-12-24 00:34:39,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1412293.3333333333, ans=0.0 2023-12-24 00:34:40,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=1412293.3333333333, ans=0.02 2023-12-24 00:34:40,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1412293.3333333333, ans=0.2 2023-12-24 00:34:43,223 INFO [train.py:886] (3/4) Epoch 45, batch 2150, loss[loss=0.0119, audio_tagging_loss=0.0119, over 24750.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4959376.49 frames. 
], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:34:51,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1412426.6666666667, ans=0.0 2023-12-24 00:34:51,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1412426.6666666667, ans=0.125 2023-12-24 00:35:03,362 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:35:10,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1412493.3333333333, ans=0.125 2023-12-24 00:35:22,846 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.573e+01 3.885e+01 4.074e+01 4.292e+01 5.969e+01, threshold=8.149e+01, percent-clipped=0.0 2023-12-24 00:35:23,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1412626.6666666667, ans=0.0 2023-12-24 00:35:32,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1412626.6666666667, ans=0.125 2023-12-24 00:35:34,959 INFO [train.py:886] (3/4) Epoch 45, batch 2200, loss[loss=0.01173, audio_tagging_loss=0.01173, over 24750.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4957404.35 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:35:36,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1412693.3333333333, ans=0.0 2023-12-24 00:35:40,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1412693.3333333333, ans=0.1 2023-12-24 00:35:40,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1412693.3333333333, ans=0.0 2023-12-24 00:35:41,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1412693.3333333333, ans=0.125 2023-12-24 00:35:42,727 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.94 vs. limit=22.5 2023-12-24 00:35:46,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1412760.0, ans=0.2 2023-12-24 00:35:52,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1412760.0, ans=0.125 2023-12-24 00:35:53,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1412826.6666666667, ans=0.0 2023-12-24 00:35:54,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1412826.6666666667, ans=0.025 2023-12-24 00:35:58,582 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. 
limit=15.0 2023-12-24 00:36:00,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1412826.6666666667, ans=0.0 2023-12-24 00:36:08,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1412893.3333333333, ans=0.125 2023-12-24 00:36:14,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1412893.3333333333, ans=0.1 2023-12-24 00:36:18,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.64 vs. limit=22.5 2023-12-24 00:36:25,233 INFO [train.py:886] (3/4) Epoch 45, batch 2250, loss[loss=0.01013, audio_tagging_loss=0.01013, over 24750.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4954877.92 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:36:52,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1413160.0, ans=0.125 2023-12-24 00:36:52,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1413160.0, ans=0.0 2023-12-24 00:37:00,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1413226.6666666667, ans=0.125 2023-12-24 00:37:06,938 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.413e+01 3.863e+01 4.040e+01 4.242e+01 4.611e+01, threshold=8.080e+01, percent-clipped=0.0 2023-12-24 00:37:09,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1413293.3333333333, ans=0.0 2023-12-24 00:37:16,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1413293.3333333333, ans=0.125 2023-12-24 00:37:20,256 INFO [train.py:886] (3/4) Epoch 45, batch 2300, loss[loss=0.01017, audio_tagging_loss=0.01017, over 22538.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4950682.10 frames. ], batch size: 107, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:37:23,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1413360.0, ans=0.0 2023-12-24 00:37:50,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1413560.0, ans=0.125 2023-12-24 00:37:53,025 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.72 vs. limit=15.0 2023-12-24 00:38:12,029 INFO [train.py:886] (3/4) Epoch 45, batch 2350, loss[loss=0.008855, audio_tagging_loss=0.008855, over 25000.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4955759.66 frames. 
], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:38:16,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1413693.3333333333, ans=0.2 2023-12-24 00:38:25,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1413760.0, ans=0.125 2023-12-24 00:38:30,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1413826.6666666667, ans=0.125 2023-12-24 00:38:51,850 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.613e+01 3.856e+01 3.996e+01 4.169e+01 4.627e+01, threshold=7.993e+01, percent-clipped=0.0 2023-12-24 00:38:55,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1413960.0, ans=0.0 2023-12-24 00:39:02,981 INFO [train.py:886] (3/4) Epoch 45, batch 2400, loss[loss=0.01179, audio_tagging_loss=0.01179, over 25000.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4951385.39 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:39:06,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1414026.6666666667, ans=0.125 2023-12-24 00:39:09,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1414026.6666666667, ans=0.125 2023-12-24 00:39:12,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1414093.3333333333, ans=0.0 2023-12-24 00:39:16,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1414093.3333333333, ans=0.0 2023-12-24 00:39:27,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1414160.0, ans=0.0 2023-12-24 00:39:36,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1414226.6666666667, ans=0.125 2023-12-24 00:39:54,293 INFO [train.py:886] (3/4) Epoch 45, batch 2450, loss[loss=0.01246, audio_tagging_loss=0.01246, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4950551.64 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:39:57,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1414360.0, ans=0.09899494936611666 2023-12-24 00:40:03,134 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=12.0 2023-12-24 00:40:12,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1414426.6666666667, ans=0.125 2023-12-24 00:40:14,312 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0 2023-12-24 00:40:20,655 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. 
limit=6.0 2023-12-24 00:40:33,308 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.484e+01 3.936e+01 4.079e+01 4.274e+01 6.379e+01, threshold=8.158e+01, percent-clipped=0.0 2023-12-24 00:40:41,231 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.34 vs. limit=15.0 2023-12-24 00:40:44,581 INFO [train.py:886] (3/4) Epoch 45, batch 2500, loss[loss=0.01128, audio_tagging_loss=0.01128, over 24750.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4945853.28 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:40:44,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1414693.3333333333, ans=0.0 2023-12-24 00:40:58,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1414760.0, ans=0.125 2023-12-24 00:41:03,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1414760.0, ans=0.125 2023-12-24 00:41:26,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1414960.0, ans=0.125 2023-12-24 00:41:26,835 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.33 vs. limit=15.0 2023-12-24 00:41:36,874 INFO [train.py:886] (3/4) Epoch 45, batch 2550, loss[loss=0.009749, audio_tagging_loss=0.009749, over 24074.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4937366.42 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:41:38,292 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2023-12-24 00:41:39,676 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:41:41,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1415026.6666666667, ans=0.2 2023-12-24 00:41:49,194 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.16 vs. limit=22.5 2023-12-24 00:41:49,906 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.62 vs. 
limit=15.0 2023-12-24 00:41:54,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1415093.3333333333, ans=0.0 2023-12-24 00:42:12,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1415226.6666666667, ans=0.125 2023-12-24 00:42:15,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1415226.6666666667, ans=0.125 2023-12-24 00:42:17,203 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.482e+01 3.969e+01 4.108e+01 4.307e+01 5.107e+01, threshold=8.216e+01, percent-clipped=0.0 2023-12-24 00:42:28,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1415293.3333333333, ans=0.0 2023-12-24 00:42:29,781 INFO [train.py:886] (3/4) Epoch 45, batch 2600, loss[loss=0.01058, audio_tagging_loss=0.01058, over 25000.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4934308.00 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:42:57,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1415493.3333333333, ans=0.015 2023-12-24 00:43:04,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1415560.0, ans=15.0 2023-12-24 00:43:09,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1415560.0, ans=0.5 2023-12-24 00:43:10,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1415626.6666666667, ans=0.0 2023-12-24 00:43:20,985 INFO [train.py:886] (3/4) Epoch 45, batch 2650, loss[loss=0.01081, audio_tagging_loss=0.01081, over 25000.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4938943.12 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:43:27,836 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.62 vs. limit=15.0 2023-12-24 00:43:29,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1415693.3333333333, ans=0.1 2023-12-24 00:43:30,261 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.59 vs. limit=12.0 2023-12-24 00:43:44,340 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.62 vs. 
limit=10.0 2023-12-24 00:43:49,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1415826.6666666667, ans=0.125 2023-12-24 00:43:57,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1415893.3333333333, ans=0.0 2023-12-24 00:43:57,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1415893.3333333333, ans=0.1 2023-12-24 00:44:02,333 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.298e+01 3.813e+01 3.941e+01 4.164e+01 4.704e+01, threshold=7.881e+01, percent-clipped=0.0 2023-12-24 00:44:13,726 INFO [train.py:886] (3/4) Epoch 45, batch 2700, loss[loss=0.01105, audio_tagging_loss=0.01105, over 21650.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4941494.57 frames. ], batch size: 107, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:44:37,927 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.97 vs. limit=6.0 2023-12-24 00:44:38,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1416160.0, ans=0.0 2023-12-24 00:44:43,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1416160.0, ans=0.0 2023-12-24 00:44:53,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1416293.3333333333, ans=0.125 2023-12-24 00:45:05,366 INFO [train.py:886] (3/4) Epoch 45, batch 2750, loss[loss=0.008907, audio_tagging_loss=0.008907, over 25000.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4944336.56 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:45:07,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1416360.0, ans=0.0 2023-12-24 00:45:25,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1416493.3333333333, ans=0.125 2023-12-24 00:45:28,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1416493.3333333333, ans=0.125 2023-12-24 00:45:29,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1416493.3333333333, ans=0.1 2023-12-24 00:45:30,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1416493.3333333333, ans=0.2 2023-12-24 00:45:30,582 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.90 vs. 
limit=15.0 2023-12-24 00:45:31,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1416493.3333333333, ans=0.1 2023-12-24 00:45:43,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1416560.0, ans=0.125 2023-12-24 00:45:45,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1416560.0, ans=0.125 2023-12-24 00:45:46,095 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.540e+01 3.826e+01 4.053e+01 4.238e+01 4.704e+01, threshold=8.107e+01, percent-clipped=0.0 2023-12-24 00:45:49,695 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.89 vs. limit=15.0 2023-12-24 00:45:56,552 INFO [train.py:886] (3/4) Epoch 45, batch 2800, loss[loss=0.012, audio_tagging_loss=0.012, over 25000.00 frames. ], tot_loss[loss=0.01105, audio_tagging_loss=0.01105, over 4947828.44 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:46:06,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1416760.0, ans=0.0 2023-12-24 00:46:21,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1416826.6666666667, ans=0.07 2023-12-24 00:46:23,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1416826.6666666667, ans=0.2 2023-12-24 00:46:30,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1416893.3333333333, ans=0.1 2023-12-24 00:46:30,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1416893.3333333333, ans=0.125 2023-12-24 00:46:49,809 INFO [train.py:886] (3/4) Epoch 45, batch 2850, loss[loss=0.01155, audio_tagging_loss=0.01155, over 24750.00 frames. ], tot_loss[loss=0.01108, audio_tagging_loss=0.01108, over 4941193.59 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:46:51,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1417026.6666666667, ans=0.1 2023-12-24 00:47:04,669 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-12-24 00:47:19,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1417160.0, ans=10.0 2023-12-24 00:47:22,498 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.21 vs. 
limit=12.0 2023-12-24 00:47:27,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1417226.6666666667, ans=0.0 2023-12-24 00:47:29,709 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.527e+01 3.915e+01 4.123e+01 4.264e+01 5.597e+01, threshold=8.246e+01, percent-clipped=0.0 2023-12-24 00:47:35,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1417293.3333333333, ans=0.125 2023-12-24 00:47:35,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1417293.3333333333, ans=0.125 2023-12-24 00:47:40,340 INFO [train.py:886] (3/4) Epoch 45, batch 2900, loss[loss=0.01065, audio_tagging_loss=0.01065, over 25000.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4936667.76 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:48:07,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1417493.3333333333, ans=0.0 2023-12-24 00:48:22,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1417626.6666666667, ans=0.125 2023-12-24 00:48:32,212 INFO [train.py:886] (3/4) Epoch 45, batch 2950, loss[loss=0.01268, audio_tagging_loss=0.01268, over 25000.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4938082.05 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:48:39,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1417693.3333333333, ans=0.0 2023-12-24 00:48:51,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1417760.0, ans=0.0 2023-12-24 00:49:11,468 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.530e+01 3.805e+01 3.987e+01 4.227e+01 4.629e+01, threshold=7.974e+01, percent-clipped=0.0 2023-12-24 00:49:23,964 INFO [train.py:886] (3/4) Epoch 45, batch 3000, loss[loss=0.01197, audio_tagging_loss=0.01197, over 25000.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4943367.43 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:49:23,964 INFO [train.py:909] (3/4) Computing validation loss 2023-12-24 00:49:37,024 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3592, 4.6468, 5.2443, 4.7621], device='cuda:3') 2023-12-24 00:49:45,379 INFO [train.py:917] (3/4) Epoch 45, validation: loss=0.03669, audio_tagging_loss=0.03669, over 3737520.00 frames. 2023-12-24 00:49:45,379 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-24 00:49:47,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1418026.6666666667, ans=0.07 2023-12-24 00:49:47,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.43 vs. limit=10.0 2023-12-24 00:49:52,554 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.30 vs. 
limit=15.0 2023-12-24 00:49:59,042 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0 2023-12-24 00:49:59,381 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.33 vs. limit=15.0 2023-12-24 00:50:05,094 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.54 vs. limit=15.0 2023-12-24 00:50:34,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1418293.3333333333, ans=0.125 2023-12-24 00:50:35,391 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:50:36,122 INFO [train.py:886] (3/4) Epoch 45, batch 3050, loss[loss=0.01322, audio_tagging_loss=0.01322, over 25000.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4947494.04 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:51:07,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1418560.0, ans=0.2 2023-12-24 00:51:15,235 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.535e+01 3.857e+01 4.011e+01 4.213e+01 4.793e+01, threshold=8.022e+01, percent-clipped=0.0 2023-12-24 00:51:25,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1418626.6666666667, ans=0.125 2023-12-24 00:51:28,648 INFO [train.py:886] (3/4) Epoch 45, batch 3100, loss[loss=0.009528, audio_tagging_loss=0.009528, over 24750.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4948086.52 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:51:37,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1418760.0, ans=0.125 2023-12-24 00:51:40,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1418760.0, ans=0.125 2023-12-24 00:51:48,084 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.95 vs. limit=15.0 2023-12-24 00:52:00,181 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:52:18,791 INFO [train.py:886] (3/4) Epoch 45, batch 3150, loss[loss=0.01115, audio_tagging_loss=0.01115, over 24750.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4946166.23 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:52:27,329 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.53 vs. 
limit=10.0 2023-12-24 00:52:36,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1419093.3333333333, ans=0.2 2023-12-24 00:52:39,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1419160.0, ans=0.2 2023-12-24 00:52:52,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.38 vs. limit=10.0 2023-12-24 00:52:54,301 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.14 vs. limit=22.5 2023-12-24 00:53:00,683 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.707e+01 3.938e+01 4.100e+01 4.263e+01 4.919e+01, threshold=8.199e+01, percent-clipped=0.0 2023-12-24 00:53:05,215 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.98 vs. limit=6.0 2023-12-24 00:53:11,857 INFO [train.py:886] (3/4) Epoch 45, batch 3200, loss[loss=0.01022, audio_tagging_loss=0.01022, over 25000.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4941785.41 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:53:16,353 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.94 vs. limit=12.0 2023-12-24 00:53:17,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1419360.0, ans=0.0 2023-12-24 00:53:36,027 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.28 vs. limit=15.0 2023-12-24 00:53:54,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1419626.6666666667, ans=0.2 2023-12-24 00:54:03,200 INFO [train.py:886] (3/4) Epoch 45, batch 3250, loss[loss=0.01186, audio_tagging_loss=0.01186, over 25000.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4938948.41 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:54:07,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1419693.3333333333, ans=0.125 2023-12-24 00:54:43,801 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.456e+01 3.834e+01 4.007e+01 4.238e+01 5.618e+01, threshold=8.014e+01, percent-clipped=0.0 2023-12-24 00:54:49,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1419960.0, ans=0.125 2023-12-24 00:54:54,627 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:54:55,290 INFO [train.py:886] (3/4) Epoch 45, batch 3300, loss[loss=0.009909, audio_tagging_loss=0.009909, over 24750.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4938436.38 frames. 
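
[annotation] The validation pass at batch 3000 above printed a per-head attn_weights_entropy tensor (one value per attention head of the named layer). A diagnostic of that kind can be produced as sketched here; the tensor layout is an assumption.

import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20):
    # attn: (num_heads, batch, tgt_len, src_len), each row summing to 1
    ent = -(attn * (attn + eps).log()).sum(dim=-1)   # entropy per query
    return ent.mean(dim=(1, 2))                      # one value per head

# illustrative call with uniform weights over 200 source positions:
attn = torch.full((4, 2, 10, 200), 1.0 / 200)
print(attn_weights_entropy(attn))   # ~log(200) = 5.3 per head, the same
                                    # scale as the tensor logged above
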
], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:54:55,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1420026.6666666667, ans=0.2 2023-12-24 00:55:00,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1420026.6666666667, ans=0.2 2023-12-24 00:55:11,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1420093.3333333333, ans=0.2 2023-12-24 00:55:11,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1420093.3333333333, ans=0.05 2023-12-24 00:55:35,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1420293.3333333333, ans=0.0 2023-12-24 00:55:39,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1420293.3333333333, ans=0.125 2023-12-24 00:55:46,836 INFO [train.py:886] (3/4) Epoch 45, batch 3350, loss[loss=0.009856, audio_tagging_loss=0.009856, over 25000.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4942007.27 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:56:00,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1420426.6666666667, ans=0.125 2023-12-24 00:56:03,927 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:56:09,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1420493.3333333333, ans=0.2 2023-12-24 00:56:17,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1420560.0, ans=0.07 2023-12-24 00:56:26,530 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.535e+01 3.924e+01 4.094e+01 4.263e+01 4.631e+01, threshold=8.187e+01, percent-clipped=0.0 2023-12-24 00:56:34,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1420626.6666666667, ans=0.0 2023-12-24 00:56:36,978 INFO [train.py:886] (3/4) Epoch 45, batch 3400, loss[loss=0.01007, audio_tagging_loss=0.01007, over 24750.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4948405.36 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:57:13,875 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 00:57:14,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1420893.3333333333, ans=0.0 2023-12-24 00:57:29,307 INFO [train.py:886] (3/4) Epoch 45, batch 3450, loss[loss=0.01431, audio_tagging_loss=0.01431, over 25000.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4942489.59 frames. 
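
[annotation] The "WithLoss: name=..., loss-sum=0.000e+00" lines report an auxiliary loss attached to the named attention-weights module; a zero sum means the penalty contributed nothing over the logging interval. A toy wrapper showing the bookkeeping pattern follows; the class and loss_fn are illustrative, not the scaling.py:1118 implementation.

import torch

class WithLossSketch(torch.nn.Module):
    """Wraps a module, records an auxiliary penalty on its output,
    and passes the output through unchanged."""

    def __init__(self, module: torch.nn.Module, loss_fn):
        super().__init__()
        self.module = module
        self.loss_fn = loss_fn
        self.loss_sum = 0.0   # what the log would print, then reset

    def forward(self, x):
        y = self.module(x)
        self.loss_sum += float(self.loss_fn(y).detach().sum())
        return y
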
], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:57:53,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1421160.0, ans=15.0 2023-12-24 00:58:04,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1421226.6666666667, ans=0.0 2023-12-24 00:58:08,512 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.540e+01 3.963e+01 4.132e+01 4.315e+01 4.821e+01, threshold=8.264e+01, percent-clipped=0.0 2023-12-24 00:58:20,529 INFO [train.py:886] (3/4) Epoch 45, batch 3500, loss[loss=0.01029, audio_tagging_loss=0.01029, over 25000.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4943573.91 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:58:20,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1421360.0, ans=0.0 2023-12-24 00:58:49,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1421493.3333333333, ans=0.0 2023-12-24 00:58:49,594 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.67 vs. limit=6.0 2023-12-24 00:58:59,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1421626.6666666667, ans=0.0 2023-12-24 00:59:01,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1421626.6666666667, ans=0.2 2023-12-24 00:59:02,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.49 vs. limit=10.0 2023-12-24 00:59:09,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1421626.6666666667, ans=0.1 2023-12-24 00:59:10,994 INFO [train.py:886] (3/4) Epoch 45, batch 3550, loss[loss=0.009707, audio_tagging_loss=0.009707, over 25000.00 frames. ], tot_loss[loss=0.01105, audio_tagging_loss=0.01105, over 4946857.78 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 00:59:12,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=1421693.3333333333, ans=0.1 2023-12-24 00:59:24,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1421760.0, ans=0.09899494936611666 2023-12-24 00:59:27,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1421760.0, ans=0.125 2023-12-24 00:59:46,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.68 vs. limit=8.0 2023-12-24 00:59:50,931 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.379e+01 3.787e+01 4.000e+01 4.230e+01 4.921e+01, threshold=7.999e+01, percent-clipped=0.0 2023-12-24 00:59:54,102 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.59 vs. 
limit=15.0 2023-12-24 00:59:55,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1421960.0, ans=0.125 2023-12-24 01:00:02,142 INFO [train.py:886] (3/4) Epoch 45, batch 3600, loss[loss=0.009475, audio_tagging_loss=0.009475, over 25000.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4948891.26 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 01:00:23,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1422160.0, ans=0.125 2023-12-24 01:00:34,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1422226.6666666667, ans=0.2 2023-12-24 01:00:35,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1422226.6666666667, ans=0.1 2023-12-24 01:00:43,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1422293.3333333333, ans=0.0 2023-12-24 01:00:53,735 INFO [train.py:886] (3/4) Epoch 45, batch 3650, loss[loss=0.01026, audio_tagging_loss=0.01026, over 25000.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4948976.27 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 64.0 2023-12-24 01:00:57,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=12.0 2023-12-24 01:01:11,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1422426.6666666667, ans=0.1 2023-12-24 01:01:13,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1422493.3333333333, ans=0.125 2023-12-24 01:01:15,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1422493.3333333333, ans=0.0 2023-12-24 01:01:24,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1422560.0, ans=0.125 2023-12-24 01:01:34,296 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.468e+01 3.852e+01 3.987e+01 4.174e+01 4.561e+01, threshold=7.974e+01, percent-clipped=0.0 2023-12-24 01:01:40,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1422626.6666666667, ans=0.125 2023-12-24 01:01:41,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.46 vs. limit=15.0 2023-12-24 01:01:43,368 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.49 vs. limit=22.5 2023-12-24 01:01:44,932 INFO [train.py:886] (3/4) Epoch 45, batch 3700, loss[loss=0.01166, audio_tagging_loss=0.01166, over 25000.00 frames. ], tot_loss[loss=0.011, audio_tagging_loss=0.011, over 4956504.82 frames. 
], batch size: 100, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:01:57,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1422760.0, ans=0.0 2023-12-24 01:02:05,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1422826.6666666667, ans=0.0 2023-12-24 01:02:15,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1422893.3333333333, ans=0.125 2023-12-24 01:02:30,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1422960.0, ans=0.125 2023-12-24 01:02:37,428 INFO [train.py:886] (3/4) Epoch 45, batch 3750, loss[loss=0.01281, audio_tagging_loss=0.01281, over 25000.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4950306.21 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:02:45,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1423026.6666666667, ans=0.2 2023-12-24 01:03:05,551 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.36 vs. limit=15.0 2023-12-24 01:03:09,411 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.27 vs. limit=10.0 2023-12-24 01:03:16,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1423226.6666666667, ans=0.125 2023-12-24 01:03:17,402 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.344e+01 3.874e+01 4.070e+01 4.272e+01 4.635e+01, threshold=8.140e+01, percent-clipped=0.0 2023-12-24 01:03:28,534 INFO [train.py:886] (3/4) Epoch 45, batch 3800, loss[loss=0.01023, audio_tagging_loss=0.01023, over 24048.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4943462.81 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:03:35,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1423360.0, ans=0.05 2023-12-24 01:04:19,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1423626.6666666667, ans=0.0 2023-12-24 01:04:20,876 INFO [train.py:886] (3/4) Epoch 45, batch 3850, loss[loss=0.01193, audio_tagging_loss=0.01193, over 24750.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4944604.78 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:04:59,990 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.544e+01 3.849e+01 4.042e+01 4.188e+01 4.936e+01, threshold=8.085e+01, percent-clipped=0.0 2023-12-24 01:05:12,418 INFO [train.py:886] (3/4) Epoch 45, batch 3900, loss[loss=0.01071, audio_tagging_loss=0.01071, over 25000.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4945370.97 frames. 
], batch size: 100, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:05:13,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1424026.6666666667, ans=0.125 2023-12-24 01:05:14,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1424026.6666666667, ans=0.125 2023-12-24 01:05:43,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1424226.6666666667, ans=0.125 2023-12-24 01:05:47,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1424226.6666666667, ans=0.1 2023-12-24 01:06:01,964 INFO [train.py:886] (3/4) Epoch 45, batch 3950, loss[loss=0.009742, audio_tagging_loss=0.009742, over 25000.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4950920.46 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:06:02,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1424360.0, ans=0.0 2023-12-24 01:06:06,938 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.63 vs. limit=10.0 2023-12-24 01:06:24,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1424493.3333333333, ans=0.0 2023-12-24 01:06:24,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1424493.3333333333, ans=0.07 2023-12-24 01:06:33,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1424560.0, ans=0.1 2023-12-24 01:06:39,805 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=15.0 2023-12-24 01:06:42,076 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.512e+01 3.875e+01 4.019e+01 4.169e+01 5.128e+01, threshold=8.038e+01, percent-clipped=0.0 2023-12-24 01:06:53,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1424693.3333333333, ans=0.125 2023-12-24 01:06:53,883 INFO [train.py:886] (3/4) Epoch 45, batch 4000, loss[loss=0.01095, audio_tagging_loss=0.01095, over 25000.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4955092.29 frames. 
], batch size: 100, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:06:55,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1424693.3333333333, ans=0.5 2023-12-24 01:06:56,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1424693.3333333333, ans=0.0 2023-12-24 01:06:56,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1424693.3333333333, ans=0.125 2023-12-24 01:06:57,909 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:06:59,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1424693.3333333333, ans=0.05 2023-12-24 01:07:01,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1424693.3333333333, ans=0.1 2023-12-24 01:07:11,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1424760.0, ans=0.125 2023-12-24 01:07:20,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1424826.6666666667, ans=0.125 2023-12-24 01:07:30,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1424893.3333333333, ans=0.0 2023-12-24 01:07:32,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1424893.3333333333, ans=0.125 2023-12-24 01:07:33,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1424960.0, ans=0.125 2023-12-24 01:07:43,263 INFO [train.py:886] (3/4) Epoch 45, batch 4050, loss[loss=0.01064, audio_tagging_loss=0.01064, over 24750.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4953323.34 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:07:49,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1425026.6666666667, ans=0.125 2023-12-24 01:07:54,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1425093.3333333333, ans=0.2 2023-12-24 01:07:59,952 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.82 vs. 
limit=15.0 2023-12-24 01:08:06,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1425160.0, ans=0.125 2023-12-24 01:08:07,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1425160.0, ans=0.1 2023-12-24 01:08:20,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1425226.6666666667, ans=0.125 2023-12-24 01:08:25,214 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.455e+01 3.868e+01 4.107e+01 4.296e+01 5.422e+01, threshold=8.214e+01, percent-clipped=0.0 2023-12-24 01:08:29,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1425293.3333333333, ans=0.125 2023-12-24 01:08:34,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1425360.0, ans=0.0 2023-12-24 01:08:34,914 INFO [train.py:886] (3/4) Epoch 45, batch 4100, loss[loss=0.01243, audio_tagging_loss=0.01243, over 25000.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4946119.36 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:09:27,202 INFO [train.py:886] (3/4) Epoch 45, batch 4150, loss[loss=0.01255, audio_tagging_loss=0.01255, over 24750.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4949059.06 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 64.0 2023-12-24 01:09:29,456 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.38 vs. limit=15.0 2023-12-24 01:09:55,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1425826.6666666667, ans=0.2 2023-12-24 01:10:05,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.52 vs. limit=15.0 2023-12-24 01:10:08,339 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.433e+01 3.920e+01 4.056e+01 4.290e+01 4.928e+01, threshold=8.113e+01, percent-clipped=0.0 2023-12-24 01:10:11,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1425960.0, ans=0.07 2023-12-24 01:10:16,929 INFO [train.py:886] (3/4) Epoch 45, batch 4200, loss[loss=0.01094, audio_tagging_loss=0.01094, over 24068.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4946555.03 frames. 
], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:10:27,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1426093.3333333333, ans=0.1 2023-12-24 01:10:49,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1426226.6666666667, ans=0.125 2023-12-24 01:10:52,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1426226.6666666667, ans=0.0 2023-12-24 01:11:00,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1426293.3333333333, ans=0.2 2023-12-24 01:11:02,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1426293.3333333333, ans=0.125 2023-12-24 01:11:02,177 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0 2023-12-24 01:11:08,471 INFO [train.py:886] (3/4) Epoch 45, batch 4250, loss[loss=0.009501, audio_tagging_loss=0.009501, over 24750.00 frames. ], tot_loss[loss=0.011, audio_tagging_loss=0.011, over 4943648.50 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:11:09,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1426360.0, ans=0.0 2023-12-24 01:11:10,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1426360.0, ans=0.125 2023-12-24 01:11:23,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1426426.6666666667, ans=0.0 2023-12-24 01:11:25,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1426426.6666666667, ans=0.125 2023-12-24 01:11:34,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1426493.3333333333, ans=0.125 2023-12-24 01:11:40,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.32 vs. limit=10.0 2023-12-24 01:11:49,245 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.608e+01 3.890e+01 4.019e+01 4.191e+01 4.680e+01, threshold=8.038e+01, percent-clipped=0.0 2023-12-24 01:11:54,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1426626.6666666667, ans=0.125 2023-12-24 01:11:58,747 INFO [train.py:886] (3/4) Epoch 45, batch 4300, loss[loss=0.01132, audio_tagging_loss=0.01132, over 25000.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4947219.78 frames. 
], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:12:07,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1426693.3333333333, ans=0.1 2023-12-24 01:12:08,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1426693.3333333333, ans=0.125 2023-12-24 01:12:19,398 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2023-12-24 01:12:22,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1426826.6666666667, ans=0.125 2023-12-24 01:12:27,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1426826.6666666667, ans=15.0 2023-12-24 01:12:28,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1426893.3333333333, ans=0.05 2023-12-24 01:12:30,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1426893.3333333333, ans=0.0 2023-12-24 01:12:50,102 INFO [train.py:886] (3/4) Epoch 45, batch 4350, loss[loss=0.005843, audio_tagging_loss=0.005843, over 24063.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4948395.01 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:12:50,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1427026.6666666667, ans=0.1 2023-12-24 01:12:59,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1427093.3333333333, ans=0.125 2023-12-24 01:12:59,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1427093.3333333333, ans=0.0 2023-12-24 01:13:19,241 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.26 vs. limit=10.0 2023-12-24 01:13:28,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1427226.6666666667, ans=10.0 2023-12-24 01:13:31,697 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.626e+01 3.980e+01 4.130e+01 4.328e+01 5.553e+01, threshold=8.260e+01, percent-clipped=0.0 2023-12-24 01:13:36,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1427293.3333333333, ans=0.2 2023-12-24 01:13:43,000 INFO [train.py:886] (3/4) Epoch 45, batch 4400, loss[loss=0.01036, audio_tagging_loss=0.01036, over 21844.00 frames. ], tot_loss[loss=0.01108, audio_tagging_loss=0.01108, over 4946751.43 frames. ], batch size: 107, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:13:57,335 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:14:00,675 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.78 vs. 
limit=15.0 2023-12-24 01:14:20,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1427560.0, ans=0.05 2023-12-24 01:14:23,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1427626.6666666667, ans=0.0 2023-12-24 01:14:24,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1427626.6666666667, ans=0.2 2023-12-24 01:14:32,980 INFO [train.py:886] (3/4) Epoch 45, batch 4450, loss[loss=0.01036, audio_tagging_loss=0.01036, over 24750.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4944035.65 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:14:33,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1427693.3333333333, ans=0.0 2023-12-24 01:14:47,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1427760.0, ans=0.125 2023-12-24 01:14:50,043 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.27 vs. limit=15.0 2023-12-24 01:14:53,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1427826.6666666667, ans=0.125 2023-12-24 01:14:54,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1427826.6666666667, ans=0.125 2023-12-24 01:15:15,502 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.412e+01 3.945e+01 4.084e+01 4.273e+01 5.400e+01, threshold=8.168e+01, percent-clipped=0.0 2023-12-24 01:15:15,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1427960.0, ans=0.0 2023-12-24 01:15:24,894 INFO [train.py:886] (3/4) Epoch 45, batch 4500, loss[loss=0.01062, audio_tagging_loss=0.01062, over 25000.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4946033.13 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:15:25,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1428026.6666666667, ans=15.0 2023-12-24 01:15:36,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1428093.3333333333, ans=0.09899494936611666 2023-12-24 01:15:53,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1428160.0, ans=0.0 2023-12-24 01:16:17,045 INFO [train.py:886] (3/4) Epoch 45, batch 4550, loss[loss=0.01266, audio_tagging_loss=0.01266, over 25000.00 frames. ], tot_loss[loss=0.01108, audio_tagging_loss=0.01108, over 4942337.74 frames. 
], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:16:26,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1428426.6666666667, ans=0.125 2023-12-24 01:16:26,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=1428426.6666666667, ans=0.1 2023-12-24 01:16:28,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2023-12-24 01:16:29,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1428426.6666666667, ans=0.125 2023-12-24 01:16:36,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1428493.3333333333, ans=0.2 2023-12-24 01:16:40,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1428493.3333333333, ans=0.04949747468305833 2023-12-24 01:16:51,040 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.21 vs. limit=15.0 2023-12-24 01:16:59,853 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.365e+01 3.907e+01 4.021e+01 4.185e+01 4.612e+01, threshold=8.042e+01, percent-clipped=0.0 2023-12-24 01:17:08,559 INFO [train.py:886] (3/4) Epoch 45, batch 4600, loss[loss=0.01368, audio_tagging_loss=0.01368, over 25000.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4943846.80 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:17:37,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1428826.6666666667, ans=0.5 2023-12-24 01:17:37,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1428826.6666666667, ans=0.0 2023-12-24 01:17:58,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1428960.0, ans=0.125 2023-12-24 01:18:00,685 INFO [train.py:886] (3/4) Epoch 45, batch 4650, loss[loss=0.01248, audio_tagging_loss=0.01248, over 25000.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4943772.59 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:18:18,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=15.0 2023-12-24 01:18:28,645 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2023-12-24 01:18:30,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1429226.6666666667, ans=0.0 2023-12-24 01:18:35,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1429226.6666666667, ans=0.2 2023-12-24 01:18:36,314 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.93 vs. 
limit=22.5 2023-12-24 01:18:42,118 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.658e+01 3.928e+01 4.124e+01 4.353e+01 4.841e+01, threshold=8.247e+01, percent-clipped=0.0 2023-12-24 01:18:50,469 INFO [train.py:886] (3/4) Epoch 45, batch 4700, loss[loss=0.01141, audio_tagging_loss=0.01141, over 24750.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4945800.03 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:18:57,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1429360.0, ans=0.125 2023-12-24 01:18:59,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1429426.6666666667, ans=0.0 2023-12-24 01:19:37,540 INFO [train.py:886] (3/4) Epoch 45, batch 4750, loss[loss=0.01039, audio_tagging_loss=0.01039, over 24750.00 frames. ], tot_loss[loss=0.0112, audio_tagging_loss=0.0112, over 4947431.73 frames. ], batch size: 99, lr: 2.37e-03, grad_scale: 32.0 2023-12-24 01:19:42,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1429693.3333333333, ans=0.125 2023-12-24 01:20:09,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1429800.0, ans=0.0 2023-12-24 01:20:09,702 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=15.0 2023-12-24 01:20:13,351 INFO [train.py:886] (3/4) Epoch 46, batch 0, loss[loss=0.02462, audio_tagging_loss=0.02462, over 25000.00 frames. ], tot_loss[loss=0.02462, audio_tagging_loss=0.02462, over 25000.00 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:20:13,352 INFO [train.py:909] (3/4) Computing validation loss 2023-12-24 01:20:24,959 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1187, 3.4807, 3.9512, 3.9265], device='cuda:3') 2023-12-24 01:20:34,590 INFO [train.py:917] (3/4) Epoch 46, validation: loss=0.03601, audio_tagging_loss=0.03601, over 3737520.00 frames. 2023-12-24 01:20:34,591 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-24 01:20:40,677 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. 
limit=15.0 2023-12-24 01:20:43,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1429866.6666666667, ans=10.0 2023-12-24 01:20:44,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1429866.6666666667, ans=0.1 2023-12-24 01:20:46,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1429866.6666666667, ans=0.0 2023-12-24 01:20:46,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1429866.6666666667, ans=0.2 2023-12-24 01:21:00,402 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.422e+01 4.025e+01 4.232e+01 5.097e+01 1.112e+02, threshold=8.463e+01, percent-clipped=5.0 2023-12-24 01:21:02,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1429933.3333333333, ans=0.0 2023-12-24 01:21:06,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1430000.0, ans=0.125 2023-12-24 01:21:06,566 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.42 vs. limit=10.0 2023-12-24 01:21:24,860 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0 2023-12-24 01:21:25,403 INFO [train.py:886] (3/4) Epoch 46, batch 50, loss[loss=0.01361, audio_tagging_loss=0.01361, over 25000.00 frames. ], tot_loss[loss=0.01728, audio_tagging_loss=0.01728, over 1114758.71 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:21:57,649 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:22:10,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1430400.0, ans=0.07 2023-12-24 01:22:12,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1430400.0, ans=0.125 2023-12-24 01:22:17,811 INFO [train.py:886] (3/4) Epoch 46, batch 100, loss[loss=0.01205, audio_tagging_loss=0.01205, over 25000.00 frames. ], tot_loss[loss=0.01498, audio_tagging_loss=0.01498, over 1963636.12 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:22:28,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1430533.3333333333, ans=0.125 2023-12-24 01:22:43,339 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.772e+01 4.262e+01 4.601e+01 4.856e+01 5.800e+01, threshold=9.203e+01, percent-clipped=0.0 2023-12-24 01:22:44,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1430600.0, ans=0.09899494936611666 2023-12-24 01:23:00,737 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0 2023-12-24 01:23:09,994 INFO [train.py:886] (3/4) Epoch 46, batch 150, loss[loss=0.01388, audio_tagging_loss=0.01388, over 23995.00 frames. 
], tot_loss[loss=0.01379, audio_tagging_loss=0.01379, over 2626315.51 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:23:19,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1430866.6666666667, ans=0.0 2023-12-24 01:23:43,806 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.82 vs. limit=15.0 2023-12-24 01:23:54,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1431066.6666666667, ans=0.0 2023-12-24 01:24:01,195 INFO [train.py:886] (3/4) Epoch 46, batch 200, loss[loss=0.0114, audio_tagging_loss=0.0114, over 25000.00 frames. ], tot_loss[loss=0.01307, audio_tagging_loss=0.01307, over 3150486.32 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:24:03,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1431133.3333333333, ans=0.1 2023-12-24 01:24:09,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1431133.3333333333, ans=0.125 2023-12-24 01:24:12,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1431200.0, ans=0.0 2023-12-24 01:24:20,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2023-12-24 01:24:24,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1431266.6666666667, ans=0.1 2023-12-24 01:24:26,295 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.683e+01 3.918e+01 4.124e+01 4.291e+01 5.491e+01, threshold=8.249e+01, percent-clipped=0.0 2023-12-24 01:24:28,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1431266.6666666667, ans=0.2 2023-12-24 01:24:49,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.25 vs. limit=22.5 2023-12-24 01:24:49,617 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.95 vs. limit=22.5 2023-12-24 01:24:50,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1431400.0, ans=0.125 2023-12-24 01:24:51,864 INFO [train.py:886] (3/4) Epoch 46, batch 250, loss[loss=0.01191, audio_tagging_loss=0.01191, over 25000.00 frames. ], tot_loss[loss=0.01253, audio_tagging_loss=0.01253, over 3554846.70 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:24:53,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1431466.6666666667, ans=0.0 2023-12-24 01:25:02,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1431533.3333333333, ans=0.2 2023-12-24 01:25:21,384 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.94 vs. 
limit=10.0 2023-12-24 01:25:28,151 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.95 vs. limit=15.0 2023-12-24 01:25:42,510 INFO [train.py:886] (3/4) Epoch 46, batch 300, loss[loss=0.009811, audio_tagging_loss=0.009811, over 21922.00 frames. ], tot_loss[loss=0.01221, audio_tagging_loss=0.01221, over 3860559.39 frames. ], batch size: 107, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:25:42,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1431800.0, ans=0.0 2023-12-24 01:25:50,034 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.31 vs. limit=10.0 2023-12-24 01:26:08,051 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.528e+01 3.903e+01 4.066e+01 4.292e+01 4.827e+01, threshold=8.132e+01, percent-clipped=0.0 2023-12-24 01:26:13,563 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.65 vs. limit=15.0 2023-12-24 01:26:25,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1432066.6666666667, ans=0.0 2023-12-24 01:26:30,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1432066.6666666667, ans=0.0 2023-12-24 01:26:30,859 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:26:33,589 INFO [train.py:886] (3/4) Epoch 46, batch 350, loss[loss=0.01049, audio_tagging_loss=0.01049, over 24750.00 frames. ], tot_loss[loss=0.01195, audio_tagging_loss=0.01195, over 4095796.61 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:26:39,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1432133.3333333333, ans=0.1 2023-12-24 01:27:02,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1432266.6666666667, ans=0.0 2023-12-24 01:27:12,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1432333.3333333333, ans=0.0 2023-12-24 01:27:26,605 INFO [train.py:886] (3/4) Epoch 46, batch 400, loss[loss=0.01065, audio_tagging_loss=0.01065, over 25000.00 frames. ], tot_loss[loss=0.0117, audio_tagging_loss=0.0117, over 4279145.97 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:27:29,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1432466.6666666667, ans=0.0 2023-12-24 01:27:52,247 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.333e+01 3.856e+01 4.044e+01 4.244e+01 4.925e+01, threshold=8.088e+01, percent-clipped=0.0 2023-12-24 01:27:53,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1432600.0, ans=0.0 2023-12-24 01:28:00,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1432666.6666666667, ans=0.2 2023-12-24 01:28:17,150 INFO [train.py:886] (3/4) Epoch 46, batch 450, loss[loss=0.01023, audio_tagging_loss=0.01023, over 25000.00 frames. 
], tot_loss[loss=0.01147, audio_tagging_loss=0.01147, over 4429728.91 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:28:17,536 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.80 vs. limit=15.0 2023-12-24 01:28:22,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1432800.0, ans=0.0 2023-12-24 01:28:22,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1432800.0, ans=0.125 2023-12-24 01:28:43,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1432933.3333333333, ans=0.5 2023-12-24 01:28:56,027 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:29:02,572 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.67 vs. limit=15.0 2023-12-24 01:29:09,444 INFO [train.py:886] (3/4) Epoch 46, batch 500, loss[loss=0.01019, audio_tagging_loss=0.01019, over 25000.00 frames. ], tot_loss[loss=0.01132, audio_tagging_loss=0.01132, over 4551029.91 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:29:14,548 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=22.5 2023-12-24 01:29:22,354 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2023-12-24 01:29:35,996 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.577e+01 3.884e+01 4.049e+01 4.174e+01 4.739e+01, threshold=8.098e+01, percent-clipped=0.0 2023-12-24 01:30:01,321 INFO [train.py:886] (3/4) Epoch 46, batch 550, loss[loss=0.008273, audio_tagging_loss=0.008273, over 23989.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4640417.34 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:30:05,379 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.64 vs. limit=12.0 2023-12-24 01:30:19,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1433533.3333333333, ans=0.125 2023-12-24 01:30:28,879 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.05 vs. limit=15.0 2023-12-24 01:30:34,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1433666.6666666667, ans=0.125 2023-12-24 01:30:35,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1433666.6666666667, ans=0.0 2023-12-24 01:30:38,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1433666.6666666667, ans=0.0 2023-12-24 01:30:44,527 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. 
limit=6.0 2023-12-24 01:30:52,353 INFO [train.py:886] (3/4) Epoch 46, batch 600, loss[loss=0.0128, audio_tagging_loss=0.0128, over 24750.00 frames. ], tot_loss[loss=0.01126, audio_tagging_loss=0.01126, over 4708961.12 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:30:54,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1433800.0, ans=0.125 2023-12-24 01:30:55,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1433800.0, ans=0.0 2023-12-24 01:31:08,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1433866.6666666667, ans=0.125 2023-12-24 01:31:11,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1433866.6666666667, ans=0.1 2023-12-24 01:31:15,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1433933.3333333333, ans=0.1 2023-12-24 01:31:18,052 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.528e+01 3.934e+01 4.117e+01 4.300e+01 4.914e+01, threshold=8.233e+01, percent-clipped=0.0 2023-12-24 01:31:26,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1434000.0, ans=0.5 2023-12-24 01:31:27,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1434000.0, ans=0.125 2023-12-24 01:31:29,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1434000.0, ans=0.125 2023-12-24 01:31:36,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1434066.6666666667, ans=0.125 2023-12-24 01:31:40,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1434066.6666666667, ans=0.125 2023-12-24 01:31:43,851 INFO [train.py:886] (3/4) Epoch 46, batch 650, loss[loss=0.01018, audio_tagging_loss=0.01018, over 24750.00 frames. ], tot_loss[loss=0.01134, audio_tagging_loss=0.01134, over 4762806.90 frames. 
], batch size: 99, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:31:43,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1434133.3333333333, ans=0.0 2023-12-24 01:31:48,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1434133.3333333333, ans=0.125 2023-12-24 01:32:02,094 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:32:10,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1434266.6666666667, ans=0.5 2023-12-24 01:32:13,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1434333.3333333333, ans=0.125 2023-12-24 01:32:16,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1434333.3333333333, ans=0.125 2023-12-24 01:32:29,187 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:32:33,564 INFO [train.py:886] (3/4) Epoch 46, batch 700, loss[loss=0.01052, audio_tagging_loss=0.01052, over 24066.00 frames. ], tot_loss[loss=0.0113, audio_tagging_loss=0.0113, over 4802342.88 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:32:37,679 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.59 vs. limit=10.0 2023-12-24 01:32:47,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1434533.3333333333, ans=0.0 2023-12-24 01:32:58,777 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.569e+01 3.947e+01 4.093e+01 4.311e+01 5.149e+01, threshold=8.186e+01, percent-clipped=0.0 2023-12-24 01:33:02,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1434666.6666666667, ans=0.125 2023-12-24 01:33:03,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1434666.6666666667, ans=0.2 2023-12-24 01:33:25,064 INFO [train.py:886] (3/4) Epoch 46, batch 750, loss[loss=0.01132, audio_tagging_loss=0.01132, over 24750.00 frames. ], tot_loss[loss=0.0112, audio_tagging_loss=0.0112, over 4836771.94 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:33:26,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1434800.0, ans=0.1 2023-12-24 01:33:51,732 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.97 vs. limit=12.0 2023-12-24 01:33:53,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.32 vs. 
limit=22.5 2023-12-24 01:34:10,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1435066.6666666667, ans=0.125 2023-12-24 01:34:16,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1435133.3333333333, ans=0.2 2023-12-24 01:34:16,996 INFO [train.py:886] (3/4) Epoch 46, batch 800, loss[loss=0.01205, audio_tagging_loss=0.01205, over 25000.00 frames. ], tot_loss[loss=0.01118, audio_tagging_loss=0.01118, over 4863511.18 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:34:18,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1435133.3333333333, ans=0.125 2023-12-24 01:34:26,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1435200.0, ans=0.125 2023-12-24 01:34:28,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1435200.0, ans=0.1 2023-12-24 01:34:31,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1435200.0, ans=0.0 2023-12-24 01:34:39,411 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=22.5 2023-12-24 01:34:41,861 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.539e+01 3.875e+01 4.046e+01 4.240e+01 5.244e+01, threshold=8.092e+01, percent-clipped=0.0 2023-12-24 01:34:44,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1435266.6666666667, ans=0.07 2023-12-24 01:34:44,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1435266.6666666667, ans=0.0 2023-12-24 01:34:45,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=1435266.6666666667, ans=22.5 2023-12-24 01:35:07,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.75 vs. limit=15.0 2023-12-24 01:35:08,500 INFO [train.py:886] (3/4) Epoch 46, batch 850, loss[loss=0.01169, audio_tagging_loss=0.01169, over 25000.00 frames. ], tot_loss[loss=0.01114, audio_tagging_loss=0.01114, over 4881045.27 frames. 
], batch size: 100, lr: 2.34e-03, grad_scale: 32.0 2023-12-24 01:35:08,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1435466.6666666667, ans=0.0 2023-12-24 01:35:09,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1435466.6666666667, ans=0.125 2023-12-24 01:35:14,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1435466.6666666667, ans=0.1 2023-12-24 01:35:14,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1435466.6666666667, ans=0.125 2023-12-24 01:35:17,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1435533.3333333333, ans=0.2 2023-12-24 01:35:21,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1435533.3333333333, ans=0.1 2023-12-24 01:35:22,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1435533.3333333333, ans=0.0 2023-12-24 01:35:25,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1435533.3333333333, ans=0.2 2023-12-24 01:35:25,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1435533.3333333333, ans=0.125 2023-12-24 01:35:25,640 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.24 vs. limit=10.0 2023-12-24 01:35:31,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1435600.0, ans=0.1 2023-12-24 01:35:32,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1435600.0, ans=0.0 2023-12-24 01:35:41,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1435666.6666666667, ans=0.125 2023-12-24 01:35:49,524 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.46 vs. limit=12.0 2023-12-24 01:35:53,319 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 01:36:00,431 INFO [train.py:886] (3/4) Epoch 46, batch 900, loss[loss=0.01118, audio_tagging_loss=0.01118, over 24750.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4898277.99 frames. 
2023-12-24 01:36:13,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1435866.6666666667, ans=0.0
2023-12-24 01:36:17,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1435866.6666666667, ans=0.125
2023-12-24 01:36:25,490 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.452e+01 3.869e+01 4.061e+01 4.225e+01 5.084e+01, threshold=8.122e+01, percent-clipped=0.0
2023-12-24 01:36:29,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1436000.0, ans=0.09899494936611666
2023-12-24 01:36:31,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1436000.0, ans=0.0
2023-12-24 01:36:37,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1436000.0, ans=0.1
2023-12-24 01:36:45,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1436066.6666666667, ans=0.125
2023-12-24 01:36:50,209 INFO [train.py:886] (3/4) Epoch 46, batch 950, loss[loss=0.01099, audio_tagging_loss=0.01099, over 24750.00 frames. ], tot_loss[loss=0.01121, audio_tagging_loss=0.01121, over 4908818.46 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:36:59,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1436133.3333333333, ans=0.2
2023-12-24 01:37:42,098 INFO [train.py:886] (3/4) Epoch 46, batch 1000, loss[loss=0.01335, audio_tagging_loss=0.01335, over 25000.00 frames. ], tot_loss[loss=0.01119, audio_tagging_loss=0.01119, over 4917696.44 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:37:55,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1436533.3333333333, ans=0.125
2023-12-24 01:38:00,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1436533.3333333333, ans=0.125
2023-12-24 01:38:00,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1436533.3333333333, ans=0.2
2023-12-24 01:38:07,784 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.584e+01 3.871e+01 4.031e+01 4.254e+01 4.824e+01, threshold=8.061e+01, percent-clipped=0.0
2023-12-24 01:38:10,099 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0
2023-12-24 01:38:12,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1436666.6666666667, ans=0.125
2023-12-24 01:38:13,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.79 vs. limit=15.0
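The WARNING [optim.py:484] lines summarize the recent distribution of gradient norms: five quartiles (min, 25%, median, 75%, max), a clipping threshold, and the percentage of batches clipped. In every such line in this log the threshold equals Clipping_scale (2.0) times the median quartile, e.g. 2.0 x 4.061e+01 = 8.122e+01 above. A sketch of that bookkeeping (an illustration, not the actual optim.py implementation):

```python
# Illustrative sketch of the gradient-norm statistics printed above
# (assumed bookkeeping; not the actual icefall optim.py code).
import torch

def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    """grad_norms: 1-D tensor of recent per-batch gradient norms."""
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]            # 2.0 x median, as in the log
    pct_clipped = (grad_norms > threshold).float().mean() * 100.0
    return q, threshold, pct_clipped
```

With the quartiles clustered around 4e+01, the threshold sits near 8e+01, so percent-clipped stays at 0.0 unless an outlier batch exceeds roughly twice the median norm.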
2023-12-24 01:38:19,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1436666.6666666667, ans=0.125
2023-12-24 01:38:25,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1436733.3333333333, ans=0.1
2023-12-24 01:38:32,884 INFO [train.py:886] (3/4) Epoch 46, batch 1050, loss[loss=0.01042, audio_tagging_loss=0.01042, over 24750.00 frames. ], tot_loss[loss=0.01108, audio_tagging_loss=0.01108, over 4923680.44 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:38:40,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1436800.0, ans=0.0
2023-12-24 01:38:55,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1436933.3333333333, ans=0.125
2023-12-24 01:38:58,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=1436933.3333333333, ans=22.5
2023-12-24 01:39:18,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1437066.6666666667, ans=0.5
2023-12-24 01:39:18,873 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.61 vs. limit=22.5
2023-12-24 01:39:24,083 INFO [train.py:886] (3/4) Epoch 46, batch 1100, loss[loss=0.01217, audio_tagging_loss=0.01217, over 25000.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4929932.96 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:39:30,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=12.0
2023-12-24 01:39:32,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1437133.3333333333, ans=0.125
2023-12-24 01:39:33,712 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.32 vs. limit=15.0
2023-12-24 01:39:37,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1437200.0, ans=0.0
2023-12-24 01:39:39,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1437200.0, ans=0.2
2023-12-24 01:39:48,390 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.63 vs. limit=10.0
2023-12-24 01:39:49,745 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.427e+01 3.840e+01 4.057e+01 4.285e+01 6.085e+01, threshold=8.114e+01, percent-clipped=0.0
2023-12-24 01:40:08,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0
2023-12-24 01:40:15,278 INFO [train.py:886] (3/4) Epoch 46, batch 1150, loss[loss=0.00973, audio_tagging_loss=0.00973, over 25000.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4943106.93 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:40:18,394 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.27 vs. limit=15.0
2023-12-24 01:40:31,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1437533.3333333333, ans=0.125
2023-12-24 01:40:46,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1437666.6666666667, ans=0.1
2023-12-24 01:41:05,274 INFO [train.py:886] (3/4) Epoch 46, batch 1200, loss[loss=0.01154, audio_tagging_loss=0.01154, over 25000.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4947903.46 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:41:25,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1437933.3333333333, ans=0.0
2023-12-24 01:41:26,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1437933.3333333333, ans=0.2
2023-12-24 01:41:30,967 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.478e+01 3.918e+01 4.092e+01 4.253e+01 4.725e+01, threshold=8.185e+01, percent-clipped=0.0
2023-12-24 01:41:38,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.87 vs. limit=22.5
2023-12-24 01:41:41,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1438000.0, ans=0.0
2023-12-24 01:41:44,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1438000.0, ans=0.125
2023-12-24 01:41:45,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1438000.0, ans=0.0
2023-12-24 01:41:57,114 INFO [train.py:886] (3/4) Epoch 46, batch 1250, loss[loss=0.01105, audio_tagging_loss=0.01105, over 24750.00 frames. ], tot_loss[loss=0.0111, audio_tagging_loss=0.0111, over 4945128.34 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:42:20,545 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=15.0
2023-12-24 01:42:47,088 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 01:42:47,840 INFO [train.py:886] (3/4) Epoch 46, batch 1300, loss[loss=0.01267, audio_tagging_loss=0.01267, over 24750.00 frames. ], tot_loss[loss=0.01106, audio_tagging_loss=0.01106, over 4942103.43 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:42:58,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1438533.3333333333, ans=0.025
2023-12-24 01:43:03,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1438533.3333333333, ans=0.2
2023-12-24 01:43:14,367 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.645e+01 3.930e+01 4.058e+01 4.275e+01 4.949e+01, threshold=8.116e+01, percent-clipped=0.0
2023-12-24 01:43:16,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1438600.0, ans=0.0
2023-12-24 01:43:24,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1438666.6666666667, ans=0.125
2023-12-24 01:43:37,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.65 vs. limit=10.0
2023-12-24 01:43:39,290 INFO [train.py:886] (3/4) Epoch 46, batch 1350, loss[loss=0.01351, audio_tagging_loss=0.01351, over 24928.00 frames. ], tot_loss[loss=0.01104, audio_tagging_loss=0.01104, over 4940213.91 frames. ], batch size: 100, lr: 2.34e-03, grad_scale: 32.0
2023-12-24 01:43:47,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1438800.0, ans=0.125
2023-12-24 01:43:47,248 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-24 01:43:48,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1438800.0, ans=0.09899494936611666
2023-12-24 01:43:58,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1438866.6666666667, ans=0.1
2023-12-24 01:44:00,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1438933.3333333333, ans=0.0
2023-12-24 01:44:12,703 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=15.0
2023-12-24 01:44:31,801 INFO [train.py:886] (3/4) Epoch 46, batch 1400, loss[loss=0.01291, audio_tagging_loss=0.01291, over 24750.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4942014.16 frames. ], batch size: 99, lr: 2.34e-03, grad_scale: 64.0
2023-12-24 01:44:34,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1439133.3333333333, ans=0.0
2023-12-24 01:44:53,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1439266.6666666667, ans=0.0
2023-12-24 01:44:58,226 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.346e+01 3.871e+01 4.064e+01 4.207e+01 5.055e+01, threshold=8.128e+01, percent-clipped=0.0
2023-12-24 01:45:24,843 INFO [train.py:886] (3/4) Epoch 46, batch 1450, loss[loss=0.01182, audio_tagging_loss=0.01182, over 25000.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4944059.18 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
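The grad_scale field in the progress lines is the dynamic loss scale used for fp16 training; note it doubles from 32.0 to 64.0 at batch 1400 above. The training loop manages its own scaler, but PyTorch's stock GradScaler shows the same growth/backoff behaviour and is used here purely as an illustration:

```python
# Dynamic loss scaling as in the grad_scale field above; a minimal sketch
# with PyTorch's stock GradScaler (the run's own scaler may differ).
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0,
                                   growth_factor=2.0,
                                   backoff_factor=0.5,
                                   growth_interval=2000)
# Per step: scaler.scale(loss).backward(); scaler.step(optimizer); scaler.update()
# update() doubles the scale (32 -> 64, as in the log) after growth_interval
# consecutive overflow-free steps, and halves it when gradients overflow.
```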
2023-12-24 01:45:33,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1439533.3333333333, ans=0.125
2023-12-24 01:45:52,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1439600.0, ans=0.125
2023-12-24 01:46:01,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1439666.6666666667, ans=0.125
2023-12-24 01:46:05,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1439733.3333333333, ans=0.1
2023-12-24 01:46:12,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1439733.3333333333, ans=0.125
2023-12-24 01:46:15,266 INFO [train.py:886] (3/4) Epoch 46, batch 1500, loss[loss=0.009562, audio_tagging_loss=0.009562, over 24750.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4948665.93 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:46:36,285 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.84 vs. limit=15.0
2023-12-24 01:46:41,574 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.511e+01 3.911e+01 4.080e+01 4.273e+01 5.286e+01, threshold=8.160e+01, percent-clipped=0.0
2023-12-24 01:46:42,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1439933.3333333333, ans=0.0
2023-12-24 01:47:10,224 INFO [train.py:886] (3/4) Epoch 46, batch 1550, loss[loss=0.009577, audio_tagging_loss=0.009577, over 24750.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4944163.87 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:48:02,291 INFO [train.py:886] (3/4) Epoch 46, batch 1600, loss[loss=0.01143, audio_tagging_loss=0.01143, over 24750.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4939460.57 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:48:02,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1440466.6666666667, ans=0.125
2023-12-24 01:48:07,882 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 01:48:13,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1440533.3333333333, ans=0.0
2023-12-24 01:48:16,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1440533.3333333333, ans=0.125
2023-12-24 01:48:17,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=1440533.3333333333, ans=0.05
2023-12-24 01:48:26,622 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.459e+01 3.905e+01 4.113e+01 4.286e+01 4.788e+01, threshold=8.225e+01, percent-clipped=0.0
2023-12-24 01:48:35,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1440666.6666666667, ans=0.125
2023-12-24 01:48:52,885 INFO [train.py:886] (3/4) Epoch 46, batch 1650, loss[loss=0.01083, audio_tagging_loss=0.01083, over 24750.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4934200.04 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:48:56,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1440800.0, ans=0.2
2023-12-24 01:49:15,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1440933.3333333333, ans=0.125
2023-12-24 01:49:24,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1441000.0, ans=0.0
2023-12-24 01:49:26,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1441000.0, ans=0.1
2023-12-24 01:49:39,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1441066.6666666667, ans=0.125
2023-12-24 01:49:46,069 INFO [train.py:886] (3/4) Epoch 46, batch 1700, loss[loss=0.01074, audio_tagging_loss=0.01074, over 24055.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4931968.51 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:49:47,586 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.86 vs. limit=22.5
2023-12-24 01:49:58,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1441200.0, ans=0.125
2023-12-24 01:50:01,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1441200.0, ans=0.1
2023-12-24 01:50:02,781 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.36 vs. limit=22.5
2023-12-24 01:50:04,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1441266.6666666667, ans=0.125
2023-12-24 01:50:11,915 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.496e+01 3.860e+01 4.005e+01 4.201e+01 5.398e+01, threshold=8.010e+01, percent-clipped=0.0
2023-12-24 01:50:13,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1441266.6666666667, ans=0.125
2023-12-24 01:50:20,960 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.73 vs. limit=15.0
2023-12-24 01:50:29,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1441400.0, ans=0.2
2023-12-24 01:50:36,428 INFO [train.py:886] (3/4) Epoch 46, batch 1750, loss[loss=0.009315, audio_tagging_loss=0.009315, over 25000.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4942321.01 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:50:54,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1441533.3333333333, ans=0.1
2023-12-24 01:51:14,853 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 01:51:14,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1441666.6666666667, ans=0.95
2023-12-24 01:51:15,134 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.52 vs. limit=22.5
2023-12-24 01:51:25,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1441733.3333333333, ans=0.0
2023-12-24 01:51:29,191 INFO [train.py:886] (3/4) Epoch 46, batch 1800, loss[loss=0.009993, audio_tagging_loss=0.009993, over 25000.00 frames. ], tot_loss[loss=0.01076, audio_tagging_loss=0.01076, over 4943442.08 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:51:38,238 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.40 vs. limit=15.0
2023-12-24 01:51:39,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1441866.6666666667, ans=0.125
2023-12-24 01:51:55,676 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.556e+01 3.867e+01 4.060e+01 4.230e+01 5.187e+01, threshold=8.121e+01, percent-clipped=0.0
2023-12-24 01:52:09,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1442066.6666666667, ans=0.0
2023-12-24 01:52:20,839 INFO [train.py:886] (3/4) Epoch 46, batch 1850, loss[loss=0.01111, audio_tagging_loss=0.01111, over 24750.00 frames. ], tot_loss[loss=0.0108, audio_tagging_loss=0.0108, over 4944555.68 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
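The Whitening lines compare a per-module statistic of the activations against a scheduled limit (metric=... vs. limit=...); when the metric exceeds the limit, the module pushes activations back toward a whiter (more isotropic) channel covariance. The exact metric is defined in scaling.py and is not shown in this log; the sketch below uses one plausible whiteness measure, the ratio of the covariance's mean squared eigenvalue to its squared mean eigenvalue, which is 1.0 for perfectly white features and grows as the spectrum becomes lopsided.

```python
# Illustrative whiteness metric (an assumption, not the scaling.py formula):
# how far the channel covariance of some activations is from isotropic.
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels) activations from one module."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]        # channel covariance
    eigs = torch.linalg.eigvalsh(cov)   # real eigenvalues, ascending
    return (eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)

x = torch.randn(1000, 256) * torch.linspace(0.5, 2.0, 256)  # non-white toy input
print(whitening_metric(x))  # > 1.0; the log prints such a value "vs. limit=..."
```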
2023-12-24 01:52:23,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1442133.3333333333, ans=0.1
2023-12-24 01:52:34,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1442200.0, ans=0.0
2023-12-24 01:52:41,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1442266.6666666667, ans=0.125
2023-12-24 01:52:42,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1442266.6666666667, ans=15.0
2023-12-24 01:52:54,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1442333.3333333333, ans=0.125
2023-12-24 01:53:04,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1442400.0, ans=0.125
2023-12-24 01:53:08,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1442400.0, ans=0.125
2023-12-24 01:53:12,140 INFO [train.py:886] (3/4) Epoch 46, batch 1900, loss[loss=0.0124, audio_tagging_loss=0.0124, over 24750.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4947542.84 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:53:12,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1442466.6666666667, ans=0.125
2023-12-24 01:53:33,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1442600.0, ans=0.2
2023-12-24 01:53:37,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1442600.0, ans=0.2
2023-12-24 01:53:38,753 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.559e+01 3.923e+01 4.090e+01 4.316e+01 4.935e+01, threshold=8.180e+01, percent-clipped=0.0
2023-12-24 01:53:46,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1442666.6666666667, ans=0.125
2023-12-24 01:54:05,335 INFO [train.py:886] (3/4) Epoch 46, batch 1950, loss[loss=0.01106, audio_tagging_loss=0.01106, over 24750.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4944413.17 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:54:08,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1442800.0, ans=10.0
2023-12-24 01:54:37,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1443000.0, ans=10.0
2023-12-24 01:54:52,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1443066.6666666667, ans=0.0
2023-12-24 01:54:56,397 INFO [train.py:886] (3/4) Epoch 46, batch 2000, loss[loss=0.008829, audio_tagging_loss=0.008829, over 24028.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4944693.12 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:55:02,190 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.02 vs. limit=6.0
2023-12-24 01:55:09,326 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.01 vs. limit=15.0
2023-12-24 01:55:21,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1443266.6666666667, ans=22.5
2023-12-24 01:55:22,168 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.540e+01 3.855e+01 4.032e+01 4.223e+01 5.008e+01, threshold=8.064e+01, percent-clipped=0.0
2023-12-24 01:55:22,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1443266.6666666667, ans=0.1
2023-12-24 01:55:26,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1443333.3333333333, ans=15.0
2023-12-24 01:55:42,779 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.27 vs. limit=15.0
2023-12-24 01:55:47,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1443400.0, ans=0.1
2023-12-24 01:55:48,849 INFO [train.py:886] (3/4) Epoch 46, batch 2050, loss[loss=0.008279, audio_tagging_loss=0.008279, over 22196.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4947412.91 frames. ], batch size: 107, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:55:49,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1443466.6666666667, ans=0.125
2023-12-24 01:55:50,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.84 vs. limit=10.0
2023-12-24 01:55:56,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1443466.6666666667, ans=0.2
2023-12-24 01:56:27,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1443666.6666666667, ans=0.125
2023-12-24 01:56:29,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=12.0
2023-12-24 01:56:41,053 INFO [train.py:886] (3/4) Epoch 46, batch 2100, loss[loss=0.009969, audio_tagging_loss=0.009969, over 25000.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4950594.62 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:57:01,067 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0
2023-12-24 01:57:01,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1443933.3333333333, ans=0.2
2023-12-24 01:57:01,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1443933.3333333333, ans=0.1
2023-12-24 01:57:05,942 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.524e+01 3.905e+01 4.022e+01 4.224e+01 4.545e+01, threshold=8.045e+01, percent-clipped=0.0
2023-12-24 01:57:10,092 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.05 vs. limit=22.5
2023-12-24 01:57:15,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1444000.0, ans=0.1
2023-12-24 01:57:25,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1444066.6666666667, ans=0.1
2023-12-24 01:57:32,019 INFO [train.py:886] (3/4) Epoch 46, batch 2150, loss[loss=0.01179, audio_tagging_loss=0.01179, over 24750.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4953771.78 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:57:41,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1444133.3333333333, ans=0.125
2023-12-24 01:57:47,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1444200.0, ans=0.1
2023-12-24 01:57:47,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1444200.0, ans=0.125
2023-12-24 01:57:49,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1444200.0, ans=0.1
2023-12-24 01:58:14,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1444400.0, ans=0.0
2023-12-24 01:58:24,350 INFO [train.py:886] (3/4) Epoch 46, batch 2200, loss[loss=0.0111, audio_tagging_loss=0.0111, over 24750.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4950953.06 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:58:32,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1444466.6666666667, ans=0.0
2023-12-24 01:58:37,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1444533.3333333333, ans=0.0
2023-12-24 01:58:50,836 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.513e+01 3.958e+01 4.112e+01 4.314e+01 5.314e+01, threshold=8.224e+01, percent-clipped=0.0
2023-12-24 01:58:54,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1444666.6666666667, ans=0.125
2023-12-24 01:59:16,820 INFO [train.py:886] (3/4) Epoch 46, batch 2250, loss[loss=0.009452, audio_tagging_loss=0.009452, over 24750.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4942049.41 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 01:59:37,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1444933.3333333333, ans=0.125
2023-12-24 01:59:40,632 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.10 vs. limit=6.0
2023-12-24 01:59:48,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1445000.0, ans=0.125
2023-12-24 01:59:53,974 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.87 vs. limit=15.0
2023-12-24 02:00:00,648 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.62 vs. limit=15.0
2023-12-24 02:00:08,438 INFO [train.py:886] (3/4) Epoch 46, batch 2300, loss[loss=0.01149, audio_tagging_loss=0.01149, over 25000.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4943925.18 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:00:34,260 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.386e+01 3.894e+01 4.073e+01 4.228e+01 5.336e+01, threshold=8.145e+01, percent-clipped=0.0
2023-12-24 02:00:45,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1445333.3333333333, ans=0.125
2023-12-24 02:01:00,766 INFO [train.py:886] (3/4) Epoch 46, batch 2350, loss[loss=0.01065, audio_tagging_loss=0.01065, over 25000.00 frames. ], tot_loss[loss=0.01085, audio_tagging_loss=0.01085, over 4948989.22 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:01:04,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1445466.6666666667, ans=0.125
2023-12-24 02:01:08,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1445466.6666666667, ans=0.0
2023-12-24 02:01:28,960 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.06 vs. limit=12.0
2023-12-24 02:01:38,155 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 02:01:39,501 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.51 vs. limit=22.5
2023-12-24 02:01:45,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1445733.3333333333, ans=0.125
2023-12-24 02:01:49,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1445733.3333333333, ans=0.0
2023-12-24 02:01:49,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1445733.3333333333, ans=0.0
2023-12-24 02:01:51,092 INFO [train.py:886] (3/4) Epoch 46, batch 2400, loss[loss=0.01104, audio_tagging_loss=0.01104, over 24750.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4947428.18 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:01:53,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1445800.0, ans=0.1
2023-12-24 02:02:00,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1445800.0, ans=0.1
2023-12-24 02:02:05,968 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0
2023-12-24 02:02:16,981 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.495e+01 3.926e+01 4.069e+01 4.266e+01 5.020e+01, threshold=8.138e+01, percent-clipped=0.0
2023-12-24 02:02:41,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1446066.6666666667, ans=0.1
2023-12-24 02:02:43,298 INFO [train.py:886] (3/4) Epoch 46, batch 2450, loss[loss=0.0133, audio_tagging_loss=0.0133, over 25000.00 frames. ], tot_loss[loss=0.0108, audio_tagging_loss=0.0108, over 4955349.00 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:02:49,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1446133.3333333333, ans=0.0
2023-12-24 02:02:49,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1446133.3333333333, ans=0.125
2023-12-24 02:02:56,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1446200.0, ans=0.125
2023-12-24 02:03:00,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1446200.0, ans=0.125
2023-12-24 02:03:35,347 INFO [train.py:886] (3/4) Epoch 46, batch 2500, loss[loss=0.0111, audio_tagging_loss=0.0111, over 24750.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4949895.39 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:03:44,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1446533.3333333333, ans=0.04949747468305833
2023-12-24 02:03:45,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=22.5
2023-12-24 02:03:45,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1446533.3333333333, ans=0.125
2023-12-24 02:03:48,051 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.72 vs. limit=15.0
2023-12-24 02:03:51,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1446533.3333333333, ans=0.125
2023-12-24 02:04:00,417 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.680e+01 3.970e+01 4.120e+01 4.239e+01 5.060e+01, threshold=8.241e+01, percent-clipped=0.0
2023-12-24 02:04:07,382 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=12.0
2023-12-24 02:04:13,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1446666.6666666667, ans=0.125
2023-12-24 02:04:25,327 INFO [train.py:886] (3/4) Epoch 46, batch 2550, loss[loss=0.01327, audio_tagging_loss=0.01327, over 24750.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4944762.21 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:04:46,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1446933.3333333333, ans=0.0
2023-12-24 02:05:18,340 INFO [train.py:886] (3/4) Epoch 46, batch 2600, loss[loss=0.009925, audio_tagging_loss=0.009925, over 24750.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4948352.08 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:05:20,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1447133.3333333333, ans=0.07
2023-12-24 02:05:27,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1447200.0, ans=0.2
2023-12-24 02:05:27,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1447200.0, ans=0.125
2023-12-24 02:05:35,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1447200.0, ans=0.125
2023-12-24 02:05:44,708 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.567e+01 3.903e+01 4.068e+01 4.253e+01 4.776e+01, threshold=8.137e+01, percent-clipped=0.0
2023-12-24 02:05:59,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1447400.0, ans=0.1
2023-12-24 02:06:02,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1447400.0, ans=0.0
2023-12-24 02:06:04,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1447400.0, ans=0.1
2023-12-24 02:06:09,959 INFO [train.py:886] (3/4) Epoch 46, batch 2650, loss[loss=0.01161, audio_tagging_loss=0.01161, over 25000.00 frames. ], tot_loss[loss=0.01084, audio_tagging_loss=0.01084, over 4947451.92 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:06:10,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1447466.6666666667, ans=0.125
2023-12-24 02:06:13,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1447466.6666666667, ans=0.125
2023-12-24 02:06:16,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1447466.6666666667, ans=0.125
2023-12-24 02:06:16,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1447466.6666666667, ans=0.125
2023-12-24 02:06:17,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1447466.6666666667, ans=0.2
2023-12-24 02:06:17,573 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.15 vs. limit=22.5
2023-12-24 02:06:27,000 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.58 vs. limit=15.0
2023-12-24 02:06:31,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1447600.0, ans=0.125
2023-12-24 02:06:36,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1447600.0, ans=0.0
2023-12-24 02:06:50,993 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.70 vs. limit=15.0
2023-12-24 02:06:52,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1447733.3333333333, ans=0.0
2023-12-24 02:07:01,527 INFO [train.py:886] (3/4) Epoch 46, batch 2700, loss[loss=0.01169, audio_tagging_loss=0.01169, over 25000.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4952263.58 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:07:27,933 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.425e+01 3.925e+01 4.049e+01 4.308e+01 4.721e+01, threshold=8.099e+01, percent-clipped=0.0
2023-12-24 02:07:43,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1448066.6666666667, ans=0.0
2023-12-24 02:07:53,858 INFO [train.py:886] (3/4) Epoch 46, batch 2750, loss[loss=0.01244, audio_tagging_loss=0.01244, over 24750.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4955840.72 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:08:07,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1448200.0, ans=15.0
2023-12-24 02:08:09,746 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.74 vs. limit=22.5
2023-12-24 02:08:14,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1448266.6666666667, ans=0.0
2023-12-24 02:08:31,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1448333.3333333333, ans=0.125
2023-12-24 02:08:36,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1448400.0, ans=0.0
2023-12-24 02:08:43,409 INFO [train.py:886] (3/4) Epoch 46, batch 2800, loss[loss=0.01061, audio_tagging_loss=0.01061, over 24750.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4952136.71 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:08:50,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1448466.6666666667, ans=0.0
2023-12-24 02:08:57,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1448533.3333333333, ans=0.0
2023-12-24 02:09:09,791 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.523e+01 3.906e+01 4.083e+01 4.345e+01 5.056e+01, threshold=8.167e+01, percent-clipped=0.0
2023-12-24 02:09:20,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1448666.6666666667, ans=0.09899494936611666
2023-12-24 02:09:20,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1448666.6666666667, ans=0.0
2023-12-24 02:09:28,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1448733.3333333333, ans=0.125
2023-12-24 02:09:36,212 INFO [train.py:886] (3/4) Epoch 46, batch 2850, loss[loss=0.01185, audio_tagging_loss=0.01185, over 24750.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4948238.96 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:10:01,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1448933.3333333333, ans=0.125
2023-12-24 02:10:01,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1448933.3333333333, ans=0.125
2023-12-24 02:10:22,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1449066.6666666667, ans=0.1
2023-12-24 02:10:22,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1449066.6666666667, ans=0.0
2023-12-24 02:10:27,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1449133.3333333333, ans=0.0
2023-12-24 02:10:28,352 INFO [train.py:886] (3/4) Epoch 46, batch 2900, loss[loss=0.01215, audio_tagging_loss=0.01215, over 25000.00 frames. ], tot_loss[loss=0.01089, audio_tagging_loss=0.01089, over 4940646.65 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:10:37,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1449200.0, ans=0.0
2023-12-24 02:10:46,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1449200.0, ans=0.0
2023-12-24 02:10:53,594 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.454e+01 3.893e+01 4.087e+01 4.310e+01 5.363e+01, threshold=8.174e+01, percent-clipped=0.0
2023-12-24 02:11:09,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1449400.0, ans=0.0
2023-12-24 02:11:19,440 INFO [train.py:886] (3/4) Epoch 46, batch 2950, loss[loss=0.009658, audio_tagging_loss=0.009658, over 24750.00 frames. ], tot_loss[loss=0.01083, audio_tagging_loss=0.01083, over 4944950.49 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:11:49,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1449600.0, ans=0.2
2023-12-24 02:12:00,329 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.26 vs. limit=15.0
2023-12-24 02:12:05,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1449733.3333333333, ans=0.125
2023-12-24 02:12:12,483 INFO [train.py:886] (3/4) Epoch 46, batch 3000, loss[loss=0.0104, audio_tagging_loss=0.0104, over 25000.00 frames. ], tot_loss[loss=0.0108, audio_tagging_loss=0.0108, over 4946416.15 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:12:12,483 INFO [train.py:909] (3/4) Computing validation loss
2023-12-24 02:12:22,249 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3252, 3.5037, 4.2134, 3.9174], device='cuda:3')
2023-12-24 02:12:33,159 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6898, 2.8726, 3.7479, 3.7703], device='cuda:3')
2023-12-24 02:12:34,118 INFO [train.py:917] (3/4) Epoch 46, validation: loss=0.03679, audio_tagging_loss=0.03679, over 3737520.00 frames.
2023-12-24 02:12:34,119 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-24 02:12:36,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1449800.0, ans=0.125
2023-12-24 02:12:58,527 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.409e+01 3.890e+01 4.114e+01 4.303e+01 5.269e+01, threshold=8.229e+01, percent-clipped=0.0
2023-12-24 02:12:58,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1449933.3333333333, ans=0.125
2023-12-24 02:13:06,092 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.71 vs. limit=12.0
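During the validation pass the zipformer.py:1858 lines dump the entropy of each head's attention distribution, one value per head (e.g. tensor([5.3252, 3.5037, 4.2134, 3.9174], ...)); low entropy means a head concentrates on few positions, high entropy means it spreads out. A sketch of that diagnostic under assumed tensor shapes (the actual layout in zipformer.py may differ):

```python
# Per-head attention entropy, as in the attn_weights_entropy lines above
# (sketch; the tensor layout is an assumption about zipformer.py).
import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    """attn: (num_heads, query_len, key_len), softmax-normalised over keys."""
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (num_heads, query_len)
    return ent.mean(dim=-1)                         # average entropy per head
```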
2023-12-24 02:13:14,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1450000.0, ans=0.1
2023-12-24 02:13:17,746 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 02:13:22,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1450066.6666666667, ans=0.95
2023-12-24 02:13:24,998 INFO [train.py:886] (3/4) Epoch 46, batch 3050, loss[loss=0.01174, audio_tagging_loss=0.01174, over 25000.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4950770.44 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:13:28,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1450133.3333333333, ans=0.125
2023-12-24 02:13:29,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1450133.3333333333, ans=0.2
2023-12-24 02:13:30,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1450133.3333333333, ans=0.2
2023-12-24 02:13:36,871 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.04 vs. limit=12.0
2023-12-24 02:13:40,403 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.11 vs. limit=15.0
2023-12-24 02:13:41,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1450200.0, ans=0.125
2023-12-24 02:13:55,553 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=12.0
2023-12-24 02:13:56,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1450333.3333333333, ans=0.05
2023-12-24 02:14:00,471 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.43 vs. limit=5.0
2023-12-24 02:14:16,873 INFO [train.py:886] (3/4) Epoch 46, batch 3100, loss[loss=0.01143, audio_tagging_loss=0.01143, over 24750.00 frames. ], tot_loss[loss=0.01083, audio_tagging_loss=0.01083, over 4954336.77 frames. ], batch size: 99, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:14:18,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1450466.6666666667, ans=0.125
2023-12-24 02:14:24,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1450466.6666666667, ans=0.125
2023-12-24 02:14:26,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1450533.3333333333, ans=0.07
2023-12-24 02:14:27,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1450533.3333333333, ans=0.04949747468305833
2023-12-24 02:14:39,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1450600.0, ans=0.1
2023-12-24 02:14:41,854 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.614e+01 3.922e+01 4.127e+01 4.313e+01 5.087e+01, threshold=8.254e+01, percent-clipped=0.0
2023-12-24 02:15:07,068 INFO [train.py:886] (3/4) Epoch 46, batch 3150, loss[loss=0.01002, audio_tagging_loss=0.01002, over 25000.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4949638.25 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:15:33,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1450933.3333333333, ans=0.1
2023-12-24 02:15:44,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1451000.0, ans=0.125
2023-12-24 02:15:45,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1451000.0, ans=0.1
2023-12-24 02:15:48,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1451066.6666666667, ans=0.1
2023-12-24 02:15:58,423 INFO [train.py:886] (3/4) Epoch 46, batch 3200, loss[loss=0.01088, audio_tagging_loss=0.01088, over 25000.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4946201.52 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:16:20,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1451266.6666666667, ans=0.0
2023-12-24 02:16:24,219 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.595e+01 3.931e+01 4.109e+01 4.308e+01 5.073e+01, threshold=8.218e+01, percent-clipped=0.0
2023-12-24 02:16:26,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1451266.6666666667, ans=0.0
2023-12-24 02:16:35,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1451333.3333333333, ans=0.125
2023-12-24 02:16:50,754 INFO [train.py:886] (3/4) Epoch 46, batch 3250, loss[loss=0.009079, audio_tagging_loss=0.009079, over 25000.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4948237.42 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 64.0
2023-12-24 02:17:01,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1451533.3333333333, ans=0.1
2023-12-24 02:17:16,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.81 vs. limit=12.0
2023-12-24 02:17:26,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1451666.6666666667, ans=0.025
2023-12-24 02:17:28,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1451666.6666666667, ans=0.0
2023-12-24 02:17:33,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1451733.3333333333, ans=0.0
2023-12-24 02:17:35,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.01 vs. limit=15.0
2023-12-24 02:17:39,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1451733.3333333333, ans=0.1
2023-12-24 02:17:41,152 INFO [train.py:886] (3/4) Epoch 46, batch 3300, loss[loss=0.008307, audio_tagging_loss=0.008307, over 25000.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4954498.37 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0
2023-12-24 02:17:52,357 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=15.0
2023-12-24 02:18:07,489 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.526e+01 3.879e+01 4.032e+01 4.165e+01 5.063e+01, threshold=8.064e+01, percent-clipped=0.0
2023-12-24 02:18:07,669 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 02:18:08,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1451933.3333333333, ans=0.125
2023-12-24 02:18:09,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1451933.3333333333, ans=0.125
2023-12-24 02:18:11,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1452000.0, ans=0.125
2023-12-24 02:18:21,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1452000.0, ans=0.1
2023-12-24 02:18:23,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1452066.6666666667, ans=0.125
2023-12-24 02:18:25,706 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.59 vs. limit=15.0
2023-12-24 02:18:26,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1452066.6666666667, ans=0.2
2023-12-24 02:18:33,748 INFO [train.py:886] (3/4) Epoch 46, batch 3350, loss[loss=0.01046, audio_tagging_loss=0.01046, over 24750.00 frames. ], tot_loss[loss=0.0108, audio_tagging_loss=0.0108, over 4955564.71 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 64.0
], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:18:58,827 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0 2023-12-24 02:19:03,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1452266.6666666667, ans=0.0 2023-12-24 02:19:12,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1452333.3333333333, ans=0.2 2023-12-24 02:19:18,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1452400.0, ans=0.125 2023-12-24 02:19:25,264 INFO [train.py:886] (3/4) Epoch 46, batch 3400, loss[loss=0.009655, audio_tagging_loss=0.009655, over 25000.00 frames. ], tot_loss[loss=0.01083, audio_tagging_loss=0.01083, over 4954897.31 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:19:27,388 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0 2023-12-24 02:19:34,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0 2023-12-24 02:19:37,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1452533.3333333333, ans=0.125 2023-12-24 02:19:41,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1452533.3333333333, ans=0.125 2023-12-24 02:19:48,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1452600.0, ans=0.125 2023-12-24 02:19:51,852 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.591e+01 3.965e+01 4.111e+01 4.276e+01 8.253e+01, threshold=8.223e+01, percent-clipped=1.0 2023-12-24 02:19:52,464 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.56 vs. limit=15.0 2023-12-24 02:19:56,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1452666.6666666667, ans=0.125 2023-12-24 02:20:17,506 INFO [train.py:886] (3/4) Epoch 46, batch 3450, loss[loss=0.01151, audio_tagging_loss=0.01151, over 24750.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4952145.08 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:20:23,648 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.88 vs. 
limit=15.0 2023-12-24 02:20:41,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1452933.3333333333, ans=0.0 2023-12-24 02:20:46,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1452933.3333333333, ans=0.0 2023-12-24 02:20:48,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1453000.0, ans=0.0 2023-12-24 02:20:51,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1453000.0, ans=0.0 2023-12-24 02:20:59,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1453066.6666666667, ans=0.125 2023-12-24 02:20:59,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1453066.6666666667, ans=0.1 2023-12-24 02:21:09,813 INFO [train.py:886] (3/4) Epoch 46, batch 3500, loss[loss=0.01327, audio_tagging_loss=0.01327, over 24750.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4946939.61 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:21:11,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1453133.3333333333, ans=0.0 2023-12-24 02:21:37,136 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.445e+01 3.883e+01 4.040e+01 4.247e+01 5.009e+01, threshold=8.081e+01, percent-clipped=0.0 2023-12-24 02:21:41,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1453333.3333333333, ans=0.0 2023-12-24 02:21:43,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1453333.3333333333, ans=0.1 2023-12-24 02:21:48,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1453333.3333333333, ans=0.125 2023-12-24 02:21:48,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1453333.3333333333, ans=0.0 2023-12-24 02:22:01,500 INFO [train.py:886] (3/4) Epoch 46, batch 3550, loss[loss=0.01161, audio_tagging_loss=0.01161, over 24750.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4944427.04 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:22:03,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1453466.6666666667, ans=0.0 2023-12-24 02:22:17,600 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.78 vs. limit=22.5 2023-12-24 02:22:36,215 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.38 vs. 
limit=15.0 2023-12-24 02:22:36,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1453666.6666666667, ans=0.0 2023-12-24 02:22:43,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1453733.3333333333, ans=0.09899494936611666 2023-12-24 02:22:49,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1453733.3333333333, ans=0.125 2023-12-24 02:22:53,324 INFO [train.py:886] (3/4) Epoch 46, batch 3600, loss[loss=0.01085, audio_tagging_loss=0.01085, over 25000.00 frames. ], tot_loss[loss=0.0108, audio_tagging_loss=0.0108, over 4948149.86 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:22:59,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1453800.0, ans=0.125 2023-12-24 02:23:14,447 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.09 vs. limit=22.5 2023-12-24 02:23:20,530 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.432e+01 3.928e+01 4.098e+01 4.249e+01 6.702e+01, threshold=8.195e+01, percent-clipped=0.0 2023-12-24 02:23:21,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1453933.3333333333, ans=0.125 2023-12-24 02:23:46,088 INFO [train.py:886] (3/4) Epoch 46, batch 3650, loss[loss=0.0121, audio_tagging_loss=0.0121, over 25000.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4945354.43 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:23:49,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1454133.3333333333, ans=0.0 2023-12-24 02:23:59,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1454200.0, ans=0.125 2023-12-24 02:24:01,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1454200.0, ans=0.1 2023-12-24 02:24:23,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1454333.3333333333, ans=0.0 2023-12-24 02:24:31,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1454400.0, ans=0.1 2023-12-24 02:24:31,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1454400.0, ans=0.09899494936611666 2023-12-24 02:24:36,133 INFO [train.py:886] (3/4) Epoch 46, batch 3700, loss[loss=0.01086, audio_tagging_loss=0.01086, over 25000.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4949425.47 frames. 
], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:24:37,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1454466.6666666667, ans=0.0 2023-12-24 02:24:38,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1454466.6666666667, ans=0.0 2023-12-24 02:24:56,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1454533.3333333333, ans=0.0 2023-12-24 02:25:03,711 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.626e+01 3.918e+01 4.062e+01 4.194e+01 4.815e+01, threshold=8.124e+01, percent-clipped=0.0 2023-12-24 02:25:15,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1454666.6666666667, ans=0.05 2023-12-24 02:25:18,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1454733.3333333333, ans=0.125 2023-12-24 02:25:29,650 INFO [train.py:886] (3/4) Epoch 46, batch 3750, loss[loss=0.01131, audio_tagging_loss=0.01131, over 25000.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4946812.40 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:25:29,845 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 02:25:38,842 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.88 vs. limit=22.5 2023-12-24 02:25:40,798 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.68 vs. limit=15.0 2023-12-24 02:25:50,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1454933.3333333333, ans=0.125 2023-12-24 02:25:58,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1454933.3333333333, ans=0.125 2023-12-24 02:25:58,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1454933.3333333333, ans=0.0 2023-12-24 02:26:07,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1455000.0, ans=0.125 2023-12-24 02:26:10,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1455066.6666666667, ans=0.125 2023-12-24 02:26:11,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1455066.6666666667, ans=0.2 2023-12-24 02:26:20,503 INFO [train.py:886] (3/4) Epoch 46, batch 3800, loss[loss=0.01156, audio_tagging_loss=0.01156, over 24750.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4941247.17 frames. 
], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:26:26,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1455133.3333333333, ans=0.125 2023-12-24 02:26:46,647 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.525e+01 3.960e+01 4.079e+01 4.273e+01 4.996e+01, threshold=8.158e+01, percent-clipped=0.0 2023-12-24 02:26:51,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1455333.3333333333, ans=0.2 2023-12-24 02:27:11,912 INFO [train.py:886] (3/4) Epoch 46, batch 3850, loss[loss=0.01058, audio_tagging_loss=0.01058, over 22653.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4940370.74 frames. ], batch size: 107, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:27:14,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1455466.6666666667, ans=0.125 2023-12-24 02:27:16,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1455466.6666666667, ans=0.1 2023-12-24 02:27:18,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1455466.6666666667, ans=0.125 2023-12-24 02:27:28,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1455533.3333333333, ans=0.125 2023-12-24 02:27:42,045 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.33 vs. limit=5.0 2023-12-24 02:27:43,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.69 vs. limit=15.0 2023-12-24 02:28:00,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1455733.3333333333, ans=0.0 2023-12-24 02:28:03,996 INFO [train.py:886] (3/4) Epoch 46, batch 3900, loss[loss=0.009827, audio_tagging_loss=0.009827, over 25000.00 frames. ], tot_loss[loss=0.01085, audio_tagging_loss=0.01085, over 4942678.82 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:28:07,256 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.05 vs. 
limit=12.0 2023-12-24 02:28:12,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1455866.6666666667, ans=0.125 2023-12-24 02:28:29,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1455933.3333333333, ans=0.125 2023-12-24 02:28:29,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1455933.3333333333, ans=0.125 2023-12-24 02:28:30,030 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.541e+01 3.894e+01 4.050e+01 4.339e+01 5.039e+01, threshold=8.100e+01, percent-clipped=0.0 2023-12-24 02:28:43,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1456000.0, ans=0.1 2023-12-24 02:28:44,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1456066.6666666667, ans=0.125 2023-12-24 02:28:46,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1456066.6666666667, ans=0.1 2023-12-24 02:28:54,395 INFO [train.py:886] (3/4) Epoch 46, batch 3950, loss[loss=0.01255, audio_tagging_loss=0.01255, over 25000.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4948705.90 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:29:05,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1456200.0, ans=0.0 2023-12-24 02:29:08,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1456200.0, ans=0.125 2023-12-24 02:29:21,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1456266.6666666667, ans=0.5 2023-12-24 02:29:22,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1456266.6666666667, ans=10.0 2023-12-24 02:29:22,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1456266.6666666667, ans=0.2 2023-12-24 02:29:30,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1456333.3333333333, ans=0.0 2023-12-24 02:29:43,241 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.35 vs. limit=15.0 2023-12-24 02:29:46,384 INFO [train.py:886] (3/4) Epoch 46, batch 4000, loss[loss=0.01326, audio_tagging_loss=0.01326, over 25000.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4954972.05 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:30:13,630 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.650e+01 3.977e+01 4.098e+01 4.271e+01 5.184e+01, threshold=8.196e+01, percent-clipped=0.0 2023-12-24 02:30:19,523 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.51 vs. 
limit=12.0 2023-12-24 02:30:26,976 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 02:30:37,712 INFO [train.py:886] (3/4) Epoch 46, batch 4050, loss[loss=0.01026, audio_tagging_loss=0.01026, over 24750.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4953755.21 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:30:39,109 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.18 vs. limit=15.0 2023-12-24 02:31:09,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1457000.0, ans=0.125 2023-12-24 02:31:12,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1457000.0, ans=0.125 2023-12-24 02:31:24,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1457066.6666666667, ans=0.125 2023-12-24 02:31:27,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1457133.3333333333, ans=0.1 2023-12-24 02:31:28,344 INFO [train.py:886] (3/4) Epoch 46, batch 4100, loss[loss=0.01118, audio_tagging_loss=0.01118, over 24750.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4947743.30 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:31:30,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1457133.3333333333, ans=0.1 2023-12-24 02:31:51,151 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.54 vs. limit=15.0 2023-12-24 02:31:55,139 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.551e+01 3.971e+01 4.093e+01 4.225e+01 5.395e+01, threshold=8.186e+01, percent-clipped=0.0 2023-12-24 02:32:04,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1457333.3333333333, ans=0.125 2023-12-24 02:32:20,997 INFO [train.py:886] (3/4) Epoch 46, batch 4150, loss[loss=0.01072, audio_tagging_loss=0.01072, over 25000.00 frames. ], tot_loss[loss=0.01084, audio_tagging_loss=0.01084, over 4948675.85 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:32:46,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=1457600.0, ans=0.5 2023-12-24 02:32:57,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1457666.6666666667, ans=0.2 2023-12-24 02:32:58,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1457666.6666666667, ans=0.125 2023-12-24 02:32:59,996 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.12 vs. 
limit=15.0 2023-12-24 02:33:00,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1457733.3333333333, ans=0.07 2023-12-24 02:33:10,864 INFO [train.py:886] (3/4) Epoch 46, batch 4200, loss[loss=0.009875, audio_tagging_loss=0.009875, over 25000.00 frames. ], tot_loss[loss=0.0108, audio_tagging_loss=0.0108, over 4949993.72 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:33:28,389 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.42 vs. limit=15.0 2023-12-24 02:33:28,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1457866.6666666667, ans=0.125 2023-12-24 02:33:38,197 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.408e+01 3.862e+01 4.047e+01 4.185e+01 5.649e+01, threshold=8.095e+01, percent-clipped=0.0 2023-12-24 02:33:41,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1458000.0, ans=0.2 2023-12-24 02:34:03,961 INFO [train.py:886] (3/4) Epoch 46, batch 4250, loss[loss=0.01072, audio_tagging_loss=0.01072, over 25000.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4955246.74 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:34:05,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.53 vs. limit=22.5 2023-12-24 02:34:11,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1458133.3333333333, ans=0.0 2023-12-24 02:34:15,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1458200.0, ans=0.0 2023-12-24 02:34:19,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1458200.0, ans=0.1 2023-12-24 02:34:24,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1458266.6666666667, ans=0.125 2023-12-24 02:34:36,246 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=22.5 2023-12-24 02:34:36,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1458333.3333333333, ans=0.0 2023-12-24 02:34:42,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1458333.3333333333, ans=0.125 2023-12-24 02:34:43,589 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.85 vs. limit=6.0 2023-12-24 02:34:45,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1458400.0, ans=0.0 2023-12-24 02:34:46,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1458400.0, ans=0.1 2023-12-24 02:34:55,811 INFO [train.py:886] (3/4) Epoch 46, batch 4300, loss[loss=0.01399, audio_tagging_loss=0.01399, over 25000.00 frames. 
], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4957286.00 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:34:57,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1458466.6666666667, ans=0.0 2023-12-24 02:34:58,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1458466.6666666667, ans=0.2 2023-12-24 02:35:06,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1458533.3333333333, ans=0.0 2023-12-24 02:35:12,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1458533.3333333333, ans=0.1 2023-12-24 02:35:21,232 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.535e+01 3.901e+01 4.132e+01 4.342e+01 5.346e+01, threshold=8.265e+01, percent-clipped=0.0 2023-12-24 02:35:34,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1458666.6666666667, ans=0.0 2023-12-24 02:35:35,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1458666.6666666667, ans=0.0 2023-12-24 02:35:40,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1458733.3333333333, ans=0.04949747468305833 2023-12-24 02:35:46,805 INFO [train.py:886] (3/4) Epoch 46, batch 4350, loss[loss=0.01184, audio_tagging_loss=0.01184, over 24750.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4961097.45 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:36:16,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1458933.3333333333, ans=0.09899494936611666 2023-12-24 02:36:18,670 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.76 vs. limit=22.5 2023-12-24 02:36:26,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1459000.0, ans=0.1 2023-12-24 02:36:33,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.02 vs. limit=15.0 2023-12-24 02:36:36,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1459066.6666666667, ans=0.125 2023-12-24 02:36:39,145 INFO [train.py:886] (3/4) Epoch 46, batch 4400, loss[loss=0.00972, audio_tagging_loss=0.00972, over 25000.00 frames. ], tot_loss[loss=0.011, audio_tagging_loss=0.011, over 4953559.36 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:37:06,112 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.527e+01 3.966e+01 4.108e+01 4.313e+01 4.794e+01, threshold=8.216e+01, percent-clipped=0.0 2023-12-24 02:37:07,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1459266.6666666667, ans=0.0 2023-12-24 02:37:30,586 INFO [train.py:886] (3/4) Epoch 46, batch 4450, loss[loss=0.01026, audio_tagging_loss=0.01026, over 25000.00 frames. ], tot_loss[loss=0.01102, audio_tagging_loss=0.01102, over 4949497.25 frames. 
], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:37:47,107 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=15.0 2023-12-24 02:37:48,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1459533.3333333333, ans=0.2 2023-12-24 02:38:00,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1459666.6666666667, ans=0.125 2023-12-24 02:38:05,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1459666.6666666667, ans=0.1 2023-12-24 02:38:06,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1459666.6666666667, ans=0.125 2023-12-24 02:38:22,183 INFO [train.py:886] (3/4) Epoch 46, batch 4500, loss[loss=0.01126, audio_tagging_loss=0.01126, over 24923.00 frames. ], tot_loss[loss=0.01097, audio_tagging_loss=0.01097, over 4942368.94 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 64.0 2023-12-24 02:38:42,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1459933.3333333333, ans=0.125 2023-12-24 02:38:49,730 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.485e+01 3.900e+01 4.113e+01 4.259e+01 4.782e+01, threshold=8.226e+01, percent-clipped=0.0 2023-12-24 02:38:50,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1459933.3333333333, ans=0.0 2023-12-24 02:38:55,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1460000.0, ans=0.0 2023-12-24 02:39:10,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1460066.6666666667, ans=0.125 2023-12-24 02:39:14,694 INFO [train.py:886] (3/4) Epoch 46, batch 4550, loss[loss=0.01238, audio_tagging_loss=0.01238, over 25000.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4948408.13 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 32.0 2023-12-24 02:39:32,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1460200.0, ans=0.025 2023-12-24 02:39:33,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1460266.6666666667, ans=0.0 2023-12-24 02:39:35,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1460266.6666666667, ans=0.125 2023-12-24 02:39:48,384 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.91 vs. limit=12.0 2023-12-24 02:39:55,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1460400.0, ans=0.0 2023-12-24 02:39:57,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1460400.0, ans=0.0 2023-12-24 02:40:05,488 INFO [train.py:886] (3/4) Epoch 46, batch 4600, loss[loss=0.01002, audio_tagging_loss=0.01002, over 25000.00 frames. ], tot_loss[loss=0.01085, audio_tagging_loss=0.01085, over 4951603.93 frames. 
], batch size: 100, lr: 2.32e-03, grad_scale: 32.0 2023-12-24 02:40:08,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1460466.6666666667, ans=0.125 2023-12-24 02:40:16,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1460533.3333333333, ans=0.0 2023-12-24 02:40:17,484 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2023-12-24 02:40:28,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1460600.0, ans=0.0 2023-12-24 02:40:33,136 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.440e+01 3.978e+01 4.125e+01 4.323e+01 5.544e+01, threshold=8.249e+01, percent-clipped=0.0 2023-12-24 02:40:46,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1460733.3333333333, ans=0.125 2023-12-24 02:40:55,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.79 vs. limit=22.5 2023-12-24 02:40:57,028 INFO [train.py:886] (3/4) Epoch 46, batch 4650, loss[loss=0.009669, audio_tagging_loss=0.009669, over 25000.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4958694.56 frames. ], batch size: 100, lr: 2.32e-03, grad_scale: 32.0 2023-12-24 02:41:17,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1460933.3333333333, ans=0.125 2023-12-24 02:41:18,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1460933.3333333333, ans=0.125 2023-12-24 02:41:19,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1460933.3333333333, ans=0.1 2023-12-24 02:41:25,282 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.83 vs. limit=8.0 2023-12-24 02:41:28,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1461000.0, ans=0.125 2023-12-24 02:41:46,466 INFO [train.py:886] (3/4) Epoch 46, batch 4700, loss[loss=0.01025, audio_tagging_loss=0.01025, over 24750.00 frames. ], tot_loss[loss=0.011, audio_tagging_loss=0.011, over 4960055.30 frames. 
], batch size: 99, lr: 2.32e-03, grad_scale: 32.0 2023-12-24 02:41:46,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1461133.3333333333, ans=0.125 2023-12-24 02:41:56,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1461200.0, ans=0.95 2023-12-24 02:41:57,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1461200.0, ans=0.1 2023-12-24 02:41:58,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1461200.0, ans=0.1 2023-12-24 02:41:59,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1461200.0, ans=0.0 2023-12-24 02:42:12,848 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.719e+01 3.992e+01 4.134e+01 4.373e+01 5.124e+01, threshold=8.269e+01, percent-clipped=0.0 2023-12-24 02:42:13,584 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.07 vs. limit=6.0 2023-12-24 02:42:32,253 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2023-12-24 02:42:34,326 INFO [train.py:886] (3/4) Epoch 46, batch 4750, loss[loss=0.01247, audio_tagging_loss=0.01247, over 24750.00 frames. ], tot_loss[loss=0.01105, audio_tagging_loss=0.01105, over 4951247.45 frames. ], batch size: 99, lr: 2.32e-03, grad_scale: 32.0 2023-12-24 02:42:35,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1461466.6666666667, ans=0.0 2023-12-24 02:42:35,759 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.93 vs. limit=10.0 2023-12-24 02:42:37,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1461466.6666666667, ans=0.0 2023-12-24 02:43:10,063 INFO [train.py:886] (3/4) Epoch 47, batch 0, loss[loss=0.02454, audio_tagging_loss=0.02454, over 25000.00 frames. ], tot_loss[loss=0.02454, audio_tagging_loss=0.02454, over 25000.00 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:43:10,064 INFO [train.py:909] (3/4) Computing validation loss 2023-12-24 02:43:20,436 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1678, 1.1113, 4.6385, 4.3939], device='cuda:3') 2023-12-24 02:43:30,562 INFO [train.py:917] (3/4) Epoch 47, validation: loss=0.0358, audio_tagging_loss=0.0358, over 3737520.00 frames. 
2023-12-24 02:43:30,562 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-24 02:43:32,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1461573.3333333333, ans=0.1 2023-12-24 02:43:32,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1461573.3333333333, ans=0.125 2023-12-24 02:43:39,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1461573.3333333333, ans=0.07 2023-12-24 02:43:40,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1461640.0, ans=0.0 2023-12-24 02:43:51,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1461706.6666666667, ans=0.125 2023-12-24 02:44:06,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1461773.3333333333, ans=0.0 2023-12-24 02:44:22,420 INFO [train.py:886] (3/4) Epoch 47, batch 50, loss[loss=0.01036, audio_tagging_loss=0.01036, over 23995.00 frames. ], tot_loss[loss=0.01768, audio_tagging_loss=0.01768, over 1121110.71 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:44:26,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1461906.6666666667, ans=0.125 2023-12-24 02:44:32,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1461973.3333333333, ans=0.0 2023-12-24 02:44:34,687 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.488e+01 4.203e+01 4.907e+01 5.637e+01 1.199e+02, threshold=9.813e+01, percent-clipped=7.0 2023-12-24 02:44:36,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1461973.3333333333, ans=0.0 2023-12-24 02:44:49,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1462040.0, ans=0.0 2023-12-24 02:44:49,609 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=15.0 2023-12-24 02:44:54,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1462106.6666666667, ans=0.125 2023-12-24 02:44:58,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1462106.6666666667, ans=0.2 2023-12-24 02:45:13,770 INFO [train.py:886] (3/4) Epoch 47, batch 100, loss[loss=0.01289, audio_tagging_loss=0.01289, over 25000.00 frames. ], tot_loss[loss=0.0153, audio_tagging_loss=0.0153, over 1971554.73 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:45:33,849 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2023-12-24 02:45:40,547 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.97 vs. 
limit=12.0 2023-12-24 02:45:45,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1462440.0, ans=0.125 2023-12-24 02:45:56,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1462506.6666666667, ans=0.05 2023-12-24 02:45:59,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1462506.6666666667, ans=0.2 2023-12-24 02:46:05,859 INFO [train.py:886] (3/4) Epoch 47, batch 150, loss[loss=0.009108, audio_tagging_loss=0.009108, over 21683.00 frames. ], tot_loss[loss=0.01385, audio_tagging_loss=0.01385, over 2634114.04 frames. ], batch size: 107, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:46:07,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1462573.3333333333, ans=0.125 2023-12-24 02:46:09,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1462573.3333333333, ans=0.04949747468305833 2023-12-24 02:46:09,456 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.61 vs. limit=8.0 2023-12-24 02:46:17,191 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.815e+01 4.110e+01 4.292e+01 4.596e+01 5.407e+01, threshold=8.583e+01, percent-clipped=0.0 2023-12-24 02:46:32,441 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2023-12-24 02:46:33,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1462706.6666666667, ans=0.1 2023-12-24 02:46:41,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=1462773.3333333333, ans=0.02 2023-12-24 02:46:42,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1462773.3333333333, ans=0.0 2023-12-24 02:46:58,197 INFO [train.py:886] (3/4) Epoch 47, batch 200, loss[loss=0.01018, audio_tagging_loss=0.01018, over 25000.00 frames. ], tot_loss[loss=0.01299, audio_tagging_loss=0.01299, over 3152483.86 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:47:35,574 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.70 vs. limit=15.0 2023-12-24 02:47:42,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1463173.3333333333, ans=10.0 2023-12-24 02:47:49,239 INFO [train.py:886] (3/4) Epoch 47, batch 250, loss[loss=0.01168, audio_tagging_loss=0.01168, over 25000.00 frames. ], tot_loss[loss=0.0125, audio_tagging_loss=0.0125, over 3557038.39 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:48:01,227 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.589e+01 3.929e+01 4.138e+01 4.313e+01 4.926e+01, threshold=8.277e+01, percent-clipped=0.0 2023-12-24 02:48:08,830 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.53 vs. 
limit=15.0 2023-12-24 02:48:11,113 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.80 vs. limit=15.0 2023-12-24 02:48:24,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1463440.0, ans=0.125 2023-12-24 02:48:31,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1463506.6666666667, ans=0.125 2023-12-24 02:48:34,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1463506.6666666667, ans=0.125 2023-12-24 02:48:40,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.01 vs. limit=22.5 2023-12-24 02:48:40,486 INFO [train.py:886] (3/4) Epoch 47, batch 300, loss[loss=0.01109, audio_tagging_loss=0.01109, over 24750.00 frames. ], tot_loss[loss=0.01217, audio_tagging_loss=0.01217, over 3867827.88 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:48:40,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1463573.3333333333, ans=0.0 2023-12-24 02:48:46,602 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.77 vs. limit=15.0 2023-12-24 02:48:49,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1463573.3333333333, ans=0.0 2023-12-24 02:49:05,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1463706.6666666667, ans=0.125 2023-12-24 02:49:23,331 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.03 vs. limit=6.0 2023-12-24 02:49:25,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1463840.0, ans=0.2 2023-12-24 02:49:29,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1463840.0, ans=0.1 2023-12-24 02:49:31,969 INFO [train.py:886] (3/4) Epoch 47, batch 350, loss[loss=0.01087, audio_tagging_loss=0.01087, over 25000.00 frames. ], tot_loss[loss=0.01196, audio_tagging_loss=0.01196, over 4103539.02 frames. 
], batch size: 100, lr: 2.29e-03, grad_scale: 16.0 2023-12-24 02:49:37,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1463906.6666666667, ans=0.0 2023-12-24 02:49:44,718 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.473e+01 3.948e+01 4.136e+01 4.344e+01 5.181e+01, threshold=8.273e+01, percent-clipped=0.0 2023-12-24 02:49:50,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1463973.3333333333, ans=0.125 2023-12-24 02:49:56,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1464040.0, ans=0.125 2023-12-24 02:49:57,959 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 02:50:24,241 INFO [train.py:886] (3/4) Epoch 47, batch 400, loss[loss=0.01303, audio_tagging_loss=0.01303, over 25000.00 frames. ], tot_loss[loss=0.0116, audio_tagging_loss=0.0116, over 4287389.08 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:50:27,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1464240.0, ans=0.0 2023-12-24 02:50:32,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1464306.6666666667, ans=0.125 2023-12-24 02:50:33,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1464306.6666666667, ans=0.1 2023-12-24 02:50:43,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1464373.3333333333, ans=0.125 2023-12-24 02:50:46,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=15.0 2023-12-24 02:50:55,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1464440.0, ans=0.2 2023-12-24 02:50:55,935 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=15.0 2023-12-24 02:51:09,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1464506.6666666667, ans=0.1 2023-12-24 02:51:14,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1464506.6666666667, ans=0.125 2023-12-24 02:51:15,580 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0 2023-12-24 02:51:16,133 INFO [train.py:886] (3/4) Epoch 47, batch 450, loss[loss=0.01006, audio_tagging_loss=0.01006, over 24750.00 frames. ], tot_loss[loss=0.01135, audio_tagging_loss=0.01135, over 4438132.56 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:51:20,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1464573.3333333333, ans=0.0 2023-12-24 02:51:23,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.16 vs. 
limit=15.0 2023-12-24 02:51:26,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1464640.0, ans=0.05 2023-12-24 02:51:28,924 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.364e+01 3.902e+01 4.040e+01 4.251e+01 5.082e+01, threshold=8.080e+01, percent-clipped=0.0 2023-12-24 02:51:34,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1464640.0, ans=0.1 2023-12-24 02:51:36,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1464706.6666666667, ans=0.125 2023-12-24 02:51:51,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1464773.3333333333, ans=0.125 2023-12-24 02:52:04,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1464840.0, ans=0.125 2023-12-24 02:52:07,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1464906.6666666667, ans=0.125 2023-12-24 02:52:07,990 INFO [train.py:886] (3/4) Epoch 47, batch 500, loss[loss=0.01182, audio_tagging_loss=0.01182, over 21483.00 frames. ], tot_loss[loss=0.01113, audio_tagging_loss=0.01113, over 4550635.40 frames. ], batch size: 107, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:52:30,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1465040.0, ans=0.125 2023-12-24 02:52:32,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1465040.0, ans=0.1 2023-12-24 02:52:38,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1465106.6666666667, ans=0.0 2023-12-24 02:52:38,951 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=15.0 2023-12-24 02:52:42,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1465106.6666666667, ans=0.125 2023-12-24 02:52:48,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1465173.3333333333, ans=0.0 2023-12-24 02:53:00,333 INFO [train.py:886] (3/4) Epoch 47, batch 550, loss[loss=0.008932, audio_tagging_loss=0.008932, over 24060.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4634563.03 frames. 
], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:53:01,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1465240.0, ans=0.125 2023-12-24 02:53:12,414 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.441e+01 3.969e+01 4.099e+01 4.262e+01 5.027e+01, threshold=8.197e+01, percent-clipped=0.0 2023-12-24 02:53:14,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1465306.6666666667, ans=0.0 2023-12-24 02:53:34,410 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. limit=6.0 2023-12-24 02:53:36,485 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.54 vs. limit=10.0 2023-12-24 02:53:40,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1465506.6666666667, ans=0.125 2023-12-24 02:53:51,811 INFO [train.py:886] (3/4) Epoch 47, batch 600, loss[loss=0.01159, audio_tagging_loss=0.01159, over 24750.00 frames. ], tot_loss[loss=0.01112, audio_tagging_loss=0.01112, over 4705870.61 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:54:09,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1465640.0, ans=0.125 2023-12-24 02:54:15,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1465706.6666666667, ans=0.0 2023-12-24 02:54:34,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1465840.0, ans=0.035 2023-12-24 02:54:43,455 INFO [train.py:886] (3/4) Epoch 47, batch 650, loss[loss=0.00966, audio_tagging_loss=0.00966, over 24750.00 frames. ], tot_loss[loss=0.01113, audio_tagging_loss=0.01113, over 4753537.93 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:54:49,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1465906.6666666667, ans=0.125 2023-12-24 02:54:55,401 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.474e+01 3.862e+01 4.034e+01 4.310e+01 5.761e+01, threshold=8.068e+01, percent-clipped=0.0 2023-12-24 02:55:00,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1465973.3333333333, ans=0.125 2023-12-24 02:55:24,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1466173.3333333333, ans=0.1 2023-12-24 02:55:26,826 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 02:55:34,948 INFO [train.py:886] (3/4) Epoch 47, batch 700, loss[loss=0.01262, audio_tagging_loss=0.01262, over 23999.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4790757.60 frames. 
], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:55:37,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1466240.0, ans=0.125 2023-12-24 02:55:51,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1466306.6666666667, ans=0.05 2023-12-24 02:55:57,438 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0 2023-12-24 02:55:58,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1466373.3333333333, ans=0.125 2023-12-24 02:56:02,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1466373.3333333333, ans=0.0 2023-12-24 02:56:07,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1466440.0, ans=0.125 2023-12-24 02:56:11,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1466440.0, ans=0.05 2023-12-24 02:56:11,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1466440.0, ans=0.0 2023-12-24 02:56:24,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1466506.6666666667, ans=0.1 2023-12-24 02:56:24,707 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.95 vs. limit=10.0 2023-12-24 02:56:26,189 INFO [train.py:886] (3/4) Epoch 47, batch 750, loss[loss=0.01098, audio_tagging_loss=0.01098, over 25000.00 frames. ], tot_loss[loss=0.011, audio_tagging_loss=0.011, over 4830503.08 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:56:38,898 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.551e+01 3.880e+01 4.112e+01 4.307e+01 5.752e+01, threshold=8.224e+01, percent-clipped=0.0 2023-12-24 02:57:06,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1466773.3333333333, ans=0.0 2023-12-24 02:57:20,496 INFO [train.py:886] (3/4) Epoch 47, batch 800, loss[loss=0.01041, audio_tagging_loss=0.01041, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4863812.19 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:57:30,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1466973.3333333333, ans=0.0 2023-12-24 02:58:09,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1467173.3333333333, ans=0.1 2023-12-24 02:58:12,281 INFO [train.py:886] (3/4) Epoch 47, batch 850, loss[loss=0.01186, audio_tagging_loss=0.01186, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4886452.30 frames. 
], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:58:20,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.84 vs. limit=22.5 2023-12-24 02:58:25,024 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.509e+01 3.885e+01 4.046e+01 4.247e+01 5.015e+01, threshold=8.092e+01, percent-clipped=0.0 2023-12-24 02:58:29,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1467306.6666666667, ans=0.1 2023-12-24 02:58:30,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1467306.6666666667, ans=0.2 2023-12-24 02:58:33,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1467373.3333333333, ans=0.125 2023-12-24 02:58:39,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1467373.3333333333, ans=0.0 2023-12-24 02:58:49,503 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.54 vs. limit=12.0 2023-12-24 02:58:54,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1467506.6666666667, ans=0.2 2023-12-24 02:59:04,297 INFO [train.py:886] (3/4) Epoch 47, batch 900, loss[loss=0.01373, audio_tagging_loss=0.01373, over 24750.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4902481.92 frames. ], batch size: 99, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:59:10,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.66 vs. limit=22.5 2023-12-24 02:59:22,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1467640.0, ans=0.125 2023-12-24 02:59:40,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1467773.3333333333, ans=0.125 2023-12-24 02:59:41,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1467773.3333333333, ans=0.1 2023-12-24 02:59:47,262 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.57 vs. limit=10.0 2023-12-24 02:59:53,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1467840.0, ans=0.1 2023-12-24 02:59:54,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1467840.0, ans=0.125 2023-12-24 02:59:56,681 INFO [train.py:886] (3/4) Epoch 47, batch 950, loss[loss=0.00969, audio_tagging_loss=0.00969, over 24750.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4904122.34 frames. 
], batch size: 99, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 02:59:57,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1467906.6666666667, ans=0.2 2023-12-24 03:00:03,880 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.16 vs. limit=15.0 2023-12-24 03:00:08,634 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.658e+01 3.944e+01 4.162e+01 4.322e+01 5.155e+01, threshold=8.324e+01, percent-clipped=0.0 2023-12-24 03:00:09,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1467973.3333333333, ans=0.125 2023-12-24 03:00:20,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1468040.0, ans=0.0 2023-12-24 03:00:28,704 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.07 vs. limit=22.5 2023-12-24 03:00:41,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1468173.3333333333, ans=0.2 2023-12-24 03:00:41,896 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.31 vs. limit=15.0 2023-12-24 03:00:48,912 INFO [train.py:886] (3/4) Epoch 47, batch 1000, loss[loss=0.01021, audio_tagging_loss=0.01021, over 25000.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4912895.10 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 03:00:57,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1468240.0, ans=0.1 2023-12-24 03:01:08,113 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.69 vs. limit=10.0 2023-12-24 03:01:29,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1468506.6666666667, ans=0.125 2023-12-24 03:01:29,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1468506.6666666667, ans=0.0 2023-12-24 03:01:35,694 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.70 vs. limit=15.0 2023-12-24 03:01:38,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1468506.6666666667, ans=0.125 2023-12-24 03:01:38,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1468506.6666666667, ans=0.0 2023-12-24 03:01:40,683 INFO [train.py:886] (3/4) Epoch 47, batch 1050, loss[loss=0.008958, audio_tagging_loss=0.008958, over 25000.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4912762.87 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 03:01:44,238 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.22 vs. 
limit=15.0 2023-12-24 03:01:53,565 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.568e+01 3.919e+01 4.119e+01 4.342e+01 4.813e+01, threshold=8.238e+01, percent-clipped=0.0 2023-12-24 03:02:24,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1468840.0, ans=0.1 2023-12-24 03:02:32,946 INFO [train.py:886] (3/4) Epoch 47, batch 1100, loss[loss=0.00854, audio_tagging_loss=0.00854, over 23994.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4918383.23 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 03:02:34,362 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2023-12-24 03:02:43,626 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0 2023-12-24 03:02:57,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1469040.0, ans=0.125 2023-12-24 03:03:03,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1469106.6666666667, ans=0.125 2023-12-24 03:03:08,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1469106.6666666667, ans=0.125 2023-12-24 03:03:10,913 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.21 vs. limit=15.0 2023-12-24 03:03:23,786 INFO [train.py:886] (3/4) Epoch 47, batch 1150, loss[loss=0.01128, audio_tagging_loss=0.01128, over 25000.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4929916.46 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 03:03:33,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1469306.6666666667, ans=0.125 2023-12-24 03:03:37,301 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.450e+01 3.892e+01 4.065e+01 4.219e+01 4.911e+01, threshold=8.130e+01, percent-clipped=0.0 2023-12-24 03:03:39,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1469306.6666666667, ans=0.1 2023-12-24 03:03:40,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1469306.6666666667, ans=0.0 2023-12-24 03:03:53,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1469373.3333333333, ans=0.125 2023-12-24 03:04:01,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1469440.0, ans=0.125 2023-12-24 03:04:17,422 INFO [train.py:886] (3/4) Epoch 47, batch 1200, loss[loss=0.009665, audio_tagging_loss=0.009665, over 21765.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4935023.97 frames. ], batch size: 107, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 03:04:29,341 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.86 vs. 
limit=15.0 2023-12-24 03:04:30,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1469640.0, ans=0.2 2023-12-24 03:04:42,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1469706.6666666667, ans=0.0 2023-12-24 03:04:43,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1469706.6666666667, ans=0.0 2023-12-24 03:04:55,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1469773.3333333333, ans=0.125 2023-12-24 03:04:59,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1469840.0, ans=0.125 2023-12-24 03:04:59,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1469840.0, ans=0.0 2023-12-24 03:05:01,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1469840.0, ans=0.1 2023-12-24 03:05:07,740 INFO [train.py:886] (3/4) Epoch 47, batch 1250, loss[loss=0.008519, audio_tagging_loss=0.008519, over 24019.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4937791.11 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 03:05:20,725 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.584e+01 3.991e+01 4.155e+01 4.310e+01 5.132e+01, threshold=8.310e+01, percent-clipped=0.0 2023-12-24 03:05:26,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1469973.3333333333, ans=0.0 2023-12-24 03:05:59,441 INFO [train.py:886] (3/4) Epoch 47, batch 1300, loss[loss=0.01129, audio_tagging_loss=0.01129, over 25000.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4937149.47 frames. ], batch size: 100, lr: 2.29e-03, grad_scale: 32.0 2023-12-24 03:06:23,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.27 vs. limit=12.0 2023-12-24 03:06:36,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1470440.0, ans=0.125 2023-12-24 03:06:52,376 INFO [train.py:886] (3/4) Epoch 47, batch 1350, loss[loss=0.01167, audio_tagging_loss=0.01167, over 24750.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4932274.82 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:07:03,683 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.555e+01 3.933e+01 4.111e+01 4.315e+01 5.636e+01, threshold=8.222e+01, percent-clipped=0.0 2023-12-24 03:07:06,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1470640.0, ans=0.1 2023-12-24 03:07:14,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1470706.6666666667, ans=0.0 2023-12-24 03:07:17,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.94 vs. 
limit=22.5 2023-12-24 03:07:28,366 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.27 vs. limit=12.0 2023-12-24 03:07:33,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1470840.0, ans=0.125 2023-12-24 03:07:39,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.94 vs. limit=6.0 2023-12-24 03:07:43,725 INFO [train.py:886] (3/4) Epoch 47, batch 1400, loss[loss=0.01034, audio_tagging_loss=0.01034, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4934668.68 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:08:07,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1471040.0, ans=0.0 2023-12-24 03:08:24,256 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0 2023-12-24 03:08:26,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1471173.3333333333, ans=0.0 2023-12-24 03:08:35,968 INFO [train.py:886] (3/4) Epoch 47, batch 1450, loss[loss=0.0119, audio_tagging_loss=0.0119, over 25000.00 frames. ], tot_loss[loss=0.01089, audio_tagging_loss=0.01089, over 4945544.44 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:08:36,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1471240.0, ans=0.125 2023-12-24 03:08:37,349 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.59 vs. limit=15.0 2023-12-24 03:08:41,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1471240.0, ans=0.0 2023-12-24 03:08:46,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1471306.6666666667, ans=0.125 2023-12-24 03:08:48,109 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.547e+01 3.853e+01 4.015e+01 4.195e+01 8.350e+01, threshold=8.030e+01, percent-clipped=1.0 2023-12-24 03:08:50,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1471306.6666666667, ans=0.0 2023-12-24 03:08:50,610 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=22.5 2023-12-24 03:08:51,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1471306.6666666667, ans=0.125 2023-12-24 03:08:52,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1471306.6666666667, ans=0.125 2023-12-24 03:09:02,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.00 vs. 
limit=15.0 2023-12-24 03:09:09,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1471440.0, ans=0.125 2023-12-24 03:09:17,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1471506.6666666667, ans=0.0 2023-12-24 03:09:26,265 INFO [train.py:886] (3/4) Epoch 47, batch 1500, loss[loss=0.01167, audio_tagging_loss=0.01167, over 25000.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4950297.78 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:09:26,416 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:09:31,939 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:09:42,590 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2023-12-24 03:09:43,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1471640.0, ans=0.125 2023-12-24 03:09:49,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.05 vs. limit=22.5 2023-12-24 03:09:55,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=22.5 2023-12-24 03:10:08,964 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2023-12-24 03:10:17,895 INFO [train.py:886] (3/4) Epoch 47, batch 1550, loss[loss=0.01206, audio_tagging_loss=0.01206, over 24943.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4954068.61 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:10:24,776 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.33 vs. limit=6.0 2023-12-24 03:10:29,798 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.710e+01 4.030e+01 4.186e+01 4.353e+01 4.618e+01, threshold=8.371e+01, percent-clipped=0.0 2023-12-24 03:10:36,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0 2023-12-24 03:10:46,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1472040.0, ans=0.2 2023-12-24 03:11:09,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1472240.0, ans=0.125 2023-12-24 03:11:10,587 INFO [train.py:886] (3/4) Epoch 47, batch 1600, loss[loss=0.01015, audio_tagging_loss=0.01015, over 24750.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4949852.20 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:11:12,258 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.62 vs. 
limit=22.5 2023-12-24 03:11:18,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1472240.0, ans=0.0 2023-12-24 03:11:19,639 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.93 vs. limit=15.0 2023-12-24 03:11:24,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1472306.6666666667, ans=0.125 2023-12-24 03:11:26,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1472306.6666666667, ans=0.2 2023-12-24 03:11:28,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1472306.6666666667, ans=0.0 2023-12-24 03:11:46,227 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.00 vs. limit=15.0 2023-12-24 03:11:57,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1472506.6666666667, ans=0.125 2023-12-24 03:12:01,421 INFO [train.py:886] (3/4) Epoch 47, batch 1650, loss[loss=0.009918, audio_tagging_loss=0.009918, over 24750.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4944898.06 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:12:06,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0 2023-12-24 03:12:06,527 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2023-12-24 03:12:14,045 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.647e+01 4.031e+01 4.196e+01 4.409e+01 5.344e+01, threshold=8.391e+01, percent-clipped=0.0 2023-12-24 03:12:17,346 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.29 vs. limit=10.0 2023-12-24 03:12:26,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1472706.6666666667, ans=0.125 2023-12-24 03:12:31,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=1472773.3333333333, ans=0.02 2023-12-24 03:12:35,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1472773.3333333333, ans=0.0 2023-12-24 03:12:52,662 INFO [train.py:886] (3/4) Epoch 47, batch 1700, loss[loss=0.0104, audio_tagging_loss=0.0104, over 25000.00 frames. ], tot_loss[loss=0.01084, audio_tagging_loss=0.01084, over 4951654.17 frames. 
], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:12:54,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1472906.6666666667, ans=0.0 2023-12-24 03:12:58,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1472906.6666666667, ans=15.0 2023-12-24 03:13:02,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1472973.3333333333, ans=0.0 2023-12-24 03:13:10,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1472973.3333333333, ans=0.2 2023-12-24 03:13:13,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1473040.0, ans=0.125 2023-12-24 03:13:20,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1473040.0, ans=0.2 2023-12-24 03:13:21,730 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.13 vs. limit=15.0 2023-12-24 03:13:36,270 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.72 vs. limit=22.5 2023-12-24 03:13:37,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0 2023-12-24 03:13:43,958 INFO [train.py:886] (3/4) Epoch 47, batch 1750, loss[loss=0.01233, audio_tagging_loss=0.01233, over 25000.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4956163.72 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:13:45,512 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.61 vs. limit=12.0 2023-12-24 03:13:54,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1473306.6666666667, ans=0.125 2023-12-24 03:13:56,767 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.487e+01 3.924e+01 4.095e+01 4.271e+01 4.874e+01, threshold=8.190e+01, percent-clipped=0.0 2023-12-24 03:14:06,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1473373.3333333333, ans=0.07 2023-12-24 03:14:30,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1473506.6666666667, ans=0.125 2023-12-24 03:14:32,262 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.35 vs. limit=6.0 2023-12-24 03:14:35,514 INFO [train.py:886] (3/4) Epoch 47, batch 1800, loss[loss=0.01024, audio_tagging_loss=0.01024, over 25000.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4963675.36 frames. 
], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:15:01,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1473706.6666666667, ans=0.05 2023-12-24 03:15:03,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1473706.6666666667, ans=0.0 2023-12-24 03:15:27,778 INFO [train.py:886] (3/4) Epoch 47, batch 1850, loss[loss=0.01205, audio_tagging_loss=0.01205, over 24750.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4962257.32 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:15:28,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1473906.6666666667, ans=0.125 2023-12-24 03:15:29,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1473906.6666666667, ans=0.0 2023-12-24 03:15:39,845 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.565e+01 3.927e+01 4.095e+01 4.266e+01 4.764e+01, threshold=8.189e+01, percent-clipped=0.0 2023-12-24 03:15:48,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1474040.0, ans=0.2 2023-12-24 03:15:54,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1474040.0, ans=0.0 2023-12-24 03:16:19,731 INFO [train.py:886] (3/4) Epoch 47, batch 1900, loss[loss=0.009186, audio_tagging_loss=0.009186, over 22298.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4954434.58 frames. ], batch size: 107, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:16:29,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1474306.6666666667, ans=0.2 2023-12-24 03:16:37,263 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:16:39,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1474306.6666666667, ans=0.125 2023-12-24 03:17:12,126 INFO [train.py:886] (3/4) Epoch 47, batch 1950, loss[loss=0.01234, audio_tagging_loss=0.01234, over 24938.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4955611.47 frames. 
], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:17:15,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1474573.3333333333, ans=0.125 2023-12-24 03:17:24,063 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.604e+01 3.930e+01 4.132e+01 4.340e+01 4.631e+01, threshold=8.265e+01, percent-clipped=0.0 2023-12-24 03:17:28,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1474640.0, ans=0.125 2023-12-24 03:17:37,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1474706.6666666667, ans=0.125 2023-12-24 03:17:40,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1474706.6666666667, ans=0.0 2023-12-24 03:17:40,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1474706.6666666667, ans=0.125 2023-12-24 03:17:47,124 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.63 vs. limit=22.5 2023-12-24 03:18:01,390 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:18:02,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.17 vs. limit=22.5 2023-12-24 03:18:04,037 INFO [train.py:886] (3/4) Epoch 47, batch 2000, loss[loss=0.0119, audio_tagging_loss=0.0119, over 22114.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4954174.71 frames. ], batch size: 107, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:18:19,847 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:18:20,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=1474973.3333333333, ans=10.0 2023-12-24 03:18:30,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1475040.0, ans=0.5 2023-12-24 03:18:38,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1475106.6666666667, ans=0.0 2023-12-24 03:18:44,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1475173.3333333333, ans=0.0 2023-12-24 03:18:49,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1475173.3333333333, ans=0.125 2023-12-24 03:18:56,072 INFO [train.py:886] (3/4) Epoch 47, batch 2050, loss[loss=0.009151, audio_tagging_loss=0.009151, over 25000.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4952907.29 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:18:58,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. 
limit=15.0 2023-12-24 03:19:09,101 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.482e+01 3.891e+01 4.061e+01 4.208e+01 4.839e+01, threshold=8.122e+01, percent-clipped=0.0 2023-12-24 03:19:09,711 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0 2023-12-24 03:19:18,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1475373.3333333333, ans=0.0 2023-12-24 03:19:22,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1475373.3333333333, ans=0.0 2023-12-24 03:19:41,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1475506.6666666667, ans=0.125 2023-12-24 03:19:47,634 INFO [train.py:886] (3/4) Epoch 47, batch 2100, loss[loss=0.01107, audio_tagging_loss=0.01107, over 25000.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4953324.99 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:19:54,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1475573.3333333333, ans=0.125 2023-12-24 03:19:57,069 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:20:17,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1475773.3333333333, ans=0.0 2023-12-24 03:20:30,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1475840.0, ans=0.2 2023-12-24 03:20:34,519 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.87 vs. limit=10.0 2023-12-24 03:20:38,877 INFO [train.py:886] (3/4) Epoch 47, batch 2150, loss[loss=0.01141, audio_tagging_loss=0.01141, over 24075.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4954021.15 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:20:50,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1475973.3333333333, ans=0.125 2023-12-24 03:20:52,741 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.593e+01 3.982e+01 4.167e+01 4.321e+01 5.208e+01, threshold=8.335e+01, percent-clipped=0.0 2023-12-24 03:21:08,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1476040.0, ans=0.125 2023-12-24 03:21:09,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1476106.6666666667, ans=0.125 2023-12-24 03:21:26,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1476173.3333333333, ans=0.1 2023-12-24 03:21:30,436 INFO [train.py:886] (3/4) Epoch 47, batch 2200, loss[loss=0.01206, audio_tagging_loss=0.01206, over 25000.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4950610.64 frames. 
], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:21:32,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1476240.0, ans=0.125 2023-12-24 03:22:07,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1476440.0, ans=0.2 2023-12-24 03:22:23,333 INFO [train.py:886] (3/4) Epoch 47, batch 2250, loss[loss=0.01403, audio_tagging_loss=0.01403, over 25000.00 frames. ], tot_loss[loss=0.01098, audio_tagging_loss=0.01098, over 4949220.93 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:22:35,565 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.670e+01 3.897e+01 4.118e+01 4.297e+01 5.377e+01, threshold=8.236e+01, percent-clipped=0.0 2023-12-24 03:22:37,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1476640.0, ans=0.0 2023-12-24 03:22:41,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1476640.0, ans=0.0 2023-12-24 03:22:57,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1476773.3333333333, ans=0.07 2023-12-24 03:23:04,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1476840.0, ans=0.125 2023-12-24 03:23:05,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1476840.0, ans=0.0 2023-12-24 03:23:11,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1476840.0, ans=0.0 2023-12-24 03:23:14,274 INFO [train.py:886] (3/4) Epoch 47, batch 2300, loss[loss=0.01246, audio_tagging_loss=0.01246, over 24750.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4944531.26 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:23:15,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1476906.6666666667, ans=0.0 2023-12-24 03:23:16,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1476906.6666666667, ans=0.125 2023-12-24 03:23:29,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1476973.3333333333, ans=0.2 2023-12-24 03:23:30,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1476973.3333333333, ans=0.2 2023-12-24 03:23:37,905 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0 2023-12-24 03:23:45,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1477106.6666666667, ans=0.125 2023-12-24 03:23:56,004 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. 
limit=6.0 2023-12-24 03:23:57,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1477173.3333333333, ans=0.125 2023-12-24 03:24:05,696 INFO [train.py:886] (3/4) Epoch 47, batch 2350, loss[loss=0.008528, audio_tagging_loss=0.008528, over 24750.00 frames. ], tot_loss[loss=0.01085, audio_tagging_loss=0.01085, over 4947679.48 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:24:19,377 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.580e+01 3.910e+01 4.055e+01 4.262e+01 5.306e+01, threshold=8.110e+01, percent-clipped=0.0 2023-12-24 03:24:23,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1477306.6666666667, ans=0.125 2023-12-24 03:24:33,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1477373.3333333333, ans=0.0 2023-12-24 03:24:36,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1477440.0, ans=0.2 2023-12-24 03:24:41,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1477440.0, ans=0.125 2023-12-24 03:24:58,085 INFO [train.py:886] (3/4) Epoch 47, batch 2400, loss[loss=0.01051, audio_tagging_loss=0.01051, over 25000.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4955743.22 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:25:07,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1477640.0, ans=0.125 2023-12-24 03:25:10,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1477640.0, ans=0.1 2023-12-24 03:25:30,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1477773.3333333333, ans=0.1 2023-12-24 03:25:31,559 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0 2023-12-24 03:25:39,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1477840.0, ans=0.0 2023-12-24 03:25:44,890 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.48 vs. limit=15.0 2023-12-24 03:25:49,220 INFO [train.py:886] (3/4) Epoch 47, batch 2450, loss[loss=0.01352, audio_tagging_loss=0.01352, over 25000.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4956972.19 frames. 
], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:25:55,921 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:26:03,618 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.638e+01 3.970e+01 4.140e+01 4.271e+01 4.902e+01, threshold=8.281e+01, percent-clipped=0.0 2023-12-24 03:26:24,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1478106.6666666667, ans=0.0 2023-12-24 03:26:31,826 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.92 vs. limit=15.0 2023-12-24 03:26:42,132 INFO [train.py:886] (3/4) Epoch 47, batch 2500, loss[loss=0.01228, audio_tagging_loss=0.01228, over 24750.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4952165.97 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:26:54,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1478306.6666666667, ans=0.0 2023-12-24 03:27:11,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1478373.3333333333, ans=0.0 2023-12-24 03:27:14,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1478440.0, ans=0.0 2023-12-24 03:27:31,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1478506.6666666667, ans=0.2 2023-12-24 03:27:33,069 INFO [train.py:886] (3/4) Epoch 47, batch 2550, loss[loss=0.01086, audio_tagging_loss=0.01086, over 25000.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4950595.95 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:27:46,987 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.46 vs. limit=15.0 2023-12-24 03:27:47,510 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.559e+01 3.962e+01 4.101e+01 4.307e+01 5.190e+01, threshold=8.202e+01, percent-clipped=0.0 2023-12-24 03:27:47,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1478640.0, ans=0.0 2023-12-24 03:27:55,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1478706.6666666667, ans=0.0 2023-12-24 03:27:57,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1478706.6666666667, ans=0.2 2023-12-24 03:27:59,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1478706.6666666667, ans=0.05 2023-12-24 03:28:25,396 INFO [train.py:886] (3/4) Epoch 47, batch 2600, loss[loss=0.01211, audio_tagging_loss=0.01211, over 24750.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4942815.58 frames. 
], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:28:35,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1478973.3333333333, ans=0.125 2023-12-24 03:28:37,198 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.15 vs. limit=22.5 2023-12-24 03:28:37,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1478973.3333333333, ans=0.0 2023-12-24 03:28:39,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1478973.3333333333, ans=0.05 2023-12-24 03:28:39,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1478973.3333333333, ans=0.0 2023-12-24 03:28:55,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1479040.0, ans=0.125 2023-12-24 03:28:58,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1479106.6666666667, ans=0.125 2023-12-24 03:28:59,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1479106.6666666667, ans=0.125 2023-12-24 03:28:59,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1479106.6666666667, ans=0.0 2023-12-24 03:29:00,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1479106.6666666667, ans=0.125 2023-12-24 03:29:17,372 INFO [train.py:886] (3/4) Epoch 47, batch 2650, loss[loss=0.009192, audio_tagging_loss=0.009192, over 25000.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4939527.60 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:29:18,592 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:29:22,983 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 03:29:26,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1479306.6666666667, ans=0.125 2023-12-24 03:29:28,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1479306.6666666667, ans=0.0 2023-12-24 03:29:30,365 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.599e+01 3.913e+01 4.117e+01 4.340e+01 5.739e+01, threshold=8.234e+01, percent-clipped=0.0 2023-12-24 03:29:31,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.50 vs. limit=12.0 2023-12-24 03:29:39,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.66 vs. limit=22.5 2023-12-24 03:29:50,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2023-12-24 03:30:08,767 INFO [train.py:886] (3/4) Epoch 47, batch 2700, loss[loss=0.01063, audio_tagging_loss=0.01063, over 25000.00 frames. 
], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4941806.58 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:30:29,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1479706.6666666667, ans=0.04949747468305833 2023-12-24 03:30:30,406 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.64 vs. limit=10.0 2023-12-24 03:30:46,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1479773.3333333333, ans=0.0 2023-12-24 03:31:01,092 INFO [train.py:886] (3/4) Epoch 47, batch 2750, loss[loss=0.009196, audio_tagging_loss=0.009196, over 24750.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4947362.23 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:31:14,040 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.573e+01 3.907e+01 4.081e+01 4.248e+01 4.984e+01, threshold=8.163e+01, percent-clipped=0.0 2023-12-24 03:31:20,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1480040.0, ans=0.0 2023-12-24 03:31:26,656 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.54 vs. limit=15.0 2023-12-24 03:31:51,817 INFO [train.py:886] (3/4) Epoch 47, batch 2800, loss[loss=0.0128, audio_tagging_loss=0.0128, over 24750.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4949028.31 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:31:54,273 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2023-12-24 03:32:00,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1480240.0, ans=0.125 2023-12-24 03:32:13,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1480373.3333333333, ans=0.125 2023-12-24 03:32:34,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1480506.6666666667, ans=0.035 2023-12-24 03:32:36,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1480506.6666666667, ans=0.125 2023-12-24 03:32:42,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1480506.6666666667, ans=0.1 2023-12-24 03:32:43,755 INFO [train.py:886] (3/4) Epoch 47, batch 2850, loss[loss=0.01071, audio_tagging_loss=0.01071, over 24750.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4949440.86 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0 2023-12-24 03:32:53,470 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.33 vs. 
limit=15.0
2023-12-24 03:32:56,595 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.697e+01 4.004e+01 4.137e+01 4.360e+01 4.931e+01, threshold=8.275e+01, percent-clipped=0.0
2023-12-24 03:33:07,763 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.44 vs. limit=22.5
2023-12-24 03:33:17,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1480773.3333333333, ans=0.0
2023-12-24 03:33:34,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1480906.6666666667, ans=0.1
2023-12-24 03:33:34,994 INFO [train.py:886] (3/4) Epoch 47, batch 2900, loss[loss=0.01024, audio_tagging_loss=0.01024, over 24750.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4943786.03 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:33:43,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1480906.6666666667, ans=0.125
2023-12-24 03:33:56,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1481040.0, ans=0.125
2023-12-24 03:34:01,702 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.39 vs. limit=5.0
2023-12-24 03:34:27,553 INFO [train.py:886] (3/4) Epoch 47, batch 2950, loss[loss=0.01151, audio_tagging_loss=0.01151, over 25000.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4945520.04 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:34:28,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.05 vs. limit=22.5
2023-12-24 03:34:30,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1481240.0, ans=0.2
2023-12-24 03:34:33,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1481240.0, ans=0.07
2023-12-24 03:34:35,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1481240.0, ans=0.125
2023-12-24 03:34:41,281 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.488e+01 3.898e+01 4.063e+01 4.276e+01 4.870e+01, threshold=8.126e+01, percent-clipped=0.0
2023-12-24 03:34:50,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1481373.3333333333, ans=0.125
2023-12-24 03:35:05,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1481440.0, ans=0.1
2023-12-24 03:35:14,274 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=12.0
2023-12-24 03:35:19,983 INFO [train.py:886] (3/4) Epoch 47, batch 3000, loss[loss=0.009287, audio_tagging_loss=0.009287, over 25000.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4951223.75 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:35:19,983 INFO [train.py:909] (3/4) Computing validation loss
2023-12-24 03:35:37,109 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.6613, 3.9836, 4.1794, 3.8968], device='cuda:3')
2023-12-24 03:35:41,607 INFO [train.py:917] (3/4) Epoch 47, validation: loss=0.03661, audio_tagging_loss=0.03661, over 3737520.00 frames.
2023-12-24 03:35:41,608 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-24 03:35:51,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1481640.0, ans=0.125
2023-12-24 03:36:19,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1481773.3333333333, ans=0.125
2023-12-24 03:36:25,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1481840.0, ans=0.0
2023-12-24 03:36:26,477 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0
2023-12-24 03:36:33,024 INFO [train.py:886] (3/4) Epoch 47, batch 3050, loss[loss=0.01045, audio_tagging_loss=0.01045, over 25000.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4948839.60 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:36:37,279 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0
2023-12-24 03:36:46,060 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.561e+01 3.890e+01 4.067e+01 4.265e+01 5.158e+01, threshold=8.135e+01, percent-clipped=0.0
2023-12-24 03:36:55,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1482040.0, ans=0.0
2023-12-24 03:37:04,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1482106.6666666667, ans=0.125
2023-12-24 03:37:10,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1482106.6666666667, ans=0.0
2023-12-24 03:37:11,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1482106.6666666667, ans=0.0
2023-12-24 03:37:18,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1482173.3333333333, ans=0.125
2023-12-24 03:37:25,364 INFO [train.py:886] (3/4) Epoch 47, batch 3100, loss[loss=0.0112, audio_tagging_loss=0.0112, over 24750.00 frames. ], tot_loss[loss=0.0108, audio_tagging_loss=0.0108, over 4950816.62 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:37:29,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1482240.0, ans=0.1
2023-12-24 03:37:40,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1482306.6666666667, ans=0.05
2023-12-24 03:37:43,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1482306.6666666667, ans=0.0
2023-12-24 03:37:53,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=1482373.3333333333, ans=0.02
2023-12-24 03:38:08,250 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.69 vs. limit=15.0
2023-12-24 03:38:15,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1482573.3333333333, ans=0.0
2023-12-24 03:38:16,241 INFO [train.py:886] (3/4) Epoch 47, batch 3150, loss[loss=0.0107, audio_tagging_loss=0.0107, over 24750.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4944284.30 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:38:16,884 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0
2023-12-24 03:38:21,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1482573.3333333333, ans=0.0
2023-12-24 03:38:30,724 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.480e+01 3.934e+01 4.165e+01 4.401e+01 5.350e+01, threshold=8.330e+01, percent-clipped=0.0
2023-12-24 03:38:36,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1482706.6666666667, ans=0.125
2023-12-24 03:38:45,200 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-12-24 03:38:51,705 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.40 vs. limit=10.0
2023-12-24 03:38:53,571 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0
2023-12-24 03:39:04,524 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.86 vs. limit=22.5
2023-12-24 03:39:08,911 INFO [train.py:886] (3/4) Epoch 47, batch 3200, loss[loss=0.009476, audio_tagging_loss=0.009476, over 24750.00 frames. ], tot_loss[loss=0.01084, audio_tagging_loss=0.01084, over 4942239.89 frames. ], batch size: 99, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:39:14,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1482906.6666666667, ans=0.0
2023-12-24 03:39:20,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1482973.3333333333, ans=0.0
2023-12-24 03:39:21,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1482973.3333333333, ans=0.0
2023-12-24 03:39:25,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=1482973.3333333333, ans=0.02
2023-12-24 03:39:28,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1482973.3333333333, ans=0.1
2023-12-24 03:39:40,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1483106.6666666667, ans=0.125
2023-12-24 03:39:45,803 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.78 vs. limit=15.0
2023-12-24 03:39:51,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1483173.3333333333, ans=0.0
2023-12-24 03:40:00,764 INFO [train.py:886] (3/4) Epoch 47, batch 3250, loss[loss=0.00971, audio_tagging_loss=0.00971, over 25000.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4942620.34 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 32.0
2023-12-24 03:40:05,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1483240.0, ans=0.125
2023-12-24 03:40:14,365 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.530e+01 3.967e+01 4.152e+01 4.354e+01 4.796e+01, threshold=8.304e+01, percent-clipped=0.0
2023-12-24 03:40:37,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1483440.0, ans=0.125
2023-12-24 03:40:52,244 INFO [train.py:886] (3/4) Epoch 47, batch 3300, loss[loss=0.008741, audio_tagging_loss=0.008741, over 24750.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4944557.38 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 32.0
2023-12-24 03:41:00,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1483573.3333333333, ans=0.2
2023-12-24 03:41:19,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1483706.6666666667, ans=0.125
2023-12-24 03:41:25,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1483773.3333333333, ans=0.035
2023-12-24 03:41:37,505 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.54 vs. limit=22.5
2023-12-24 03:41:43,696 INFO [train.py:886] (3/4) Epoch 47, batch 3350, loss[loss=0.01155, audio_tagging_loss=0.01155, over 25000.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4949996.36 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0
2023-12-24 03:41:43,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1483906.6666666667, ans=0.1
2023-12-24 03:41:57,460 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.599e+01 3.920e+01 4.118e+01 4.246e+01 4.815e+01, threshold=8.236e+01, percent-clipped=0.0
2023-12-24 03:41:57,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1483973.3333333333, ans=0.125
2023-12-24 03:42:16,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1484106.6666666667, ans=0.0
2023-12-24 03:42:23,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1484106.6666666667, ans=0.0
2023-12-24 03:42:27,633 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.70 vs. limit=12.0
2023-12-24 03:42:35,590 INFO [train.py:886] (3/4) Epoch 47, batch 3400, loss[loss=0.01045, audio_tagging_loss=0.01045, over 25000.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4956258.77 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0
2023-12-24 03:42:42,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1484240.0, ans=0.125
2023-12-24 03:42:47,758 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0
2023-12-24 03:42:48,477 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 03:42:49,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1484306.6666666667, ans=0.0
2023-12-24 03:42:50,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1484306.6666666667, ans=0.125
2023-12-24 03:42:52,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1484306.6666666667, ans=0.125
2023-12-24 03:43:02,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1484373.3333333333, ans=0.035
2023-12-24 03:43:27,005 INFO [train.py:886] (3/4) Epoch 47, batch 3450, loss[loss=0.01293, audio_tagging_loss=0.01293, over 24750.00 frames. ], tot_loss[loss=0.01084, audio_tagging_loss=0.01084, over 4954257.35 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 32.0
2023-12-24 03:43:29,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1484573.3333333333, ans=0.125
2023-12-24 03:43:40,799 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.504e+01 3.946e+01 4.174e+01 4.346e+01 5.015e+01, threshold=8.347e+01, percent-clipped=0.0
2023-12-24 03:43:41,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1484640.0, ans=0.125
2023-12-24 03:43:56,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1484706.6666666667, ans=0.0
2023-12-24 03:44:19,474 INFO [train.py:886] (3/4) Epoch 47, batch 3500, loss[loss=0.008994, audio_tagging_loss=0.008994, over 24750.00 frames. ], tot_loss[loss=0.01095, audio_tagging_loss=0.01095, over 4949458.55 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 32.0
2023-12-24 03:44:19,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1484906.6666666667, ans=0.125
2023-12-24 03:44:54,863 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.90 vs. limit=12.0
2023-12-24 03:45:10,893 INFO [train.py:886] (3/4) Epoch 47, batch 3550, loss[loss=0.01084, audio_tagging_loss=0.01084, over 24750.00 frames. ], tot_loss[loss=0.01083, audio_tagging_loss=0.01083, over 4951451.19 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 32.0
2023-12-24 03:45:24,588 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.366e+01 3.931e+01 4.091e+01 4.261e+01 4.917e+01, threshold=8.182e+01, percent-clipped=0.0
2023-12-24 03:45:33,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1485373.3333333333, ans=0.1
2023-12-24 03:45:59,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1485506.6666666667, ans=0.1
2023-12-24 03:46:01,496 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.61 vs. limit=15.0
2023-12-24 03:46:02,705 INFO [train.py:886] (3/4) Epoch 47, batch 3600, loss[loss=0.01284, audio_tagging_loss=0.01284, over 25000.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4955224.73 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0
2023-12-24 03:46:12,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1485640.0, ans=0.5
2023-12-24 03:46:16,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1485640.0, ans=0.125
2023-12-24 03:46:36,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1485773.3333333333, ans=0.0
2023-12-24 03:46:38,083 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.11 vs. limit=10.0
2023-12-24 03:46:55,304 INFO [train.py:886] (3/4) Epoch 47, batch 3650, loss[loss=0.01013, audio_tagging_loss=0.01013, over 25000.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4956005.53 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0
2023-12-24 03:47:00,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1485906.6666666667, ans=0.1
2023-12-24 03:47:01,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1485906.6666666667, ans=0.125
2023-12-24 03:47:08,291 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.641e+01 3.947e+01 4.112e+01 4.320e+01 4.973e+01, threshold=8.224e+01, percent-clipped=0.0
2023-12-24 03:47:33,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1486106.6666666667, ans=0.5
2023-12-24 03:47:35,158 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0
2023-12-24 03:47:45,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1486173.3333333333, ans=0.5
2023-12-24 03:47:46,828 INFO [train.py:886] (3/4) Epoch 47, batch 3700, loss[loss=0.01023, audio_tagging_loss=0.01023, over 25000.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4951782.57 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0
2023-12-24 03:47:54,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1486240.0, ans=0.0
2023-12-24 03:48:07,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1486373.3333333333, ans=0.1
2023-12-24 03:48:36,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.21 vs. limit=15.0
2023-12-24 03:48:37,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1486506.6666666667, ans=0.0
2023-12-24 03:48:39,065 INFO [train.py:886] (3/4) Epoch 47, batch 3750, loss[loss=0.01285, audio_tagging_loss=0.01285, over 24944.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4949706.77 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0
2023-12-24 03:48:40,272 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 03:48:50,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.47 vs. limit=15.0
2023-12-24 03:48:51,904 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.685e+01 4.010e+01 4.129e+01 4.399e+01 4.975e+01, threshold=8.259e+01, percent-clipped=0.0
2023-12-24 03:49:03,520 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.65 vs. limit=10.0
2023-12-24 03:49:10,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1486773.3333333333, ans=0.1
2023-12-24 03:49:21,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1486840.0, ans=0.1
2023-12-24 03:49:29,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.88 vs. limit=15.0
2023-12-24 03:49:30,071 INFO [train.py:886] (3/4) Epoch 47, batch 3800, loss[loss=0.01198, audio_tagging_loss=0.01198, over 24750.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4946318.33 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 32.0
2023-12-24 03:49:40,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1486973.3333333333, ans=0.0
2023-12-24 03:49:44,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1486973.3333333333, ans=0.5
2023-12-24 03:49:48,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1486973.3333333333, ans=0.125
2023-12-24 03:49:50,312 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 03:49:59,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1487106.6666666667, ans=0.125
2023-12-24 03:50:00,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1487106.6666666667, ans=0.125
2023-12-24 03:50:01,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1487106.6666666667, ans=0.125
2023-12-24 03:50:12,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1487173.3333333333, ans=0.07
2023-12-24 03:50:13,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1487173.3333333333, ans=0.0
2023-12-24 03:50:22,436 INFO [train.py:886] (3/4) Epoch 47, batch 3850, loss[loss=0.01146, audio_tagging_loss=0.01146, over 24750.00 frames. ], tot_loss[loss=0.01076, audio_tagging_loss=0.01076, over 4947951.41 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 32.0
2023-12-24 03:50:35,802 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.564e+01 3.975e+01 4.146e+01 4.354e+01 5.243e+01, threshold=8.293e+01, percent-clipped=0.0
2023-12-24 03:51:00,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1487440.0, ans=0.125
2023-12-24 03:51:05,847 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.42 vs. limit=10.0
2023-12-24 03:51:07,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1487506.6666666667, ans=0.0
2023-12-24 03:51:08,438 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.30 vs. limit=22.5
2023-12-24 03:51:09,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1487506.6666666667, ans=0.0
2023-12-24 03:51:15,178 INFO [train.py:886] (3/4) Epoch 47, batch 3900, loss[loss=0.01195, audio_tagging_loss=0.01195, over 25000.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4949382.67 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0
2023-12-24 03:51:21,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1487573.3333333333, ans=0.0
2023-12-24 03:51:26,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1487640.0, ans=0.125
2023-12-24 03:51:31,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1487640.0, ans=0.125
2023-12-24 03:51:43,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1487706.6666666667, ans=0.125
2023-12-24 03:51:48,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.65 vs. limit=15.0
2023-12-24 03:51:49,107 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.18 vs. limit=22.5
2023-12-24 03:52:06,194 INFO [train.py:886] (3/4) Epoch 47, batch 3950, loss[loss=0.01029, audio_tagging_loss=0.01029, over 25000.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4951918.95 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 32.0
2023-12-24 03:52:19,819 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.576e+01 3.897e+01 4.099e+01 4.303e+01 4.809e+01, threshold=8.198e+01, percent-clipped=0.0
2023-12-24 03:52:24,074 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 03:52:27,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1488040.0, ans=0.2
2023-12-24 03:52:58,023 INFO [train.py:886] (3/4) Epoch 47, batch 4000, loss[loss=0.01118, audio_tagging_loss=0.01118, over 24750.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4946206.70 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 32.0
2023-12-24 03:52:58,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1488240.0, ans=0.125
2023-12-24 03:53:11,635 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.87 vs. limit=15.0
2023-12-24 03:53:15,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1488306.6666666667, ans=0.09899494936611666
2023-12-24 03:53:28,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1488440.0, ans=0.125
2023-12-24 03:53:43,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1488506.6666666667, ans=0.125
2023-12-24 03:53:49,970 INFO [train.py:886] (3/4) Epoch 47, batch 4050, loss[loss=0.0126, audio_tagging_loss=0.0126, over 24750.00 frames. ], tot_loss[loss=0.01083, audio_tagging_loss=0.01083, over 4948337.50 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 64.0
2023-12-24 03:53:52,310 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.16 vs. limit=12.0
2023-12-24 03:53:57,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1488573.3333333333, ans=10.0
2023-12-24 03:54:03,598 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.451e+01 4.007e+01 4.158e+01 4.362e+01 4.853e+01, threshold=8.315e+01, percent-clipped=0.0
2023-12-24 03:54:27,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1488773.3333333333, ans=0.125
2023-12-24 03:54:28,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1488773.3333333333, ans=0.125
2023-12-24 03:54:39,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.81 vs. limit=15.0
2023-12-24 03:54:41,995 INFO [train.py:886] (3/4) Epoch 47, batch 4100, loss[loss=0.01063, audio_tagging_loss=0.01063, over 25000.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4944453.51 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0
2023-12-24 03:54:42,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1488906.6666666667, ans=0.125
2023-12-24 03:54:56,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1488973.3333333333, ans=0.2
2023-12-24 03:55:18,527 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.61 vs. limit=15.0
2023-12-24 03:55:24,976 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.98 vs. limit=15.0
2023-12-24 03:55:33,576 INFO [train.py:886] (3/4) Epoch 47, batch 4150, loss[loss=0.01019, audio_tagging_loss=0.01019, over 24750.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4947815.94 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 64.0
2023-12-24 03:55:35,716 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 03:55:47,351 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.657e+01 3.988e+01 4.176e+01 4.439e+01 5.267e+01, threshold=8.351e+01, percent-clipped=0.0
2023-12-24 03:55:57,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1489373.3333333333, ans=0.1
2023-12-24 03:56:02,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0
2023-12-24 03:56:24,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1489573.3333333333, ans=0.1
2023-12-24 03:56:25,308 INFO [train.py:886] (3/4) Epoch 47, batch 4200, loss[loss=0.009361, audio_tagging_loss=0.009361, over 25000.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4946459.10 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0
2023-12-24 03:56:26,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0
2023-12-24 03:56:36,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1489640.0, ans=0.2
2023-12-24 03:56:58,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1489773.3333333333, ans=0.2
2023-12-24 03:57:08,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1489840.0, ans=0.125
2023-12-24 03:57:10,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1489840.0, ans=0.125
2023-12-24 03:57:18,704 INFO [train.py:886] (3/4) Epoch 47, batch 4250, loss[loss=0.01064, audio_tagging_loss=0.01064, over 24750.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4943233.72 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 64.0
2023-12-24 03:57:19,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1489906.6666666667, ans=0.125
2023-12-24 03:57:31,100 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.425e+01 3.978e+01 4.120e+01 4.273e+01 4.787e+01, threshold=8.239e+01, percent-clipped=0.0
2023-12-24 03:57:42,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1490040.0, ans=0.1
2023-12-24 03:57:47,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1490106.6666666667, ans=0.1
2023-12-24 03:57:48,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1490106.6666666667, ans=0.1
2023-12-24 03:57:53,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1490106.6666666667, ans=0.125
2023-12-24 03:58:09,479 INFO [train.py:886] (3/4) Epoch 47, batch 4300, loss[loss=0.009071, audio_tagging_loss=0.009071, over 25000.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4945425.20 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0
2023-12-24 03:58:14,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1490240.0, ans=0.125
2023-12-24 03:58:27,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1490306.6666666667, ans=0.1
2023-12-24 03:58:31,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1490373.3333333333, ans=0.07
2023-12-24 03:58:34,983 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=15.0
2023-12-24 03:58:35,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1490373.3333333333, ans=0.2
2023-12-24 03:58:40,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1490440.0, ans=0.125
2023-12-24 03:59:01,786 INFO [train.py:886] (3/4) Epoch 47, batch 4350, loss[loss=0.01092, audio_tagging_loss=0.01092, over 25000.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4940129.99 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0
2023-12-24 03:59:14,772 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.601e+01 4.048e+01 4.179e+01 4.362e+01 5.083e+01, threshold=8.358e+01, percent-clipped=0.0
2023-12-24 03:59:26,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1490706.6666666667, ans=0.0
2023-12-24 03:59:26,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.82 vs. limit=22.5
2023-12-24 03:59:34,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1490773.3333333333, ans=0.0
2023-12-24 03:59:43,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1490840.0, ans=0.125
2023-12-24 03:59:53,614 INFO [train.py:886] (3/4) Epoch 47, batch 4400, loss[loss=0.01078, audio_tagging_loss=0.01078, over 24750.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4938272.10 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 64.0
2023-12-24 04:00:00,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1490906.6666666667, ans=0.09899494936611666
2023-12-24 04:00:00,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1490906.6666666667, ans=10.0
2023-12-24 04:00:05,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1490973.3333333333, ans=0.2
2023-12-24 04:00:16,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0
2023-12-24 04:00:17,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1491040.0, ans=0.04949747468305833
2023-12-24 04:00:27,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1491106.6666666667, ans=0.125
2023-12-24 04:00:42,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1491173.3333333333, ans=0.125
2023-12-24 04:00:45,169 INFO [train.py:886] (3/4) Epoch 47, batch 4450, loss[loss=0.01217, audio_tagging_loss=0.01217, over 24750.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4938348.78 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 64.0
2023-12-24 04:00:47,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=1491240.0, ans=22.5
2023-12-24 04:00:50,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1491240.0, ans=0.2
2023-12-24 04:00:52,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1491240.0, ans=0.125
2023-12-24 04:00:58,832 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.617e+01 3.970e+01 4.177e+01 4.303e+01 4.832e+01, threshold=8.353e+01, percent-clipped=0.0
2023-12-24 04:00:59,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1491306.6666666667, ans=0.5
2023-12-24 04:00:59,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1491306.6666666667, ans=0.125
2023-12-24 04:01:03,859 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.68 vs. limit=15.0
2023-12-24 04:01:08,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1491373.3333333333, ans=0.125
2023-12-24 04:01:11,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1491373.3333333333, ans=0.0
2023-12-24 04:01:17,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1491440.0, ans=0.0
2023-12-24 04:01:28,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.00 vs. limit=15.0
2023-12-24 04:01:30,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1491506.6666666667, ans=0.0
2023-12-24 04:01:35,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1491506.6666666667, ans=0.0
2023-12-24 04:01:36,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1491506.6666666667, ans=0.125
2023-12-24 04:01:37,666 INFO [train.py:886] (3/4) Epoch 47, batch 4500, loss[loss=0.01079, audio_tagging_loss=0.01079, over 25000.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4944757.52 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0
2023-12-24 04:01:50,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.38 vs. limit=10.0
2023-12-24 04:02:15,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1491773.3333333333, ans=0.1
2023-12-24 04:02:30,055 INFO [train.py:886] (3/4) Epoch 47, batch 4550, loss[loss=0.01044, audio_tagging_loss=0.01044, over 24033.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4947242.93 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0
2023-12-24 04:02:30,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1491906.6666666667, ans=0.0
2023-12-24 04:02:41,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1491973.3333333333, ans=0.125
2023-12-24 04:02:43,226 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.462e+01 3.976e+01 4.121e+01 4.329e+01 5.064e+01, threshold=8.243e+01, percent-clipped=0.0
2023-12-24 04:02:46,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1491973.3333333333, ans=0.125
2023-12-24 04:02:51,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1492040.0, ans=0.125
2023-12-24 04:02:52,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.12 vs. limit=15.0
2023-12-24 04:02:54,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1492040.0, ans=0.0
2023-12-24 04:03:07,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1492106.6666666667, ans=0.1
2023-12-24 04:03:08,997 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.50 vs. limit=22.5
2023-12-24 04:03:21,234 INFO [train.py:886] (3/4) Epoch 47, batch 4600, loss[loss=0.01135, audio_tagging_loss=0.01135, over 25000.00 frames. ], tot_loss[loss=0.01076, audio_tagging_loss=0.01076, over 4948609.81 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0
2023-12-24 04:03:31,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1492306.6666666667, ans=0.0
2023-12-24 04:03:33,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1492306.6666666667, ans=0.125
2023-12-24 04:03:52,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1492440.0, ans=0.125
2023-12-24 04:04:10,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.31 vs. limit=15.0
2023-12-24 04:04:13,654 INFO [train.py:886] (3/4) Epoch 47, batch 4650, loss[loss=0.011, audio_tagging_loss=0.011, over 25000.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4955933.97 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0
2023-12-24 04:04:26,945 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.563e+01 3.987e+01 4.134e+01 4.270e+01 4.832e+01, threshold=8.268e+01, percent-clipped=0.0
2023-12-24 04:04:33,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1492706.6666666667, ans=0.125
2023-12-24 04:04:35,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1492706.6666666667, ans=0.0
2023-12-24 04:04:39,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1492706.6666666667, ans=0.0
2023-12-24 04:04:47,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1492773.3333333333, ans=0.0
2023-12-24 04:04:50,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1492773.3333333333, ans=0.0
2023-12-24 04:04:58,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1492840.0, ans=0.125
2023-12-24 04:05:04,472 INFO [train.py:886] (3/4) Epoch 47, batch 4700, loss[loss=0.01071, audio_tagging_loss=0.01071, over 25000.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4958746.09 frames. ], batch size: 100, lr: 2.27e-03, grad_scale: 64.0
2023-12-24 04:05:06,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1492906.6666666667, ans=0.1
2023-12-24 04:05:32,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.91 vs. limit=15.0
2023-12-24 04:05:35,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1493106.6666666667, ans=0.2
2023-12-24 04:05:36,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1493106.6666666667, ans=0.2
2023-12-24 04:05:42,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1493173.3333333333, ans=0.125
2023-12-24 04:05:48,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1493173.3333333333, ans=0.0
2023-12-24 04:05:51,853 INFO [train.py:886] (3/4) Epoch 47, batch 4750, loss[loss=0.0112, audio_tagging_loss=0.0112, over 24750.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4956375.87 frames. ], batch size: 99, lr: 2.27e-03, grad_scale: 64.0
2023-12-24 04:05:55,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1493240.0, ans=0.0
2023-12-24 04:05:58,142 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.44 vs. limit=10.0
2023-12-24 04:06:03,896 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.583e+01 4.050e+01 4.260e+01 4.432e+01 5.167e+01, threshold=8.521e+01, percent-clipped=0.0
2023-12-24 04:06:27,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1493346.6666666667, ans=0.125
2023-12-24 04:06:28,391 INFO [train.py:886] (3/4) Epoch 48, batch 0, loss[loss=0.02358, audio_tagging_loss=0.02358, over 25000.00 frames. ], tot_loss[loss=0.02358, audio_tagging_loss=0.02358, over 25000.00 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:06:28,392 INFO [train.py:909] (3/4) Computing validation loss
2023-12-24 04:06:49,459 INFO [train.py:917] (3/4) Epoch 48, validation: loss=0.03686, audio_tagging_loss=0.03686, over 3737520.00 frames.
2023-12-24 04:06:49,460 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-24 04:07:00,094 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.89 vs. limit=12.0
2023-12-24 04:07:07,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1493413.3333333333, ans=0.125
2023-12-24 04:07:40,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1493680.0, ans=0.125
2023-12-24 04:07:41,192 INFO [train.py:886] (3/4) Epoch 48, batch 50, loss[loss=0.01447, audio_tagging_loss=0.01447, over 25000.00 frames. ], tot_loss[loss=0.01704, audio_tagging_loss=0.01704, over 1115916.98 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:07:48,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1493680.0, ans=0.1
2023-12-24 04:08:04,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1493813.3333333333, ans=0.0
2023-12-24 04:08:21,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1493880.0, ans=0.2
2023-12-24 04:08:28,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1493946.6666666667, ans=0.125
2023-12-24 04:08:31,115 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 4.056e+01 4.648e+01 5.127e+01 5.664e+01 9.776e+01, threshold=1.025e+02, percent-clipped=5.0
2023-12-24 04:08:33,023 INFO [train.py:886] (3/4) Epoch 48, batch 100, loss[loss=0.01355, audio_tagging_loss=0.01355, over 25000.00 frames. ], tot_loss[loss=0.01475, audio_tagging_loss=0.01475, over 1974850.24 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:08:45,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1494080.0, ans=0.1
2023-12-24 04:08:46,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=15.0
2023-12-24 04:08:52,243 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.93 vs. limit=15.0
2023-12-24 04:08:53,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1494146.6666666667, ans=0.2
2023-12-24 04:09:09,627 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.46 vs. limit=22.5
2023-12-24 04:09:11,502 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.98 vs. limit=15.0
2023-12-24 04:09:16,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1494280.0, ans=0.125
2023-12-24 04:09:24,823 INFO [train.py:886] (3/4) Epoch 48, batch 150, loss[loss=0.01312, audio_tagging_loss=0.01312, over 25000.00 frames. ], tot_loss[loss=0.01368, audio_tagging_loss=0.01368, over 2636652.37 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:09:38,063 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0
2023-12-24 04:09:54,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1494546.6666666667, ans=0.2
2023-12-24 04:10:07,868 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.35 vs. limit=12.0
2023-12-24 04:10:08,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1494613.3333333333, ans=0.125
2023-12-24 04:10:14,265 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.608e+01 4.085e+01 4.287e+01 4.458e+01 4.971e+01, threshold=8.574e+01, percent-clipped=0.0
2023-12-24 04:10:16,194 INFO [train.py:886] (3/4) Epoch 48, batch 200, loss[loss=0.01047, audio_tagging_loss=0.01047, over 25000.00 frames. ], tot_loss[loss=0.01284, audio_tagging_loss=0.01284, over 3152021.88 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:10:18,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0
2023-12-24 04:10:22,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1494680.0, ans=0.125
2023-12-24 04:10:43,324 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.86 vs. limit=15.0
2023-12-24 04:11:08,781 INFO [train.py:886] (3/4) Epoch 48, batch 250, loss[loss=0.01069, audio_tagging_loss=0.01069, over 24939.00 frames. ], tot_loss[loss=0.01227, audio_tagging_loss=0.01227, over 3552540.22 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:11:22,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1495080.0, ans=0.125
2023-12-24 04:11:26,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1495080.0, ans=0.0
2023-12-24 04:11:37,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1495146.6666666667, ans=0.125
2023-12-24 04:11:46,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1495213.3333333333, ans=0.0
2023-12-24 04:11:55,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.04 vs. limit=15.0
2023-12-24 04:11:58,316 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.624e+01 3.958e+01 4.151e+01 4.361e+01 5.160e+01, threshold=8.303e+01, percent-clipped=0.0
2023-12-24 04:12:00,950 INFO [train.py:886] (3/4) Epoch 48, batch 300, loss[loss=0.01107, audio_tagging_loss=0.01107, over 24750.00 frames. ], tot_loss[loss=0.01189, audio_tagging_loss=0.01189, over 3857505.78 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:12:03,598 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.79 vs. limit=15.0
2023-12-24 04:12:04,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1495346.6666666667, ans=0.125
2023-12-24 04:12:08,762 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.24 vs. limit=22.5
2023-12-24 04:12:14,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1495413.3333333333, ans=0.0
2023-12-24 04:12:14,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1495413.3333333333, ans=0.0
2023-12-24 04:12:22,412 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 04:12:48,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1495613.3333333333, ans=0.0
2023-12-24 04:12:52,535 INFO [train.py:886] (3/4) Epoch 48, batch 350, loss[loss=0.009594, audio_tagging_loss=0.009594, over 25000.00 frames. ], tot_loss[loss=0.01177, audio_tagging_loss=0.01177, over 4096819.31 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:12:56,824 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.37 vs. limit=12.0
2023-12-24 04:13:42,334 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.481e+01 3.929e+01 4.116e+01 4.285e+01 5.525e+01, threshold=8.232e+01, percent-clipped=0.0
2023-12-24 04:13:44,921 INFO [train.py:886] (3/4) Epoch 48, batch 400, loss[loss=0.01078, audio_tagging_loss=0.01078, over 24750.00 frames. ], tot_loss[loss=0.0114, audio_tagging_loss=0.0114, over 4286281.50 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:13:54,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1496080.0, ans=0.0
2023-12-24 04:13:59,222 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.23 vs. limit=15.0
2023-12-24 04:14:03,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1496146.6666666667, ans=0.0
2023-12-24 04:14:26,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1496280.0, ans=0.2
2023-12-24 04:14:30,500 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.61 vs. limit=15.0
2023-12-24 04:14:35,632 INFO [train.py:886] (3/4) Epoch 48, batch 450, loss[loss=0.008837, audio_tagging_loss=0.008837, over 23939.00 frames. ], tot_loss[loss=0.01109, audio_tagging_loss=0.01109, over 4437297.96 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:14:35,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1496346.6666666667, ans=0.1
2023-12-24 04:14:52,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1496413.3333333333, ans=0.125
2023-12-24 04:15:19,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1496613.3333333333, ans=0.125
2023-12-24 04:15:23,732 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0
2023-12-24 04:15:26,673 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.400e+01 3.891e+01 4.102e+01 4.306e+01 5.682e+01, threshold=8.203e+01, percent-clipped=0.0
2023-12-24 04:15:28,577 INFO [train.py:886] (3/4) Epoch 48, batch 500, loss[loss=0.01133, audio_tagging_loss=0.01133, over 25000.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4558827.60 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:15:31,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1496680.0, ans=0.1
2023-12-24 04:15:50,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1496813.3333333333, ans=0.1
2023-12-24 04:15:54,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1496813.3333333333, ans=0.05
2023-12-24 04:16:02,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1496880.0, ans=0.125
2023-12-24 04:16:05,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1496880.0, ans=0.0
2023-12-24 04:16:05,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1496880.0, ans=0.05
2023-12-24 04:16:11,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1496946.6666666667, ans=0.125
2023-12-24 04:16:19,547 INFO [train.py:886] (3/4) Epoch 48, batch 550, loss[loss=0.01293, audio_tagging_loss=0.01293, over 25000.00 frames. ], tot_loss[loss=0.01101, audio_tagging_loss=0.01101, over 4649769.81 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:16:29,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1497080.0, ans=0.125
2023-12-24 04:16:38,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1497080.0, ans=0.125
2023-12-24 04:16:55,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1497213.3333333333, ans=0.125
2023-12-24 04:17:10,275 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.621e+01 4.004e+01 4.167e+01 4.372e+01 5.263e+01, threshold=8.335e+01, percent-clipped=0.0
2023-12-24 04:17:12,150 INFO [train.py:886] (3/4) Epoch 48, batch 600, loss[loss=0.01208, audio_tagging_loss=0.01208, over 21602.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4716804.94 frames. ], batch size: 107, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:17:29,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1497413.3333333333, ans=0.125
2023-12-24 04:17:35,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1497480.0, ans=0.0
2023-12-24 04:17:43,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1497546.6666666667, ans=15.0
2023-12-24 04:17:47,968 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.24 vs. limit=15.0
2023-12-24 04:17:51,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1497546.6666666667, ans=0.125
2023-12-24 04:18:03,752 INFO [train.py:886] (3/4) Epoch 48, batch 650, loss[loss=0.01081, audio_tagging_loss=0.01081, over 25000.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4768521.48 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:18:05,927 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=22.5
2023-12-24 04:18:12,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1497680.0, ans=0.125
2023-12-24 04:18:20,158 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.08 vs. limit=15.0
2023-12-24 04:18:20,168 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0
2023-12-24 04:18:52,642 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.607e+01 3.971e+01 4.176e+01 4.392e+01 5.928e+01, threshold=8.353e+01, percent-clipped=0.0
2023-12-24 04:18:55,243 INFO [train.py:886] (3/4) Epoch 48, batch 700, loss[loss=0.01092, audio_tagging_loss=0.01092, over 24750.00 frames. ], tot_loss[loss=0.01103, audio_tagging_loss=0.01103, over 4806660.62 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:18:56,638 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.65 vs. limit=22.5
2023-12-24 04:19:10,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1498080.0, ans=0.125
2023-12-24 04:19:20,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1498146.6666666667, ans=0.1
2023-12-24 04:19:20,926 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=15.0
2023-12-24 04:19:29,749 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 04:19:43,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1498280.0, ans=0.125
2023-12-24 04:19:46,367 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.44 vs. limit=22.5
2023-12-24 04:19:46,790 INFO [train.py:886] (3/4) Epoch 48, batch 750, loss[loss=0.01023, audio_tagging_loss=0.01023, over 25000.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4839909.47 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:19:48,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1498346.6666666667, ans=0.125
2023-12-24 04:20:01,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1498413.3333333333, ans=0.0
2023-12-24 04:20:04,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1498413.3333333333, ans=0.125
2023-12-24 04:20:18,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1498546.6666666667, ans=0.0
2023-12-24 04:20:25,846 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 04:20:35,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1498613.3333333333, ans=0.125
2023-12-24 04:20:36,081 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.587e+01 3.930e+01 4.066e+01 4.267e+01 5.066e+01, threshold=8.132e+01, percent-clipped=0.0
2023-12-24 04:20:36,789 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.69 vs. limit=15.0
2023-12-24 04:20:38,017 INFO [train.py:886] (3/4) Epoch 48, batch 800, loss[loss=0.01067, audio_tagging_loss=0.01067, over 25000.00 frames. ], tot_loss[loss=0.01085, audio_tagging_loss=0.01085, over 4866457.76 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0
2023-12-24 04:20:49,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1498746.6666666667, ans=0.0
2023-12-24 04:20:52,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1498746.6666666667, ans=0.125
2023-12-24 04:20:56,880 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.11 vs. limit=15.0
2023-12-24 04:21:02,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1498813.3333333333, ans=10.0
2023-12-24 04:21:04,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1498813.3333333333, ans=0.1
2023-12-24 04:21:06,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1498813.3333333333, ans=0.2
2023-12-24 04:21:10,427 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 04:21:25,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.03 vs. limit=15.0
2023-12-24 04:21:30,304 INFO [train.py:886] (3/4) Epoch 48, batch 850, loss[loss=0.01121, audio_tagging_loss=0.01121, over 24750.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4883748.30 frames.
], batch size: 99, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:21:36,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1499013.3333333333, ans=0.1 2023-12-24 04:21:50,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1499146.6666666667, ans=0.1 2023-12-24 04:21:56,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1499146.6666666667, ans=0.2 2023-12-24 04:22:02,873 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0 2023-12-24 04:22:20,133 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.702e+01 4.028e+01 4.180e+01 4.366e+01 5.371e+01, threshold=8.359e+01, percent-clipped=0.0 2023-12-24 04:22:22,879 INFO [train.py:886] (3/4) Epoch 48, batch 900, loss[loss=0.01231, audio_tagging_loss=0.01231, over 24750.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4898082.83 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:22:30,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1499346.6666666667, ans=0.2 2023-12-24 04:22:33,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1499413.3333333333, ans=0.125 2023-12-24 04:22:36,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1499413.3333333333, ans=0.125 2023-12-24 04:22:45,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1499480.0, ans=0.025 2023-12-24 04:22:47,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1499480.0, ans=0.125 2023-12-24 04:22:50,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1499480.0, ans=0.0 2023-12-24 04:22:52,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1499546.6666666667, ans=0.125 2023-12-24 04:22:59,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1499546.6666666667, ans=0.125 2023-12-24 04:23:08,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1499613.3333333333, ans=0.1 2023-12-24 04:23:14,550 INFO [train.py:886] (3/4) Epoch 48, batch 950, loss[loss=0.01277, audio_tagging_loss=0.01277, over 24750.00 frames. ], tot_loss[loss=0.01084, audio_tagging_loss=0.01084, over 4901171.07 frames. 
], batch size: 99, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:23:28,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1499746.6666666667, ans=0.125 2023-12-24 04:23:37,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1499813.3333333333, ans=0.2 2023-12-24 04:23:49,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1499880.0, ans=0.0 2023-12-24 04:24:04,742 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.583e+01 3.993e+01 4.147e+01 4.321e+01 5.221e+01, threshold=8.295e+01, percent-clipped=0.0 2023-12-24 04:24:07,310 INFO [train.py:886] (3/4) Epoch 48, batch 1000, loss[loss=0.009808, audio_tagging_loss=0.009808, over 24750.00 frames. ], tot_loss[loss=0.01081, audio_tagging_loss=0.01081, over 4908282.62 frames. ], batch size: 99, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:24:07,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1500013.3333333333, ans=0.025 2023-12-24 04:24:09,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1500013.3333333333, ans=0.0 2023-12-24 04:24:13,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1500013.3333333333, ans=0.125 2023-12-24 04:24:20,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1500080.0, ans=0.1 2023-12-24 04:24:38,164 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 04:24:43,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1500213.3333333333, ans=22.5 2023-12-24 04:24:54,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1500280.0, ans=0.2 2023-12-24 04:24:58,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1500346.6666666667, ans=0.0 2023-12-24 04:24:58,989 INFO [train.py:886] (3/4) Epoch 48, batch 1050, loss[loss=0.00965, audio_tagging_loss=0.00965, over 25000.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4916549.82 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:25:00,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1500346.6666666667, ans=0.125 2023-12-24 04:25:04,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1500346.6666666667, ans=0.125 2023-12-24 04:25:11,684 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.24 vs. 
limit=15.0 2023-12-24 04:25:36,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1500546.6666666667, ans=0.1 2023-12-24 04:25:42,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1500613.3333333333, ans=0.2 2023-12-24 04:25:42,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1500613.3333333333, ans=0.0 2023-12-24 04:25:48,713 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.258e+01 3.950e+01 4.096e+01 4.313e+01 4.903e+01, threshold=8.193e+01, percent-clipped=0.0 2023-12-24 04:25:50,618 INFO [train.py:886] (3/4) Epoch 48, batch 1100, loss[loss=0.008896, audio_tagging_loss=0.008896, over 25000.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4927609.34 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:25:54,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1500680.0, ans=0.125 2023-12-24 04:26:02,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1500746.6666666667, ans=0.125 2023-12-24 04:26:04,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1500746.6666666667, ans=0.0 2023-12-24 04:26:14,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1500813.3333333333, ans=0.5 2023-12-24 04:26:15,791 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=12.0 2023-12-24 04:26:21,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1500880.0, ans=0.0 2023-12-24 04:26:23,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1500880.0, ans=10.0 2023-12-24 04:26:30,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1500880.0, ans=0.0 2023-12-24 04:26:31,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1500946.6666666667, ans=0.1 2023-12-24 04:26:34,413 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0 2023-12-24 04:26:37,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1500946.6666666667, ans=0.125 2023-12-24 04:26:42,943 INFO [train.py:886] (3/4) Epoch 48, batch 1150, loss[loss=0.009982, audio_tagging_loss=0.009982, over 25000.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4934476.21 frames. 
], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:26:58,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1501080.0, ans=0.125 2023-12-24 04:27:03,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1501146.6666666667, ans=0.1 2023-12-24 04:27:22,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.27 vs. limit=22.5 2023-12-24 04:27:32,815 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.629e+01 3.995e+01 4.171e+01 4.338e+01 4.792e+01, threshold=8.343e+01, percent-clipped=0.0 2023-12-24 04:27:34,745 INFO [train.py:886] (3/4) Epoch 48, batch 1200, loss[loss=0.01117, audio_tagging_loss=0.01117, over 25000.00 frames. ], tot_loss[loss=0.01067, audio_tagging_loss=0.01067, over 4943184.98 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:27:35,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1501346.6666666667, ans=0.125 2023-12-24 04:28:23,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1501613.3333333333, ans=0.125 2023-12-24 04:28:26,239 INFO [train.py:886] (3/4) Epoch 48, batch 1250, loss[loss=0.01111, audio_tagging_loss=0.01111, over 21959.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4938855.23 frames. ], batch size: 107, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:28:54,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1501813.3333333333, ans=0.125 2023-12-24 04:28:57,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1501880.0, ans=0.0 2023-12-24 04:29:00,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1501880.0, ans=0.1 2023-12-24 04:29:05,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1501880.0, ans=0.125 2023-12-24 04:29:12,084 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.43 vs. limit=6.0 2023-12-24 04:29:16,861 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.731e+01 4.024e+01 4.196e+01 4.446e+01 5.087e+01, threshold=8.392e+01, percent-clipped=0.0 2023-12-24 04:29:18,762 INFO [train.py:886] (3/4) Epoch 48, batch 1300, loss[loss=0.01337, audio_tagging_loss=0.01337, over 24750.00 frames. ], tot_loss[loss=0.0109, audio_tagging_loss=0.0109, over 4934591.23 frames. 
], batch size: 99, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:29:39,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1502146.6666666667, ans=0.125 2023-12-24 04:29:41,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1502146.6666666667, ans=0.125 2023-12-24 04:29:41,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1502146.6666666667, ans=0.125 2023-12-24 04:29:52,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1502213.3333333333, ans=0.0 2023-12-24 04:29:56,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1502213.3333333333, ans=0.0 2023-12-24 04:30:01,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1502280.0, ans=0.0 2023-12-24 04:30:07,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0 2023-12-24 04:30:10,927 INFO [train.py:886] (3/4) Epoch 48, batch 1350, loss[loss=0.01156, audio_tagging_loss=0.01156, over 25000.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4937476.77 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:30:23,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1502413.3333333333, ans=0.1 2023-12-24 04:30:27,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1502413.3333333333, ans=0.04949747468305833 2023-12-24 04:30:45,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1502546.6666666667, ans=0.5 2023-12-24 04:30:59,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1502613.3333333333, ans=0.125 2023-12-24 04:31:00,585 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.655e+01 3.929e+01 4.129e+01 4.400e+01 5.128e+01, threshold=8.257e+01, percent-clipped=0.0 2023-12-24 04:31:02,525 INFO [train.py:886] (3/4) Epoch 48, batch 1400, loss[loss=0.01133, audio_tagging_loss=0.01133, over 25000.00 frames. ], tot_loss[loss=0.01076, audio_tagging_loss=0.01076, over 4940016.86 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:31:16,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1502746.6666666667, ans=0.125 2023-12-24 04:31:25,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1502813.3333333333, ans=0.1 2023-12-24 04:31:36,316 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.35 vs. limit=22.5 2023-12-24 04:31:46,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.44 vs. 
limit=22.5 2023-12-24 04:31:49,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1502946.6666666667, ans=0.125 2023-12-24 04:31:54,568 INFO [train.py:886] (3/4) Epoch 48, batch 1450, loss[loss=0.01106, audio_tagging_loss=0.01106, over 25000.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4943213.46 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:32:04,997 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 04:32:09,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1503080.0, ans=0.125 2023-12-24 04:32:12,110 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.68 vs. limit=6.0 2023-12-24 04:32:15,583 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=15.0 2023-12-24 04:32:21,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1503146.6666666667, ans=0.0 2023-12-24 04:32:30,590 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.37 vs. limit=15.0 2023-12-24 04:32:37,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1503280.0, ans=0.125 2023-12-24 04:32:44,960 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.529e+01 3.948e+01 4.171e+01 4.358e+01 4.772e+01, threshold=8.342e+01, percent-clipped=0.0 2023-12-24 04:32:46,898 INFO [train.py:886] (3/4) Epoch 48, batch 1500, loss[loss=0.009216, audio_tagging_loss=0.009216, over 25000.00 frames. ], tot_loss[loss=0.01076, audio_tagging_loss=0.01076, over 4946069.70 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:32:49,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1503346.6666666667, ans=0.0 2023-12-24 04:32:51,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1503346.6666666667, ans=0.0 2023-12-24 04:33:02,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1503413.3333333333, ans=0.0 2023-12-24 04:33:04,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1503413.3333333333, ans=0.0 2023-12-24 04:33:19,094 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.11 vs. limit=22.5 2023-12-24 04:33:25,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1503546.6666666667, ans=0.125 2023-12-24 04:33:40,126 INFO [train.py:886] (3/4) Epoch 48, batch 1550, loss[loss=0.01108, audio_tagging_loss=0.01108, over 24970.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4948682.33 frames. 
], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:33:46,265 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. limit=6.0 2023-12-24 04:33:46,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1503680.0, ans=0.2 2023-12-24 04:33:55,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1503746.6666666667, ans=0.125 2023-12-24 04:34:02,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1503813.3333333333, ans=0.07 2023-12-24 04:34:09,362 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.04 vs. limit=6.0 2023-12-24 04:34:09,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1503880.0, ans=0.125 2023-12-24 04:34:23,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1503946.6666666667, ans=0.125 2023-12-24 04:34:23,964 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.33 vs. limit=15.0 2023-12-24 04:34:29,205 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.717e+01 4.053e+01 4.191e+01 4.372e+01 4.989e+01, threshold=8.382e+01, percent-clipped=0.0 2023-12-24 04:34:31,128 INFO [train.py:886] (3/4) Epoch 48, batch 1600, loss[loss=0.01227, audio_tagging_loss=0.01227, over 22428.00 frames. ], tot_loss[loss=0.01096, audio_tagging_loss=0.01096, over 4943176.89 frames. ], batch size: 107, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:34:39,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1504013.3333333333, ans=0.1 2023-12-24 04:34:39,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1504013.3333333333, ans=0.125 2023-12-24 04:34:40,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1504013.3333333333, ans=0.125 2023-12-24 04:34:40,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1504013.3333333333, ans=0.125 2023-12-24 04:34:41,674 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=15.0 2023-12-24 04:34:44,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1504080.0, ans=0.0 2023-12-24 04:34:51,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1504146.6666666667, ans=0.1 2023-12-24 04:34:53,646 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.71 vs. 
limit=22.5 2023-12-24 04:35:10,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=1504213.3333333333, ans=0.02 2023-12-24 04:35:11,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1504280.0, ans=0.5 2023-12-24 04:35:22,950 INFO [train.py:886] (3/4) Epoch 48, batch 1650, loss[loss=0.009743, audio_tagging_loss=0.009743, over 25000.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4945013.56 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 32.0 2023-12-24 04:35:23,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1504346.6666666667, ans=0.125 2023-12-24 04:35:26,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1504346.6666666667, ans=0.125 2023-12-24 04:35:35,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1504413.3333333333, ans=15.0 2023-12-24 04:35:38,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1504413.3333333333, ans=0.125 2023-12-24 04:35:46,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1504480.0, ans=0.0 2023-12-24 04:35:51,639 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.86 vs. limit=22.5 2023-12-24 04:36:06,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1504613.3333333333, ans=0.125 2023-12-24 04:36:11,660 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.638e+01 3.961e+01 4.120e+01 4.349e+01 5.089e+01, threshold=8.240e+01, percent-clipped=0.0 2023-12-24 04:36:12,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1504680.0, ans=0.125 2023-12-24 04:36:14,266 INFO [train.py:886] (3/4) Epoch 48, batch 1700, loss[loss=0.01317, audio_tagging_loss=0.01317, over 25000.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4941982.84 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 04:36:22,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1504680.0, ans=0.125 2023-12-24 04:36:30,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1504746.6666666667, ans=0.2 2023-12-24 04:36:42,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1504813.3333333333, ans=0.0 2023-12-24 04:37:06,799 INFO [train.py:886] (3/4) Epoch 48, batch 1750, loss[loss=0.01236, audio_tagging_loss=0.01236, over 25000.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4942081.67 frames. 
], batch size: 100, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 04:37:11,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1505013.3333333333, ans=0.1 2023-12-24 04:37:22,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1505080.0, ans=0.0 2023-12-24 04:37:32,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1505146.6666666667, ans=0.0 2023-12-24 04:37:35,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1505146.6666666667, ans=0.2 2023-12-24 04:37:35,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1505146.6666666667, ans=0.0 2023-12-24 04:37:36,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1505213.3333333333, ans=0.0 2023-12-24 04:37:42,058 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.12 vs. limit=10.0 2023-12-24 04:37:46,061 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.69 vs. limit=15.0 2023-12-24 04:37:48,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1505280.0, ans=0.125 2023-12-24 04:37:56,315 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.626e+01 3.943e+01 4.153e+01 4.308e+01 5.197e+01, threshold=8.305e+01, percent-clipped=0.0 2023-12-24 04:37:59,048 INFO [train.py:886] (3/4) Epoch 48, batch 1800, loss[loss=0.009706, audio_tagging_loss=0.009706, over 25000.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4950121.11 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 04:38:14,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1505413.3333333333, ans=0.2 2023-12-24 04:38:16,458 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.17 vs. limit=15.0 2023-12-24 04:38:37,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1505546.6666666667, ans=0.125 2023-12-24 04:38:50,139 INFO [train.py:886] (3/4) Epoch 48, batch 1850, loss[loss=0.01124, audio_tagging_loss=0.01124, over 24750.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4953402.47 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 04:39:15,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1505813.3333333333, ans=0.1 2023-12-24 04:39:20,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1505880.0, ans=0.0 2023-12-24 04:39:40,444 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.564e+01 4.037e+01 4.214e+01 4.372e+01 5.067e+01, threshold=8.428e+01, percent-clipped=0.0 2023-12-24 04:39:42,327 INFO [train.py:886] (3/4) Epoch 48, batch 1900, loss[loss=0.009743, audio_tagging_loss=0.009743, over 21589.00 frames. 
], tot_loss[loss=0.01083, audio_tagging_loss=0.01083, over 4948937.91 frames. ], batch size: 107, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 04:40:21,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1506213.3333333333, ans=0.2 2023-12-24 04:40:24,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1506280.0, ans=0.1 2023-12-24 04:40:33,857 INFO [train.py:886] (3/4) Epoch 48, batch 1950, loss[loss=0.01146, audio_tagging_loss=0.01146, over 25000.00 frames. ], tot_loss[loss=0.0108, audio_tagging_loss=0.0108, over 4938586.17 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 04:40:42,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1506346.6666666667, ans=0.125 2023-12-24 04:40:58,324 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0 2023-12-24 04:41:01,830 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2023-12-24 04:41:19,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1506613.3333333333, ans=0.0 2023-12-24 04:41:24,471 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.628e+01 3.985e+01 4.127e+01 4.367e+01 5.324e+01, threshold=8.253e+01, percent-clipped=0.0 2023-12-24 04:41:26,441 INFO [train.py:886] (3/4) Epoch 48, batch 2000, loss[loss=0.009907, audio_tagging_loss=0.009907, over 24750.00 frames. ], tot_loss[loss=0.01076, audio_tagging_loss=0.01076, over 4936951.05 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:41:45,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1506746.6666666667, ans=0.0 2023-12-24 04:41:51,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1506813.3333333333, ans=0.1 2023-12-24 04:41:56,262 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.76 vs. limit=15.0 2023-12-24 04:42:05,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.44 vs. limit=15.0 2023-12-24 04:42:13,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1506946.6666666667, ans=0.125 2023-12-24 04:42:17,919 INFO [train.py:886] (3/4) Epoch 48, batch 2050, loss[loss=0.012, audio_tagging_loss=0.012, over 25000.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4940575.08 frames. 
], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:42:53,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1507213.3333333333, ans=0.1 2023-12-24 04:42:54,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1507213.3333333333, ans=0.0 2023-12-24 04:43:05,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1507280.0, ans=0.0 2023-12-24 04:43:07,459 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.609e+01 3.972e+01 4.171e+01 4.415e+01 5.113e+01, threshold=8.342e+01, percent-clipped=0.0 2023-12-24 04:43:09,380 INFO [train.py:886] (3/4) Epoch 48, batch 2100, loss[loss=0.01017, audio_tagging_loss=0.01017, over 25000.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4943377.51 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:43:18,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1507346.6666666667, ans=0.125 2023-12-24 04:43:19,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1507413.3333333333, ans=0.015 2023-12-24 04:43:22,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1507413.3333333333, ans=0.2 2023-12-24 04:43:25,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1507413.3333333333, ans=0.125 2023-12-24 04:43:33,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1507480.0, ans=0.125 2023-12-24 04:43:42,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1507546.6666666667, ans=0.2 2023-12-24 04:43:45,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1507546.6666666667, ans=0.2 2023-12-24 04:43:52,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1507613.3333333333, ans=0.125 2023-12-24 04:44:00,790 INFO [train.py:886] (3/4) Epoch 48, batch 2150, loss[loss=0.01173, audio_tagging_loss=0.01173, over 24750.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4949368.37 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:44:14,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1507746.6666666667, ans=0.07 2023-12-24 04:44:22,990 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.96 vs. 
limit=15.0 2023-12-24 04:44:23,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1507813.3333333333, ans=0.1 2023-12-24 04:44:27,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1507813.3333333333, ans=0.125 2023-12-24 04:44:29,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1507813.3333333333, ans=0.125 2023-12-24 04:44:44,000 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.21 vs. limit=22.5 2023-12-24 04:44:48,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=1507946.6666666667, ans=0.95 2023-12-24 04:44:49,633 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.89 vs. limit=15.0 2023-12-24 04:44:50,296 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.605e+01 4.004e+01 4.221e+01 4.415e+01 5.119e+01, threshold=8.442e+01, percent-clipped=0.0 2023-12-24 04:44:52,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1508013.3333333333, ans=0.125 2023-12-24 04:44:52,941 INFO [train.py:886] (3/4) Epoch 48, batch 2200, loss[loss=0.009577, audio_tagging_loss=0.009577, over 24750.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4949772.20 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:45:13,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1508146.6666666667, ans=0.0 2023-12-24 04:45:39,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1508280.0, ans=0.125 2023-12-24 04:45:43,765 INFO [train.py:886] (3/4) Epoch 48, batch 2250, loss[loss=0.0095, audio_tagging_loss=0.0095, over 24750.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4940770.61 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:46:03,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=1508480.0, ans=0.5 2023-12-24 04:46:05,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1508480.0, ans=0.125 2023-12-24 04:46:14,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1508546.6666666667, ans=0.0 2023-12-24 04:46:32,170 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2023-12-24 04:46:32,555 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.473e+01 4.034e+01 4.155e+01 4.335e+01 5.306e+01, threshold=8.310e+01, percent-clipped=0.0 2023-12-24 04:46:32,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.73 vs. 
limit=15.0 2023-12-24 04:46:33,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1508680.0, ans=0.125 2023-12-24 04:46:34,445 INFO [train.py:886] (3/4) Epoch 48, batch 2300, loss[loss=0.01309, audio_tagging_loss=0.01309, over 24750.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4945823.86 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:46:36,704 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.92 vs. limit=12.0 2023-12-24 04:46:39,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1508680.0, ans=0.0 2023-12-24 04:47:07,055 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=15.0 2023-12-24 04:47:23,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1508946.6666666667, ans=0.07 2023-12-24 04:47:25,852 INFO [train.py:886] (3/4) Epoch 48, batch 2350, loss[loss=0.01028, audio_tagging_loss=0.01028, over 25000.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4945414.63 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:47:51,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1509146.6666666667, ans=0.025 2023-12-24 04:48:00,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1509213.3333333333, ans=0.125 2023-12-24 04:48:08,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1509280.0, ans=0.2 2023-12-24 04:48:15,762 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.606e+01 3.960e+01 4.074e+01 4.287e+01 4.962e+01, threshold=8.148e+01, percent-clipped=0.0 2023-12-24 04:48:17,735 INFO [train.py:886] (3/4) Epoch 48, batch 2400, loss[loss=0.01045, audio_tagging_loss=0.01045, over 25000.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4952029.81 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:48:26,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1509346.6666666667, ans=0.125 2023-12-24 04:48:29,104 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.58 vs. limit=22.5 2023-12-24 04:48:35,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1509413.3333333333, ans=0.0 2023-12-24 04:48:40,952 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 04:48:49,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1509546.6666666667, ans=0.07 2023-12-24 04:49:10,955 INFO [train.py:886] (3/4) Epoch 48, batch 2450, loss[loss=0.01224, audio_tagging_loss=0.01224, over 25000.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4958842.14 frames. 
], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:49:14,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1509680.0, ans=0.125 2023-12-24 04:49:14,354 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.15 vs. limit=6.0 2023-12-24 04:49:44,946 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.69 vs. limit=12.0 2023-12-24 04:50:00,248 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.486e+01 4.002e+01 4.128e+01 4.314e+01 5.407e+01, threshold=8.257e+01, percent-clipped=0.0 2023-12-24 04:50:00,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1509946.6666666667, ans=0.5 2023-12-24 04:50:02,153 INFO [train.py:886] (3/4) Epoch 48, batch 2500, loss[loss=0.01157, audio_tagging_loss=0.01157, over 24750.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4951784.17 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:50:13,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1510080.0, ans=0.125 2023-12-24 04:50:16,776 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.60 vs. limit=15.0 2023-12-24 04:50:39,021 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.85 vs. limit=22.5 2023-12-24 04:50:54,094 INFO [train.py:886] (3/4) Epoch 48, batch 2550, loss[loss=0.01117, audio_tagging_loss=0.01117, over 25000.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4947715.00 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:51:14,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1510480.0, ans=0.0 2023-12-24 04:51:43,839 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.503e+01 4.050e+01 4.211e+01 4.452e+01 5.003e+01, threshold=8.421e+01, percent-clipped=0.0 2023-12-24 04:51:46,458 INFO [train.py:886] (3/4) Epoch 48, batch 2600, loss[loss=0.01095, audio_tagging_loss=0.01095, over 25000.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4943444.61 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:51:54,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1510680.0, ans=0.125 2023-12-24 04:52:29,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1510946.6666666667, ans=0.1 2023-12-24 04:52:30,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1510946.6666666667, ans=0.0 2023-12-24 04:52:36,673 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.71 vs. limit=15.0 2023-12-24 04:52:37,980 INFO [train.py:886] (3/4) Epoch 48, batch 2650, loss[loss=0.009335, audio_tagging_loss=0.009335, over 24750.00 frames. 
], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4950080.30 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:52:39,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1511013.3333333333, ans=0.0 2023-12-24 04:53:09,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1511213.3333333333, ans=0.0 2023-12-24 04:53:11,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1511213.3333333333, ans=0.07 2023-12-24 04:53:28,178 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.465e+01 3.977e+01 4.124e+01 4.274e+01 5.169e+01, threshold=8.248e+01, percent-clipped=0.0 2023-12-24 04:53:30,068 INFO [train.py:886] (3/4) Epoch 48, batch 2700, loss[loss=0.01039, audio_tagging_loss=0.01039, over 25000.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4952463.79 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:53:38,647 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.53 vs. limit=22.5 2023-12-24 04:53:51,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1511480.0, ans=0.2 2023-12-24 04:54:03,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1511546.6666666667, ans=0.125 2023-12-24 04:54:06,211 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.96 vs. limit=15.0 2023-12-24 04:54:09,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1511613.3333333333, ans=0.125 2023-12-24 04:54:15,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1511613.3333333333, ans=0.0 2023-12-24 04:54:17,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1511613.3333333333, ans=0.0 2023-12-24 04:54:20,664 INFO [train.py:886] (3/4) Epoch 48, batch 2750, loss[loss=0.01023, audio_tagging_loss=0.01023, over 25000.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4949890.95 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:54:27,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1511680.0, ans=0.1 2023-12-24 04:54:33,870 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0 2023-12-24 04:54:54,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1511880.0, ans=0.125 2023-12-24 04:55:09,280 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.39 vs. 
limit=15.0 2023-12-24 04:55:10,759 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.456e+01 3.953e+01 4.094e+01 4.279e+01 4.852e+01, threshold=8.188e+01, percent-clipped=0.0 2023-12-24 04:55:10,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1511946.6666666667, ans=0.0 2023-12-24 04:55:12,675 INFO [train.py:886] (3/4) Epoch 48, batch 2800, loss[loss=0.01167, audio_tagging_loss=0.01167, over 24750.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4946923.06 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:55:26,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1512080.0, ans=0.0 2023-12-24 04:55:42,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1512213.3333333333, ans=0.2 2023-12-24 04:55:57,845 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.64 vs. limit=10.0 2023-12-24 04:56:00,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1512280.0, ans=0.125 2023-12-24 04:56:04,331 INFO [train.py:886] (3/4) Epoch 48, batch 2850, loss[loss=0.01046, audio_tagging_loss=0.01046, over 24750.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4947711.55 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:56:05,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1512346.6666666667, ans=0.0 2023-12-24 04:56:06,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1512346.6666666667, ans=0.125 2023-12-24 04:56:29,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1512480.0, ans=0.125 2023-12-24 04:56:53,474 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.637e+01 3.988e+01 4.154e+01 4.396e+01 6.475e+01, threshold=8.307e+01, percent-clipped=0.0 2023-12-24 04:56:55,363 INFO [train.py:886] (3/4) Epoch 48, batch 2900, loss[loss=0.009234, audio_tagging_loss=0.009234, over 24750.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4945734.79 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:56:58,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1512680.0, ans=0.125 2023-12-24 04:57:25,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1512813.3333333333, ans=0.125 2023-12-24 04:57:29,742 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 04:57:36,437 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2023-12-24 04:57:47,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.99 vs. 
limit=6.0 2023-12-24 04:57:47,898 INFO [train.py:886] (3/4) Epoch 48, batch 2950, loss[loss=0.009213, audio_tagging_loss=0.009213, over 24750.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4946678.77 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:57:49,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1513013.3333333333, ans=0.0 2023-12-24 04:58:17,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1513146.6666666667, ans=0.125 2023-12-24 04:58:36,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1513280.0, ans=0.125 2023-12-24 04:58:37,217 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.638e+01 3.916e+01 4.052e+01 4.286e+01 4.882e+01, threshold=8.104e+01, percent-clipped=0.0 2023-12-24 04:58:39,116 INFO [train.py:886] (3/4) Epoch 48, batch 3000, loss[loss=0.01019, audio_tagging_loss=0.01019, over 25000.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4948840.16 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 04:58:39,117 INFO [train.py:909] (3/4) Computing validation loss 2023-12-24 04:59:00,481 INFO [train.py:917] (3/4) Epoch 48, validation: loss=0.03695, audio_tagging_loss=0.03695, over 3737520.00 frames. 2023-12-24 04:59:00,482 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-24 04:59:04,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1513346.6666666667, ans=0.125 2023-12-24 04:59:07,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1513346.6666666667, ans=0.125 2023-12-24 04:59:42,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1513613.3333333333, ans=0.015 2023-12-24 04:59:52,926 INFO [train.py:886] (3/4) Epoch 48, batch 3050, loss[loss=0.00789, audio_tagging_loss=0.00789, over 24017.00 frames. ], tot_loss[loss=0.0105, audio_tagging_loss=0.0105, over 4951543.21 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 05:00:16,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1513813.3333333333, ans=0.125 2023-12-24 05:00:22,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1513880.0, ans=0.125 2023-12-24 05:00:23,678 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.49 vs. limit=15.0 2023-12-24 05:00:27,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1513880.0, ans=0.125 2023-12-24 05:00:28,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1513880.0, ans=0.125 2023-12-24 05:00:41,843 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.634e+01 4.038e+01 4.200e+01 4.350e+01 5.861e+01, threshold=8.401e+01, percent-clipped=0.0 2023-12-24 05:00:44,475 INFO [train.py:886] (3/4) Epoch 48, batch 3100, loss[loss=0.0106, audio_tagging_loss=0.0106, over 24750.00 frames. 
], tot_loss[loss=0.01061, audio_tagging_loss=0.01061, over 4957422.39 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 05:00:49,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1514013.3333333333, ans=0.04949747468305833 2023-12-24 05:00:50,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1514013.3333333333, ans=0.0 2023-12-24 05:01:04,337 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.06 vs. limit=22.5 2023-12-24 05:01:36,063 INFO [train.py:886] (3/4) Epoch 48, batch 3150, loss[loss=0.01429, audio_tagging_loss=0.01429, over 25000.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4947592.57 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 05:01:53,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1514413.3333333333, ans=0.0 2023-12-24 05:01:55,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1514413.3333333333, ans=0.125 2023-12-24 05:01:55,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1514413.3333333333, ans=0.1 2023-12-24 05:02:12,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1514546.6666666667, ans=0.2 2023-12-24 05:02:20,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1514613.3333333333, ans=0.025 2023-12-24 05:02:26,562 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.591e+01 4.003e+01 4.208e+01 4.385e+01 5.175e+01, threshold=8.416e+01, percent-clipped=0.0 2023-12-24 05:02:27,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1514680.0, ans=10.0 2023-12-24 05:02:28,499 INFO [train.py:886] (3/4) Epoch 48, batch 3200, loss[loss=0.009557, audio_tagging_loss=0.009557, over 24750.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4942953.76 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 05:02:30,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1514680.0, ans=0.125 2023-12-24 05:02:30,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1514680.0, ans=0.125 2023-12-24 05:02:37,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0 2023-12-24 05:02:46,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1514746.6666666667, ans=0.125 2023-12-24 05:02:51,901 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs. 
limit=15.0 2023-12-24 05:02:57,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1514813.3333333333, ans=0.0 2023-12-24 05:03:14,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.75 vs. limit=12.0 2023-12-24 05:03:18,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1514946.6666666667, ans=0.125 2023-12-24 05:03:20,325 INFO [train.py:886] (3/4) Epoch 48, batch 3250, loss[loss=0.01187, audio_tagging_loss=0.01187, over 25000.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4941296.13 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 05:03:27,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1515013.3333333333, ans=0.125 2023-12-24 05:03:28,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1515013.3333333333, ans=0.125 2023-12-24 05:03:45,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1515146.6666666667, ans=0.0 2023-12-24 05:03:51,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1515213.3333333333, ans=0.0 2023-12-24 05:04:10,064 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.551e+01 3.994e+01 4.173e+01 4.405e+01 4.939e+01, threshold=8.347e+01, percent-clipped=0.0 2023-12-24 05:04:12,029 INFO [train.py:886] (3/4) Epoch 48, batch 3300, loss[loss=0.009797, audio_tagging_loss=0.009797, over 24915.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4948630.41 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 64.0 2023-12-24 05:04:29,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1515413.3333333333, ans=0.1 2023-12-24 05:04:35,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1515480.0, ans=0.125 2023-12-24 05:04:38,927 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.08 vs. limit=15.0 2023-12-24 05:04:44,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1515546.6666666667, ans=0.0 2023-12-24 05:04:49,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1515546.6666666667, ans=0.1 2023-12-24 05:04:50,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1515546.6666666667, ans=0.125 2023-12-24 05:04:57,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1515613.3333333333, ans=0.125 2023-12-24 05:05:04,986 INFO [train.py:886] (3/4) Epoch 48, batch 3350, loss[loss=0.008659, audio_tagging_loss=0.008659, over 25000.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4952532.82 frames. 
], batch size: 100, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:05:07,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1515680.0, ans=0.0 2023-12-24 05:05:31,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1515813.3333333333, ans=0.125 2023-12-24 05:05:33,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1515813.3333333333, ans=0.0 2023-12-24 05:05:47,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=1515946.6666666667, ans=15.0 2023-12-24 05:05:54,534 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.602e+01 3.981e+01 4.138e+01 4.310e+01 5.201e+01, threshold=8.276e+01, percent-clipped=0.0 2023-12-24 05:05:56,179 INFO [train.py:886] (3/4) Epoch 48, batch 3400, loss[loss=0.01177, audio_tagging_loss=0.01177, over 25000.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4951471.08 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:05:57,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1516013.3333333333, ans=0.0 2023-12-24 05:06:05,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1516013.3333333333, ans=0.1 2023-12-24 05:06:16,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1516146.6666666667, ans=0.0 2023-12-24 05:06:23,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1516146.6666666667, ans=0.125 2023-12-24 05:06:37,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1516280.0, ans=0.125 2023-12-24 05:06:47,744 INFO [train.py:886] (3/4) Epoch 48, batch 3450, loss[loss=0.01088, audio_tagging_loss=0.01088, over 25000.00 frames. ], tot_loss[loss=0.01075, audio_tagging_loss=0.01075, over 4949517.02 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:06:48,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1516346.6666666667, ans=0.125 2023-12-24 05:06:52,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1516346.6666666667, ans=0.1 2023-12-24 05:07:07,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1516480.0, ans=0.1 2023-12-24 05:07:20,681 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.48 vs. limit=22.5 2023-12-24 05:07:38,223 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.417e+01 4.055e+01 4.203e+01 4.420e+01 4.923e+01, threshold=8.407e+01, percent-clipped=0.0 2023-12-24 05:07:39,192 INFO [train.py:886] (3/4) Epoch 48, batch 3500, loss[loss=0.009224, audio_tagging_loss=0.009224, over 24750.00 frames. ], tot_loss[loss=0.01082, audio_tagging_loss=0.01082, over 4947775.52 frames. 
], batch size: 99, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:08:20,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1516946.6666666667, ans=0.1 2023-12-24 05:08:30,193 INFO [train.py:886] (3/4) Epoch 48, batch 3550, loss[loss=0.01163, audio_tagging_loss=0.01163, over 24750.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4941644.30 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:08:41,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1517080.0, ans=0.125 2023-12-24 05:08:45,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1517080.0, ans=0.1 2023-12-24 05:08:59,465 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.52 vs. limit=12.0 2023-12-24 05:08:59,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1517146.6666666667, ans=0.1 2023-12-24 05:09:05,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1517213.3333333333, ans=0.1 2023-12-24 05:09:13,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1517280.0, ans=0.0 2023-12-24 05:09:15,639 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0 2023-12-24 05:09:22,407 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.601e+01 3.929e+01 4.166e+01 4.382e+01 5.169e+01, threshold=8.332e+01, percent-clipped=0.0 2023-12-24 05:09:23,402 INFO [train.py:886] (3/4) Epoch 48, batch 3600, loss[loss=0.009701, audio_tagging_loss=0.009701, over 24750.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4938433.93 frames. ], batch size: 99, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:09:37,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1517413.3333333333, ans=0.1 2023-12-24 05:09:58,088 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.66 vs. limit=15.0 2023-12-24 05:10:14,233 INFO [train.py:886] (3/4) Epoch 48, batch 3650, loss[loss=0.01051, audio_tagging_loss=0.01051, over 25000.00 frames. ], tot_loss[loss=0.0106, audio_tagging_loss=0.0106, over 4943777.26 frames. ], batch size: 100, lr: 2.23e-03, grad_scale: 32.0 2023-12-24 05:10:15,720 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.24 vs. limit=12.0 2023-12-24 05:10:25,429 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.57 vs. limit=22.5 2023-12-24 05:10:33,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1517746.6666666667, ans=0.125 2023-12-24 05:10:34,016 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.83 vs. 
limit=6.0 2023-12-24 05:11:05,043 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.632e+01 4.020e+01 4.210e+01 4.432e+01 5.110e+01, threshold=8.421e+01, percent-clipped=0.0 2023-12-24 05:11:06,031 INFO [train.py:886] (3/4) Epoch 48, batch 3700, loss[loss=0.01173, audio_tagging_loss=0.01173, over 21677.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4950751.51 frames. ], batch size: 107, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:11:27,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1518146.6666666667, ans=0.0 2023-12-24 05:11:58,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1518346.6666666667, ans=0.125 2023-12-24 05:11:58,843 INFO [train.py:886] (3/4) Epoch 48, batch 3750, loss[loss=0.009792, audio_tagging_loss=0.009792, over 24750.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4953384.78 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:12:09,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1518413.3333333333, ans=10.0 2023-12-24 05:12:13,401 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:12:21,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1518480.0, ans=0.125 2023-12-24 05:12:42,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1518613.3333333333, ans=0.2 2023-12-24 05:12:45,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1518613.3333333333, ans=0.125 2023-12-24 05:12:48,277 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.01 vs. limit=15.0 2023-12-24 05:12:49,911 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.681e+01 4.049e+01 4.266e+01 4.434e+01 5.925e+01, threshold=8.531e+01, percent-clipped=0.0 2023-12-24 05:12:50,908 INFO [train.py:886] (3/4) Epoch 48, batch 3800, loss[loss=0.008923, audio_tagging_loss=0.008923, over 24750.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4946717.53 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:13:04,584 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2023-12-24 05:13:07,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1518746.6666666667, ans=0.0 2023-12-24 05:13:14,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1518813.3333333333, ans=0.07 2023-12-24 05:13:17,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1518813.3333333333, ans=0.125 2023-12-24 05:13:27,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.30 vs. 
limit=22.5 2023-12-24 05:13:30,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1518880.0, ans=0.2 2023-12-24 05:13:31,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1518946.6666666667, ans=0.125 2023-12-24 05:13:42,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1519013.3333333333, ans=0.125 2023-12-24 05:13:43,505 INFO [train.py:886] (3/4) Epoch 48, batch 3850, loss[loss=0.01037, audio_tagging_loss=0.01037, over 25000.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4944921.23 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:14:00,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1519080.0, ans=0.2 2023-12-24 05:14:11,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1519146.6666666667, ans=0.1 2023-12-24 05:14:19,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1519213.3333333333, ans=0.125 2023-12-24 05:14:33,479 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.582e+01 4.007e+01 4.136e+01 4.360e+01 5.120e+01, threshold=8.272e+01, percent-clipped=0.0 2023-12-24 05:14:35,541 INFO [train.py:886] (3/4) Epoch 48, batch 3900, loss[loss=0.01041, audio_tagging_loss=0.01041, over 25000.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4945020.07 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:14:35,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1519346.6666666667, ans=0.125 2023-12-24 05:14:40,861 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.12 vs. limit=15.0 2023-12-24 05:14:47,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=1519413.3333333333, ans=15.0 2023-12-24 05:14:52,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1519413.3333333333, ans=0.2 2023-12-24 05:14:57,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1519480.0, ans=0.125 2023-12-24 05:15:24,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1519613.3333333333, ans=0.125 2023-12-24 05:15:26,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1519680.0, ans=0.0 2023-12-24 05:15:27,383 INFO [train.py:886] (3/4) Epoch 48, batch 3950, loss[loss=0.01075, audio_tagging_loss=0.01075, over 25000.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4953228.59 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:15:29,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1519680.0, ans=0.2 2023-12-24 05:15:44,377 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.19 vs. 
limit=10.0 2023-12-24 05:15:47,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1519813.3333333333, ans=0.125 2023-12-24 05:15:53,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1519813.3333333333, ans=0.0 2023-12-24 05:16:00,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.39 vs. limit=15.0 2023-12-24 05:16:10,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1519946.6666666667, ans=0.2 2023-12-24 05:16:20,885 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.458e+01 4.031e+01 4.185e+01 4.396e+01 5.576e+01, threshold=8.370e+01, percent-clipped=0.0 2023-12-24 05:16:21,851 INFO [train.py:886] (3/4) Epoch 48, batch 4000, loss[loss=0.009969, audio_tagging_loss=0.009969, over 25000.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4951352.04 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:16:46,879 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2023-12-24 05:17:03,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1520280.0, ans=0.0 2023-12-24 05:17:13,345 INFO [train.py:886] (3/4) Epoch 48, batch 4050, loss[loss=0.01164, audio_tagging_loss=0.01164, over 25000.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4947214.18 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:17:28,132 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:17:30,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1520413.3333333333, ans=0.1 2023-12-24 05:17:34,681 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.89 vs. limit=22.5 2023-12-24 05:18:05,528 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.431e+01 4.000e+01 4.154e+01 4.324e+01 5.032e+01, threshold=8.309e+01, percent-clipped=0.0 2023-12-24 05:18:06,507 INFO [train.py:886] (3/4) Epoch 48, batch 4100, loss[loss=0.01083, audio_tagging_loss=0.01083, over 24750.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4948721.97 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:18:15,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1520746.6666666667, ans=0.125 2023-12-24 05:18:17,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1520746.6666666667, ans=0.0 2023-12-24 05:18:27,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1520813.3333333333, ans=0.125 2023-12-24 05:18:56,954 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.94 vs. 
limit=10.0 2023-12-24 05:18:58,350 INFO [train.py:886] (3/4) Epoch 48, batch 4150, loss[loss=0.009396, audio_tagging_loss=0.009396, over 24750.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4941871.88 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:19:05,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1521013.3333333333, ans=0.125 2023-12-24 05:19:42,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1521280.0, ans=0.125 2023-12-24 05:19:49,399 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.487e+01 4.024e+01 4.153e+01 4.360e+01 4.986e+01, threshold=8.306e+01, percent-clipped=0.0 2023-12-24 05:19:50,400 INFO [train.py:886] (3/4) Epoch 48, batch 4200, loss[loss=0.01107, audio_tagging_loss=0.01107, over 25000.00 frames. ], tot_loss[loss=0.01059, audio_tagging_loss=0.01059, over 4944928.42 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:19:58,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1521346.6666666667, ans=0.1 2023-12-24 05:20:00,886 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.17 vs. limit=15.0 2023-12-24 05:20:19,695 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.56 vs. limit=15.0 2023-12-24 05:20:43,074 INFO [train.py:886] (3/4) Epoch 48, batch 4250, loss[loss=0.01118, audio_tagging_loss=0.01118, over 25000.00 frames. ], tot_loss[loss=0.0106, audio_tagging_loss=0.0106, over 4951270.32 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:20:48,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1521680.0, ans=0.0 2023-12-24 05:20:55,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1521746.6666666667, ans=0.125 2023-12-24 05:21:04,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1521813.3333333333, ans=0.1 2023-12-24 05:21:10,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1521813.3333333333, ans=0.2 2023-12-24 05:21:33,915 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.572e+01 3.984e+01 4.154e+01 4.315e+01 4.731e+01, threshold=8.307e+01, percent-clipped=0.0 2023-12-24 05:21:34,927 INFO [train.py:886] (3/4) Epoch 48, batch 4300, loss[loss=0.008206, audio_tagging_loss=0.008206, over 25000.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4952352.06 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:22:15,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1522280.0, ans=0.09899494936611666 2023-12-24 05:22:25,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.38 vs. 
limit=15.0 2023-12-24 05:22:26,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1522346.6666666667, ans=0.125 2023-12-24 05:22:26,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=1522346.6666666667, ans=0.5 2023-12-24 05:22:26,759 INFO [train.py:886] (3/4) Epoch 48, batch 4350, loss[loss=0.01026, audio_tagging_loss=0.01026, over 24750.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4955780.45 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:22:35,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1522346.6666666667, ans=0.0 2023-12-24 05:22:40,818 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.22 vs. limit=10.0 2023-12-24 05:22:44,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1522413.3333333333, ans=0.125 2023-12-24 05:22:48,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1522480.0, ans=0.0 2023-12-24 05:22:55,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1522480.0, ans=0.0 2023-12-24 05:22:55,621 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.77 vs. limit=22.5 2023-12-24 05:23:01,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1522546.6666666667, ans=0.125 2023-12-24 05:23:18,151 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.751e+01 4.021e+01 4.163e+01 4.358e+01 4.850e+01, threshold=8.326e+01, percent-clipped=0.0 2023-12-24 05:23:19,138 INFO [train.py:886] (3/4) Epoch 48, batch 4400, loss[loss=0.01013, audio_tagging_loss=0.01013, over 25000.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4952669.61 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:23:20,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1522680.0, ans=0.125 2023-12-24 05:23:20,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.61 vs. 
limit=15.0 2023-12-24 05:23:22,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1522680.0, ans=0.0 2023-12-24 05:23:40,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1522813.3333333333, ans=0.5 2023-12-24 05:24:03,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1522946.6666666667, ans=0.125 2023-12-24 05:24:04,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1522946.6666666667, ans=0.125 2023-12-24 05:24:07,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1522946.6666666667, ans=0.125 2023-12-24 05:24:10,765 INFO [train.py:886] (3/4) Epoch 48, batch 4450, loss[loss=0.01212, audio_tagging_loss=0.01212, over 25000.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4948206.57 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:24:34,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1523146.6666666667, ans=0.2 2023-12-24 05:24:46,564 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:25:01,879 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.573e+01 4.053e+01 4.196e+01 4.337e+01 4.887e+01, threshold=8.393e+01, percent-clipped=0.0 2023-12-24 05:25:02,869 INFO [train.py:886] (3/4) Epoch 48, batch 4500, loss[loss=0.01134, audio_tagging_loss=0.01134, over 24051.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4945731.58 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:25:15,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1523413.3333333333, ans=0.04949747468305833 2023-12-24 05:25:17,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1523413.3333333333, ans=0.125 2023-12-24 05:25:17,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1523413.3333333333, ans=0.1 2023-12-24 05:25:22,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1523480.0, ans=0.0 2023-12-24 05:25:27,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1523480.0, ans=0.125 2023-12-24 05:25:38,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1523546.6666666667, ans=0.125 2023-12-24 05:25:39,829 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:25:45,803 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.69 vs. limit=15.0 2023-12-24 05:25:53,556 INFO [train.py:886] (3/4) Epoch 48, batch 4550, loss[loss=0.01217, audio_tagging_loss=0.01217, over 25000.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4948827.85 frames. 
], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:26:11,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1523746.6666666667, ans=0.1 2023-12-24 05:26:20,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1523813.3333333333, ans=0.0 2023-12-24 05:26:23,577 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=12.0 2023-12-24 05:26:33,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1523880.0, ans=0.0 2023-12-24 05:26:40,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.22 vs. limit=10.0 2023-12-24 05:26:44,556 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.666e+01 4.022e+01 4.224e+01 4.415e+01 4.872e+01, threshold=8.448e+01, percent-clipped=0.0 2023-12-24 05:26:45,539 INFO [train.py:886] (3/4) Epoch 48, batch 4600, loss[loss=0.009639, audio_tagging_loss=0.009639, over 25000.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4955203.02 frames. ], batch size: 100, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:26:51,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1524013.3333333333, ans=0.2 2023-12-24 05:26:57,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1524080.0, ans=0.07 2023-12-24 05:27:15,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1524213.3333333333, ans=0.025 2023-12-24 05:27:23,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1524213.3333333333, ans=0.125 2023-12-24 05:27:31,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1524280.0, ans=0.0 2023-12-24 05:27:38,195 INFO [train.py:886] (3/4) Epoch 48, batch 4650, loss[loss=0.01086, audio_tagging_loss=0.01086, over 24750.00 frames. ], tot_loss[loss=0.01067, audio_tagging_loss=0.01067, over 4951994.17 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:27:43,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1524346.6666666667, ans=0.0 2023-12-24 05:28:26,937 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.730e+01 4.056e+01 4.273e+01 4.476e+01 5.742e+01, threshold=8.545e+01, percent-clipped=0.0 2023-12-24 05:28:27,899 INFO [train.py:886] (3/4) Epoch 48, batch 4700, loss[loss=0.01117, audio_tagging_loss=0.01117, over 24750.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4945349.11 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:28:28,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1524680.0, ans=0.0 2023-12-24 05:28:35,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.41 vs. 
limit=15.0 2023-12-24 05:28:46,068 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:28:57,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1524880.0, ans=0.1 2023-12-24 05:28:57,779 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.00 vs. limit=12.0 2023-12-24 05:29:10,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1524946.6666666667, ans=0.0 2023-12-24 05:29:15,574 INFO [train.py:886] (3/4) Epoch 48, batch 4750, loss[loss=0.01046, audio_tagging_loss=0.01046, over 24750.00 frames. ], tot_loss[loss=0.01077, audio_tagging_loss=0.01077, over 4947225.36 frames. ], batch size: 99, lr: 2.22e-03, grad_scale: 32.0 2023-12-24 05:29:50,919 INFO [train.py:886] (3/4) Epoch 49, batch 0, loss[loss=0.02545, audio_tagging_loss=0.02545, over 25000.00 frames. ], tot_loss[loss=0.02545, audio_tagging_loss=0.02545, over 25000.00 frames. ], batch size: 100, lr: 2.20e-03, grad_scale: 32.0 2023-12-24 05:29:50,923 INFO [train.py:909] (3/4) Computing validation loss 2023-12-24 05:30:12,073 INFO [train.py:917] (3/4) Epoch 49, validation: loss=0.03671, audio_tagging_loss=0.03671, over 3737520.00 frames. 2023-12-24 05:30:12,074 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-24 05:30:20,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1525186.6666666667, ans=0.0 2023-12-24 05:30:23,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1525186.6666666667, ans=0.125 2023-12-24 05:30:23,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1525186.6666666667, ans=0.04949747468305833 2023-12-24 05:30:24,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1525186.6666666667, ans=0.125 2023-12-24 05:30:33,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.68 vs. limit=15.0 2023-12-24 05:30:44,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1525320.0, ans=0.1 2023-12-24 05:30:47,304 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.515e+01 4.166e+01 4.419e+01 5.711e+01 1.124e+02, threshold=8.838e+01, percent-clipped=6.0 2023-12-24 05:31:00,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1525386.6666666667, ans=0.0 2023-12-24 05:31:03,268 INFO [train.py:886] (3/4) Epoch 49, batch 50, loss[loss=0.01629, audio_tagging_loss=0.01629, over 25000.00 frames. ], tot_loss[loss=0.01685, audio_tagging_loss=0.01685, over 1119890.35 frames. ], batch size: 100, lr: 2.20e-03, grad_scale: 16.0 2023-12-24 05:31:11,306 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. 
limit=15.0 2023-12-24 05:31:23,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1525586.6666666667, ans=0.2 2023-12-24 05:31:23,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1525586.6666666667, ans=0.05 2023-12-24 05:31:54,141 INFO [train.py:886] (3/4) Epoch 49, batch 100, loss[loss=0.01213, audio_tagging_loss=0.01213, over 25000.00 frames. ], tot_loss[loss=0.01467, audio_tagging_loss=0.01467, over 1971381.84 frames. ], batch size: 100, lr: 2.20e-03, grad_scale: 16.0 2023-12-24 05:31:54,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1525786.6666666667, ans=0.125 2023-12-24 05:32:02,811 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.07 vs. limit=12.0 2023-12-24 05:32:07,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1525853.3333333333, ans=0.0 2023-12-24 05:32:08,446 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.71 vs. limit=15.0 2023-12-24 05:32:11,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1525853.3333333333, ans=0.125 2023-12-24 05:32:18,348 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.62 vs. limit=22.5 2023-12-24 05:32:28,984 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.922e+01 4.393e+01 4.587e+01 4.980e+01 5.717e+01, threshold=9.174e+01, percent-clipped=0.0 2023-12-24 05:32:30,580 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.62 vs. limit=6.0 2023-12-24 05:32:44,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1526120.0, ans=0.125 2023-12-24 05:32:44,819 INFO [train.py:886] (3/4) Epoch 49, batch 150, loss[loss=0.009032, audio_tagging_loss=0.009032, over 24750.00 frames. ], tot_loss[loss=0.01353, audio_tagging_loss=0.01353, over 2632216.93 frames. ], batch size: 99, lr: 2.20e-03, grad_scale: 16.0 2023-12-24 05:32:45,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1526120.0, ans=0.125 2023-12-24 05:32:55,992 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2023-12-24 05:33:37,106 INFO [train.py:886] (3/4) Epoch 49, batch 200, loss[loss=0.00951, audio_tagging_loss=0.00951, over 25000.00 frames. ], tot_loss[loss=0.01263, audio_tagging_loss=0.01263, over 3149556.15 frames. ], batch size: 100, lr: 2.20e-03, grad_scale: 16.0 2023-12-24 05:33:54,996 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=2.41 vs. 
limit=5.0 2023-12-24 05:34:04,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1526586.6666666667, ans=0.125 2023-12-24 05:34:12,028 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.501e+01 4.050e+01 4.227e+01 4.381e+01 4.955e+01, threshold=8.454e+01, percent-clipped=0.0 2023-12-24 05:34:15,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1526653.3333333333, ans=0.125 2023-12-24 05:34:25,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1526720.0, ans=0.125 2023-12-24 05:34:28,161 INFO [train.py:886] (3/4) Epoch 49, batch 250, loss[loss=0.009881, audio_tagging_loss=0.009881, over 25000.00 frames. ], tot_loss[loss=0.01215, audio_tagging_loss=0.01215, over 3552555.76 frames. ], batch size: 100, lr: 2.20e-03, grad_scale: 16.0 2023-12-24 05:34:32,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=12.0 2023-12-24 05:34:41,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1526853.3333333333, ans=0.07 2023-12-24 05:35:14,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1527053.3333333333, ans=0.0 2023-12-24 05:35:15,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1527053.3333333333, ans=0.2 2023-12-24 05:35:19,236 INFO [train.py:886] (3/4) Epoch 49, batch 300, loss[loss=0.01174, audio_tagging_loss=0.01174, over 24750.00 frames. ], tot_loss[loss=0.01188, audio_tagging_loss=0.01188, over 3864783.60 frames. ], batch size: 99, lr: 2.20e-03, grad_scale: 16.0 2023-12-24 05:35:36,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1527186.6666666667, ans=0.125 2023-12-24 05:35:53,616 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.671e+01 4.036e+01 4.211e+01 4.378e+01 5.277e+01, threshold=8.421e+01, percent-clipped=0.0 2023-12-24 05:36:05,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1527386.6666666667, ans=0.125 2023-12-24 05:36:10,939 INFO [train.py:886] (3/4) Epoch 49, batch 350, loss[loss=0.01025, audio_tagging_loss=0.01025, over 24750.00 frames. ], tot_loss[loss=0.01165, audio_tagging_loss=0.01165, over 4101404.82 frames. 
], batch size: 99, lr: 2.19e-03, grad_scale: 16.0 2023-12-24 05:36:12,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1527453.3333333333, ans=0.125 2023-12-24 05:36:23,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1527520.0, ans=0.125 2023-12-24 05:36:45,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1527653.3333333333, ans=0.015 2023-12-24 05:36:55,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1527720.0, ans=0.0 2023-12-24 05:37:00,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1527786.6666666667, ans=0.0 2023-12-24 05:37:01,188 INFO [train.py:886] (3/4) Epoch 49, batch 400, loss[loss=0.008538, audio_tagging_loss=0.008538, over 24750.00 frames. ], tot_loss[loss=0.01135, audio_tagging_loss=0.01135, over 4290172.89 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:37:18,107 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=6.15 vs. limit=15.0 2023-12-24 05:37:24,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1527920.0, ans=0.125 2023-12-24 05:37:24,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1527920.0, ans=0.0 2023-12-24 05:37:36,419 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.568e+01 3.880e+01 4.116e+01 4.327e+01 4.784e+01, threshold=8.231e+01, percent-clipped=0.0 2023-12-24 05:37:40,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1527986.6666666667, ans=0.125 2023-12-24 05:37:53,111 INFO [train.py:886] (3/4) Epoch 49, batch 450, loss[loss=0.008611, audio_tagging_loss=0.008611, over 24750.00 frames. ], tot_loss[loss=0.01108, audio_tagging_loss=0.01108, over 4435861.62 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:37:53,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1528120.0, ans=0.0 2023-12-24 05:38:04,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1528186.6666666667, ans=0.125 2023-12-24 05:38:11,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1528186.6666666667, ans=0.125 2023-12-24 05:38:15,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1528253.3333333333, ans=0.2 2023-12-24 05:38:18,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1528253.3333333333, ans=0.95 2023-12-24 05:38:30,602 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.85 vs. 
limit=15.0 2023-12-24 05:38:35,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1528386.6666666667, ans=0.0 2023-12-24 05:38:37,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1528386.6666666667, ans=0.125 2023-12-24 05:38:39,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1528386.6666666667, ans=0.125 2023-12-24 05:38:42,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.64 vs. limit=6.0 2023-12-24 05:38:44,055 INFO [train.py:886] (3/4) Epoch 49, batch 500, loss[loss=0.01251, audio_tagging_loss=0.01251, over 25000.00 frames. ], tot_loss[loss=0.01093, audio_tagging_loss=0.01093, over 4551891.74 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:39:08,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1528586.6666666667, ans=0.125 2023-12-24 05:39:08,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1528586.6666666667, ans=0.125 2023-12-24 05:39:18,318 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.604e+01 3.968e+01 4.140e+01 4.300e+01 5.100e+01, threshold=8.280e+01, percent-clipped=0.0 2023-12-24 05:39:19,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1528653.3333333333, ans=0.125 2023-12-24 05:39:23,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1528720.0, ans=0.1 2023-12-24 05:39:32,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1528720.0, ans=0.125 2023-12-24 05:39:34,825 INFO [train.py:886] (3/4) Epoch 49, batch 550, loss[loss=0.008233, audio_tagging_loss=0.008233, over 23992.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4643326.73 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:39:35,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1528786.6666666667, ans=0.2 2023-12-24 05:39:39,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1528786.6666666667, ans=0.125 2023-12-24 05:40:01,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1528920.0, ans=0.1 2023-12-24 05:40:25,953 INFO [train.py:886] (3/4) Epoch 49, batch 600, loss[loss=0.01063, audio_tagging_loss=0.01063, over 24952.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4711494.21 frames. 
], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:40:40,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1529186.6666666667, ans=0.0 2023-12-24 05:40:59,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1529320.0, ans=0.1 2023-12-24 05:41:01,071 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.737e+01 4.021e+01 4.199e+01 4.428e+01 6.437e+01, threshold=8.399e+01, percent-clipped=0.0 2023-12-24 05:41:06,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1529386.6666666667, ans=0.0 2023-12-24 05:41:07,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1529386.6666666667, ans=0.125 2023-12-24 05:41:18,326 INFO [train.py:886] (3/4) Epoch 49, batch 650, loss[loss=0.01208, audio_tagging_loss=0.01208, over 24750.00 frames. ], tot_loss[loss=0.01099, audio_tagging_loss=0.01099, over 4763730.34 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:41:22,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1529453.3333333333, ans=0.125 2023-12-24 05:41:43,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1529586.6666666667, ans=0.125 2023-12-24 05:41:52,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1529653.3333333333, ans=0.0 2023-12-24 05:41:56,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1529653.3333333333, ans=0.125 2023-12-24 05:42:04,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1529720.0, ans=0.125 2023-12-24 05:42:09,759 INFO [train.py:886] (3/4) Epoch 49, batch 700, loss[loss=0.0117, audio_tagging_loss=0.0117, over 25000.00 frames. ], tot_loss[loss=0.01091, audio_tagging_loss=0.01091, over 4800304.06 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:42:16,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1529786.6666666667, ans=0.0 2023-12-24 05:42:17,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.05 vs. limit=22.5 2023-12-24 05:42:28,906 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.37 vs. limit=6.0 2023-12-24 05:42:31,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1529920.0, ans=0.1 2023-12-24 05:42:34,197 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.43 vs. limit=22.5 2023-12-24 05:42:39,915 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.08 vs. 
limit=15.0 2023-12-24 05:42:44,607 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.582e+01 3.993e+01 4.206e+01 4.464e+01 5.068e+01, threshold=8.413e+01, percent-clipped=0.0 2023-12-24 05:42:59,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1530120.0, ans=0.2 2023-12-24 05:43:00,361 INFO [train.py:886] (3/4) Epoch 49, batch 750, loss[loss=0.01096, audio_tagging_loss=0.01096, over 24750.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4833542.83 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:43:05,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1530120.0, ans=0.1 2023-12-24 05:43:12,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1530186.6666666667, ans=0.1 2023-12-24 05:43:31,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1530320.0, ans=0.0 2023-12-24 05:43:37,702 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.54 vs. limit=22.5 2023-12-24 05:43:46,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1530386.6666666667, ans=0.125 2023-12-24 05:43:52,142 INFO [train.py:886] (3/4) Epoch 49, batch 800, loss[loss=0.009138, audio_tagging_loss=0.009138, over 25000.00 frames. ], tot_loss[loss=0.01074, audio_tagging_loss=0.01074, over 4862381.23 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:44:04,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1530520.0, ans=0.125 2023-12-24 05:44:13,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-12-24 05:44:27,100 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.579e+01 3.966e+01 4.159e+01 4.355e+01 5.694e+01, threshold=8.318e+01, percent-clipped=0.0 2023-12-24 05:44:41,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1530720.0, ans=0.04949747468305833 2023-12-24 05:44:41,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1530720.0, ans=0.125 2023-12-24 05:44:41,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1530720.0, ans=0.0 2023-12-24 05:44:43,592 INFO [train.py:886] (3/4) Epoch 49, batch 850, loss[loss=0.008153, audio_tagging_loss=0.008153, over 25000.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4884784.78 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:44:43,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2023-12-24 05:45:11,731 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.92 vs. 
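limit=15.0

A note on the "Whitening: name=..., metric=M vs. limit=L" records above and below: these appear to come from scaling.py's whitening modules, which compare a whiteness statistic of one group of activations against a scheduled limit, and only push gradients toward decorrelating the channels while the metric exceeds that limit. A minimal sketch of such a metric, assuming it is the eigenvalue-spread ratio mean(eig^2)/mean(eig)^2 of the per-group covariance (exactly 1.0 for perfectly white features; the real implementation may differ in detail):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels). Split channels into groups and
        # compute each group's covariance matrix.
        n, c = x.shape
        d = c // num_groups
        xg = x.reshape(n, num_groups, d).permute(1, 0, 2)   # (groups, n, d)
        xg = xg - xg.mean(dim=1, keepdim=True)
        cov = xg.transpose(1, 2) @ xg / n                   # (groups, d, d)
        # mean(eig^2) / mean(eig)^2 == trace(cov @ cov) * d / trace(cov)^2
        tr = cov.diagonal(dim1=1, dim2=2).sum(dim=1)
        tr2 = (cov * cov.transpose(1, 2)).sum(dim=(1, 2))
        metric = tr2 * d / (tr * tr + 1e-20)
        return metric.mean().item()

For x = torch.randn(10000, 384) this returns roughly 1.0; strongly correlated channels drive it toward the group dimension, which is the regime the scheduled limits guard against.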
2023-12-24 05:45:23,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1531053.3333333333, ans=0.125 2023-12-24 05:45:34,369 INFO [train.py:886] (3/4) Epoch 49, batch 900, loss[loss=0.007834, audio_tagging_loss=0.007834, over 22013.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4895870.14 frames. ], batch size: 107, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:45:43,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1531186.6666666667, ans=0.125 2023-12-24 05:45:56,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1531253.3333333333, ans=0.0 2023-12-24 05:46:08,729 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.704e+01 4.050e+01 4.200e+01 4.404e+01 5.257e+01, threshold=8.399e+01, percent-clipped=0.0 2023-12-24 05:46:11,812 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:46:15,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1531386.6666666667, ans=0.125 2023-12-24 05:46:24,524 INFO [train.py:886] (3/4) Epoch 49, batch 950, loss[loss=0.01003, audio_tagging_loss=0.01003, over 24750.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4895982.54 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:46:24,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1531453.3333333333, ans=0.125 2023-12-24 05:46:24,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1531453.3333333333, ans=0.0 2023-12-24 05:46:27,586 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:46:50,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1531586.6666666667, ans=0.125 2023-12-24 05:46:51,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1531586.6666666667, ans=0.05 2023-12-24 05:46:57,176 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 05:46:58,466 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.97 vs. limit=12.0 2023-12-24 05:47:01,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1531653.3333333333, ans=0.125 2023-12-24 05:47:05,505 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2023-12-24 05:47:16,617 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.30 vs. limit=15.0
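In the train.py:886 records, loss[...] is the current batch's loss with the frame count it covers, while tot_loss[...] is a smoothed average weighted by frames. A sketch under the assumption that the smoothing is an exponential decay of frame-weighted sums (the decay constant below is a guess, not taken from the log); decaying the frame count this way would also explain fractional totals such as "over 4895982.54 frames.":

    class RunningLoss:
        """Frames-weighted running average of the training loss."""

        def __init__(self, decay: float = 0.99):
            self.decay = decay      # assumed smoothing constant
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, loss: float, num_frames: float) -> None:
            # Decay the old statistics, then fold in the current batch.
            self.loss_sum = self.decay * self.loss_sum + loss * num_frames
            self.frames = self.decay * self.frames + num_frames

        @property
        def value(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

2023-12-24 05:47:17,015 INFO [train.py:886] (3/4) Epoch 49, batch 1000, loss[loss=0.009664, audio_tagging_loss=0.009664, over 24023.00 frames.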
], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:47:17,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1531786.6666666667, ans=0.0 2023-12-24 05:47:18,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1531786.6666666667, ans=0.125 2023-12-24 05:47:26,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1531853.3333333333, ans=0.125 2023-12-24 05:47:31,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=1531853.3333333333, ans=0.1 2023-12-24 05:47:42,249 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.09 vs. limit=6.0 2023-12-24 05:47:50,839 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.669e+01 3.990e+01 4.143e+01 4.294e+01 8.017e+01, threshold=8.286e+01, percent-clipped=0.0 2023-12-24 05:47:54,097 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0 2023-12-24 05:47:55,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1532053.3333333333, ans=0.125 2023-12-24 05:48:03,666 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.64 vs. limit=22.5 2023-12-24 05:48:06,657 INFO [train.py:886] (3/4) Epoch 49, batch 1050, loss[loss=0.01026, audio_tagging_loss=0.01026, over 24750.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4915936.70 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:48:35,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1532320.0, ans=0.0 2023-12-24 05:48:46,495 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.12 vs. limit=22.5 2023-12-24 05:48:49,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1532386.6666666667, ans=0.125 2023-12-24 05:48:58,205 INFO [train.py:886] (3/4) Epoch 49, batch 1100, loss[loss=0.009298, audio_tagging_loss=0.009298, over 25000.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4922990.56 frames. 
], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:49:03,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1532453.3333333333, ans=0.0 2023-12-24 05:49:25,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1532586.6666666667, ans=0.125 2023-12-24 05:49:29,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1532653.3333333333, ans=0.0 2023-12-24 05:49:33,436 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.655e+01 3.973e+01 4.201e+01 4.376e+01 5.038e+01, threshold=8.402e+01, percent-clipped=0.0 2023-12-24 05:49:34,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1532653.3333333333, ans=0.125 2023-12-24 05:49:40,471 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.73 vs. limit=15.0 2023-12-24 05:49:43,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1532720.0, ans=0.125 2023-12-24 05:49:49,364 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.20 vs. limit=10.0 2023-12-24 05:49:50,824 INFO [train.py:886] (3/4) Epoch 49, batch 1150, loss[loss=0.01096, audio_tagging_loss=0.01096, over 25000.00 frames. ], tot_loss[loss=0.01061, audio_tagging_loss=0.01061, over 4927153.27 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:50:02,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2023-12-24 05:50:22,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.35 vs. limit=22.5 2023-12-24 05:50:29,551 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.49 vs. limit=15.0 2023-12-24 05:50:30,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1532986.6666666667, ans=0.0 2023-12-24 05:50:42,341 INFO [train.py:886] (3/4) Epoch 49, batch 1200, loss[loss=0.01154, audio_tagging_loss=0.01154, over 24750.00 frames. ], tot_loss[loss=0.01067, audio_tagging_loss=0.01067, over 4939633.26 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:50:55,744 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.95 vs. 
limit=6.0 2023-12-24 05:51:07,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1533253.3333333333, ans=0.125 2023-12-24 05:51:08,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1533253.3333333333, ans=0.125 2023-12-24 05:51:10,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1533253.3333333333, ans=10.0 2023-12-24 05:51:16,886 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.561e+01 4.054e+01 4.177e+01 4.431e+01 4.907e+01, threshold=8.353e+01, percent-clipped=0.0 2023-12-24 05:51:34,947 INFO [train.py:886] (3/4) Epoch 49, batch 1250, loss[loss=0.01042, audio_tagging_loss=0.01042, over 24750.00 frames. ], tot_loss[loss=0.01084, audio_tagging_loss=0.01084, over 4944223.66 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:51:44,140 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.68 vs. limit=15.0 2023-12-24 05:51:47,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1533520.0, ans=0.0 2023-12-24 05:51:49,770 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2023-12-24 05:51:51,340 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2023-12-24 05:51:54,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=1533586.6666666667, ans=0.02 2023-12-24 05:52:03,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=1533586.6666666667, ans=0.025 2023-12-24 05:52:14,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1533720.0, ans=0.0 2023-12-24 05:52:15,327 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0 2023-12-24 05:52:26,485 INFO [train.py:886] (3/4) Epoch 49, batch 1300, loss[loss=0.009389, audio_tagging_loss=0.009389, over 24750.00 frames. ], tot_loss[loss=0.01088, audio_tagging_loss=0.01088, over 4938614.37 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:52:35,038 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.05 vs. 
limit=12.0 2023-12-24 05:52:35,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1533786.6666666667, ans=0.125 2023-12-24 05:52:50,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1533920.0, ans=0.125 2023-12-24 05:52:53,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1533920.0, ans=0.07 2023-12-24 05:52:56,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1533986.6666666667, ans=0.05 2023-12-24 05:53:01,777 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.636e+01 4.042e+01 4.229e+01 4.423e+01 4.890e+01, threshold=8.459e+01, percent-clipped=0.0 2023-12-24 05:53:02,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.81 vs. limit=12.0 2023-12-24 05:53:06,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2023-12-24 05:53:13,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1534053.3333333333, ans=0.2 2023-12-24 05:53:18,434 INFO [train.py:886] (3/4) Epoch 49, batch 1350, loss[loss=0.01303, audio_tagging_loss=0.01303, over 25000.00 frames. ], tot_loss[loss=0.01078, audio_tagging_loss=0.01078, over 4938180.83 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:53:44,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1534253.3333333333, ans=0.125 2023-12-24 05:53:50,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1534320.0, ans=0.125 2023-12-24 05:54:05,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=1534386.6666666667, ans=15.0 2023-12-24 05:54:10,947 INFO [train.py:886] (3/4) Epoch 49, batch 1400, loss[loss=0.009704, audio_tagging_loss=0.009704, over 24750.00 frames. ], tot_loss[loss=0.01076, audio_tagging_loss=0.01076, over 4937122.35 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:54:14,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1534453.3333333333, ans=0.125 2023-12-24 05:54:15,974 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.31 vs. 
limit=6.0 2023-12-24 05:54:16,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1534453.3333333333, ans=0.125 2023-12-24 05:54:41,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1534653.3333333333, ans=0.125 2023-12-24 05:54:45,413 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.570e+01 3.938e+01 4.141e+01 4.265e+01 4.875e+01, threshold=8.281e+01, percent-clipped=0.0 2023-12-24 05:54:50,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1534720.0, ans=0.125 2023-12-24 05:54:54,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.78 vs. limit=15.0 2023-12-24 05:55:00,593 INFO [train.py:886] (3/4) Epoch 49, batch 1450, loss[loss=0.009468, audio_tagging_loss=0.009468, over 22203.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4938146.29 frames. ], batch size: 107, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:55:00,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1534786.6666666667, ans=0.1 2023-12-24 05:55:05,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1534786.6666666667, ans=0.125 2023-12-24 05:55:18,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1534853.3333333333, ans=0.125 2023-12-24 05:55:25,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1534920.0, ans=0.125 2023-12-24 05:55:35,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1534986.6666666667, ans=0.0 2023-12-24 05:55:53,552 INFO [train.py:886] (3/4) Epoch 49, batch 1500, loss[loss=0.01068, audio_tagging_loss=0.01068, over 25000.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4941198.02 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:56:00,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1535120.0, ans=0.04949747468305833 2023-12-24 05:56:00,630 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.40 vs. 
limit=15.0 2023-12-24 05:56:04,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1535186.6666666667, ans=0.0 2023-12-24 05:56:07,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1535186.6666666667, ans=0.015 2023-12-24 05:56:25,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1535320.0, ans=0.125 2023-12-24 05:56:28,576 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.582e+01 4.106e+01 4.257e+01 4.421e+01 5.889e+01, threshold=8.514e+01, percent-clipped=0.0 2023-12-24 05:56:35,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1535386.6666666667, ans=0.0 2023-12-24 05:56:45,301 INFO [train.py:886] (3/4) Epoch 49, batch 1550, loss[loss=0.01027, audio_tagging_loss=0.01027, over 24750.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4936218.14 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:57:17,328 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.15 vs. limit=15.0 2023-12-24 05:57:28,318 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.72 vs. limit=12.0 2023-12-24 05:57:32,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1535720.0, ans=0.0 2023-12-24 05:57:33,655 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0 2023-12-24 05:57:36,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1535786.6666666667, ans=0.125 2023-12-24 05:57:37,152 INFO [train.py:886] (3/4) Epoch 49, batch 1600, loss[loss=0.01096, audio_tagging_loss=0.01096, over 25000.00 frames. ], tot_loss[loss=0.01079, audio_tagging_loss=0.01079, over 4928866.48 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:57:38,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1535786.6666666667, ans=0.125 2023-12-24 05:57:45,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1535786.6666666667, ans=0.2 2023-12-24 05:57:53,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1535853.3333333333, ans=0.0 2023-12-24 05:58:12,889 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.640e+01 4.058e+01 4.221e+01 4.400e+01 5.973e+01, threshold=8.442e+01, percent-clipped=0.0 2023-12-24 05:58:30,232 INFO [train.py:886] (3/4) Epoch 49, batch 1650, loss[loss=0.01367, audio_tagging_loss=0.01367, over 25000.00 frames. ], tot_loss[loss=0.01069, audio_tagging_loss=0.01069, over 4927334.60 frames. 
], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:58:34,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1536120.0, ans=0.1 2023-12-24 05:58:38,968 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.20 vs. limit=15.0 2023-12-24 05:58:45,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1536186.6666666667, ans=0.125 2023-12-24 05:58:55,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=15.0 2023-12-24 05:59:03,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1536320.0, ans=0.1 2023-12-24 05:59:04,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1536320.0, ans=0.1 2023-12-24 05:59:13,184 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.22 vs. limit=15.0 2023-12-24 05:59:21,933 INFO [train.py:886] (3/4) Epoch 49, batch 1700, loss[loss=0.01185, audio_tagging_loss=0.01185, over 24750.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4934049.26 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 05:59:25,885 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.31 vs. limit=15.0 2023-12-24 05:59:28,059 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.47 vs. limit=15.0 2023-12-24 05:59:29,904 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.80 vs. limit=10.0 2023-12-24 05:59:56,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1536653.3333333333, ans=0.2 2023-12-24 05:59:56,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1536653.3333333333, ans=0.0 2023-12-24 05:59:57,643 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.681e+01 4.027e+01 4.185e+01 4.353e+01 5.164e+01, threshold=8.370e+01, percent-clipped=0.0 2023-12-24 06:00:08,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1536720.0, ans=0.0 2023-12-24 06:00:12,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1536786.6666666667, ans=0.5 2023-12-24 06:00:13,579 INFO [train.py:886] (3/4) Epoch 49, batch 1750, loss[loss=0.01157, audio_tagging_loss=0.01157, over 25000.00 frames. ], tot_loss[loss=0.01055, audio_tagging_loss=0.01055, over 4939835.11 frames. 
], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:00:26,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1536853.3333333333, ans=0.125 2023-12-24 06:01:05,969 INFO [train.py:886] (3/4) Epoch 49, batch 1800, loss[loss=0.01101, audio_tagging_loss=0.01101, over 25000.00 frames. ], tot_loss[loss=0.0106, audio_tagging_loss=0.0106, over 4948850.26 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:01:26,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1537253.3333333333, ans=0.125 2023-12-24 06:01:41,129 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.734e+01 4.057e+01 4.201e+01 4.362e+01 5.500e+01, threshold=8.403e+01, percent-clipped=0.0 2023-12-24 06:01:52,865 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.38 vs. limit=15.0 2023-12-24 06:01:57,714 INFO [train.py:886] (3/4) Epoch 49, batch 1850, loss[loss=0.01055, audio_tagging_loss=0.01055, over 25000.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4947800.58 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:02:24,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1537586.6666666667, ans=0.0 2023-12-24 06:02:39,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1537720.0, ans=0.125 2023-12-24 06:02:40,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1537720.0, ans=0.1 2023-12-24 06:02:49,912 INFO [train.py:886] (3/4) Epoch 49, batch 1900, loss[loss=0.01187, audio_tagging_loss=0.01187, over 24750.00 frames. ], tot_loss[loss=0.01072, audio_tagging_loss=0.01072, over 4947066.11 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:03:04,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1537853.3333333333, ans=0.125 2023-12-24 06:03:24,925 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.654e+01 4.070e+01 4.198e+01 4.398e+01 6.870e+01, threshold=8.397e+01, percent-clipped=0.0 2023-12-24 06:03:41,668 INFO [train.py:886] (3/4) Epoch 49, batch 1950, loss[loss=0.01097, audio_tagging_loss=0.01097, over 25000.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4944667.38 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:03:47,391 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0 2023-12-24 06:04:04,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1538253.3333333333, ans=0.1 2023-12-24 06:04:05,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1538253.3333333333, ans=0.0 2023-12-24 06:04:17,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. 
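limit=6.0

The scaling.py:213 records each print the current value ("ans") of a hyperparameter that is scheduled as a function of batch_count: dropout rates, skip rates, balancer probabilities, bypass scale minima, and so on. By this point in training (batch_count around 1.5e6) nearly all of them have settled at constants such as ans=0.125 or ans=0.0. A sketch of the idea, assuming a piecewise-linear schedule between (batch_count, value) knots; the knot values in the example are made up:

    class ScheduledFloat:
        """A float that follows a piecewise-linear schedule over batches."""

        def __init__(self, *points):
            self.points = sorted(points)   # (batch_count, value) knots
            self.batch_count = 0.0

        def __float__(self) -> float:
            b = self.batch_count
            x0, y0 = self.points[0]
            if b <= x0:
                return float(y0)
            for x1, y1 in self.points[1:]:
                if b <= x1:
                    t = (b - x0) / (x1 - x0)
                    return float(y0 + t * (y1 - y0))
                x0, y0 = x1, y1
            return float(y0)

For example, sf = ScheduledFloat((0.0, 0.3), (20000.0, 0.1)) starts at 0.3, reaches 0.1 at batch 20000, and stays there afterwards.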
2023-12-24 06:04:29,894 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.06 vs. limit=12.0 2023-12-24 06:04:33,144 INFO [train.py:886] (3/4) Epoch 49, batch 2000, loss[loss=0.01057, audio_tagging_loss=0.01057, over 25000.00 frames. ], tot_loss[loss=0.0105, audio_tagging_loss=0.0105, over 4941684.63 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:04:47,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1538520.0, ans=0.125 2023-12-24 06:04:47,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1538520.0, ans=0.125 2023-12-24 06:05:05,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1538653.3333333333, ans=0.125 2023-12-24 06:05:08,349 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.364e+01 3.984e+01 4.128e+01 4.387e+01 6.325e+01, threshold=8.257e+01, percent-clipped=0.0 2023-12-24 06:05:26,324 INFO [train.py:886] (3/4) Epoch 49, batch 2050, loss[loss=0.009348, audio_tagging_loss=0.009348, over 25000.00 frames. ], tot_loss[loss=0.01051, audio_tagging_loss=0.01051, over 4949421.29 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 64.0 2023-12-24 06:05:45,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1538920.0, ans=0.125 2023-12-24 06:06:11,275 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.64 vs. limit=15.0 2023-12-24 06:06:17,051 INFO [train.py:886] (3/4) Epoch 49, batch 2100, loss[loss=0.01126, audio_tagging_loss=0.01126, over 25000.00 frames. ], tot_loss[loss=0.01051, audio_tagging_loss=0.01051, over 4954936.86 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 64.0 2023-12-24 06:06:38,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1539253.3333333333, ans=0.125 2023-12-24 06:06:52,077 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.619e+01 3.997e+01 4.198e+01 4.409e+01 5.519e+01, threshold=8.397e+01, percent-clipped=0.0
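Two optimizer-side mechanisms are visible in the records above. The WARNING lines from optim.py print five quantiles (min/25%/50%/75%/max, by appearance) of recently observed gradient norms plus a clipping threshold; in every record in this section the threshold is Clipping_scale times the middle quantile (e.g. 2.0 * 4.128e+01 against threshold=8.257e+01), and percent-clipped=0.0 says no recent step actually exceeded it. Separately, grad_scale doubling from 32.0 to 64.0 at batch 2050 is consistent with fp16 dynamic loss scaling, where the scale grows after a long run of overflow-free steps and is halved back when an overflow occurs (it is 32.0 again by batch 2300). A sketch of the threshold computation under that reading:

    import torch

    def clipping_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
        # Quantiles of recently observed gradient norms, and a clipping
        # threshold of clipping_scale times the median -- the relation the
        # WARNING lines appear to follow.
        q = torch.quantile(recent_norms,
                           torch.tensor([0.0, 0.25, 0.50, 0.75, 1.0]))
        return q, clipping_scale * q[2]

2023-12-24 06:07:09,403 INFO [train.py:886] (3/4) Epoch 49, batch 2150, loss[loss=0.01088, audio_tagging_loss=0.01088, over 24750.00 frames. ], tot_loss[loss=0.01052, audio_tagging_loss=0.01052, over 4961638.45 frames.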
], batch size: 99, lr: 2.19e-03, grad_scale: 64.0 2023-12-24 06:07:26,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1539520.0, ans=0.125 2023-12-24 06:07:27,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1539520.0, ans=10.0 2023-12-24 06:07:35,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1539586.6666666667, ans=0.125 2023-12-24 06:07:38,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1539586.6666666667, ans=0.125 2023-12-24 06:07:53,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1539720.0, ans=0.0 2023-12-24 06:08:01,527 INFO [train.py:886] (3/4) Epoch 49, batch 2200, loss[loss=0.01212, audio_tagging_loss=0.01212, over 24750.00 frames. ], tot_loss[loss=0.01061, audio_tagging_loss=0.01061, over 4955307.60 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 64.0 2023-12-24 06:08:09,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1539786.6666666667, ans=0.0 2023-12-24 06:08:17,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1539853.3333333333, ans=0.125 2023-12-24 06:08:19,708 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.93 vs. limit=22.5 2023-12-24 06:08:35,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1539986.6666666667, ans=0.1 2023-12-24 06:08:35,535 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0 2023-12-24 06:08:35,892 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.719e+01 4.113e+01 4.276e+01 4.515e+01 5.433e+01, threshold=8.552e+01, percent-clipped=0.0 2023-12-24 06:08:51,805 INFO [train.py:886] (3/4) Epoch 49, batch 2250, loss[loss=0.009189, audio_tagging_loss=0.009189, over 23984.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4950169.04 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 64.0 2023-12-24 06:08:54,175 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.05 vs. 
limit=22.5 2023-12-24 06:09:17,738 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:09:34,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1540386.6666666667, ans=0.125 2023-12-24 06:09:37,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1540386.6666666667, ans=0.125 2023-12-24 06:09:38,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1540386.6666666667, ans=0.0 2023-12-24 06:09:38,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1540386.6666666667, ans=0.125 2023-12-24 06:09:44,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1540453.3333333333, ans=0.0 2023-12-24 06:09:45,101 INFO [train.py:886] (3/4) Epoch 49, batch 2300, loss[loss=0.009999, audio_tagging_loss=0.009999, over 25000.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4951403.09 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:09:50,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1540453.3333333333, ans=0.0 2023-12-24 06:09:52,280 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0 2023-12-24 06:10:06,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1540586.6666666667, ans=0.2 2023-12-24 06:10:07,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1540586.6666666667, ans=0.0 2023-12-24 06:10:21,200 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.695e+01 3.977e+01 4.114e+01 4.288e+01 4.900e+01, threshold=8.227e+01, percent-clipped=0.0 2023-12-24 06:10:36,355 INFO [train.py:886] (3/4) Epoch 49, batch 2350, loss[loss=0.009704, audio_tagging_loss=0.009704, over 24750.00 frames. ], tot_loss[loss=0.01052, audio_tagging_loss=0.01052, over 4952167.51 frames. ], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:10:42,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1540786.6666666667, ans=0.125 2023-12-24 06:10:49,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2023-12-24 06:11:02,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1540920.0, ans=0.1 2023-12-24 06:11:22,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1541053.3333333333, ans=0.0 2023-12-24 06:11:28,813 INFO [train.py:886] (3/4) Epoch 49, batch 2400, loss[loss=0.009974, audio_tagging_loss=0.009974, over 24750.00 frames. ], tot_loss[loss=0.01055, audio_tagging_loss=0.01055, over 4957496.48 frames. 
], batch size: 99, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:11:31,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1541120.0, ans=0.0 2023-12-24 06:11:33,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1541120.0, ans=0.125 2023-12-24 06:11:37,011 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.03 vs. limit=22.5 2023-12-24 06:11:40,711 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.25 vs. limit=15.0 2023-12-24 06:11:46,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1541186.6666666667, ans=0.0 2023-12-24 06:11:51,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1541253.3333333333, ans=0.125 2023-12-24 06:11:51,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1541253.3333333333, ans=0.125 2023-12-24 06:11:59,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1541320.0, ans=0.125 2023-12-24 06:12:04,395 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.621e+01 3.987e+01 4.152e+01 4.367e+01 5.469e+01, threshold=8.304e+01, percent-clipped=0.0 2023-12-24 06:12:20,372 INFO [train.py:886] (3/4) Epoch 49, batch 2450, loss[loss=0.01264, audio_tagging_loss=0.01264, over 25000.00 frames. ], tot_loss[loss=0.01059, audio_tagging_loss=0.01059, over 4958182.10 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 32.0 2023-12-24 06:12:26,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1541453.3333333333, ans=0.05 2023-12-24 06:12:31,804 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.61 vs. limit=15.0 2023-12-24 06:12:38,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.58 vs. limit=15.0 2023-12-24 06:12:44,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1541586.6666666667, ans=0.025 2023-12-24 06:12:47,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1541586.6666666667, ans=0.125 2023-12-24 06:12:58,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1541653.3333333333, ans=0.0 2023-12-24 06:13:07,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1541720.0, ans=0.125 2023-12-24 06:13:11,139 INFO [train.py:886] (3/4) Epoch 49, batch 2500, loss[loss=0.01134, audio_tagging_loss=0.01134, over 24750.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4956073.86 frames. 
], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:13:34,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1541920.0, ans=0.125 2023-12-24 06:13:46,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1541986.6666666667, ans=0.125 2023-12-24 06:13:46,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. limit=15.0 2023-12-24 06:13:47,838 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.799e+01 4.107e+01 4.264e+01 4.424e+01 5.486e+01, threshold=8.528e+01, percent-clipped=0.0 2023-12-24 06:13:49,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2023-12-24 06:14:04,216 INFO [train.py:886] (3/4) Epoch 49, batch 2550, loss[loss=0.01322, audio_tagging_loss=0.01322, over 24750.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4942364.25 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:14:19,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1542186.6666666667, ans=0.1 2023-12-24 06:14:44,329 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2023-12-24 06:14:50,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1542386.6666666667, ans=0.125 2023-12-24 06:14:52,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1542386.6666666667, ans=0.05 2023-12-24 06:14:54,686 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.70 vs. limit=22.5 2023-12-24 06:14:55,071 INFO [train.py:886] (3/4) Epoch 49, batch 2600, loss[loss=0.01046, audio_tagging_loss=0.01046, over 25000.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4946443.12 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:15:08,908 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:15:08,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1542520.0, ans=0.0 2023-12-24 06:15:25,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1542653.3333333333, ans=0.0 2023-12-24 06:15:32,853 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.739e+01 4.055e+01 4.229e+01 4.404e+01 4.899e+01, threshold=8.458e+01, percent-clipped=0.0 2023-12-24 06:15:41,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1542720.0, ans=0.1 2023-12-24 06:15:45,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1542720.0, ans=0.125 2023-12-24 06:15:47,946 INFO [train.py:886] (3/4) Epoch 49, batch 2650, loss[loss=0.01138, audio_tagging_loss=0.01138, over 24750.00 frames. 
], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4938813.04 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:15:49,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1542786.6666666667, ans=0.0 2023-12-24 06:16:16,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.75 vs. limit=15.0 2023-12-24 06:16:18,632 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.51 vs. limit=12.0 2023-12-24 06:16:21,547 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=12.0 2023-12-24 06:16:23,687 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=15.0 2023-12-24 06:16:37,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1543053.3333333333, ans=0.125 2023-12-24 06:16:37,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1543053.3333333333, ans=0.0 2023-12-24 06:16:37,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=1543053.3333333333, ans=6.0 2023-12-24 06:16:40,432 INFO [train.py:886] (3/4) Epoch 49, batch 2700, loss[loss=0.01258, audio_tagging_loss=0.01258, over 21703.00 frames. ], tot_loss[loss=0.01055, audio_tagging_loss=0.01055, over 4944362.63 frames. ], batch size: 107, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:16:45,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1543120.0, ans=0.07 2023-12-24 06:16:51,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1543186.6666666667, ans=0.0 2023-12-24 06:16:57,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1543186.6666666667, ans=0.95 2023-12-24 06:17:03,170 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:17:03,577 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=12.0 2023-12-24 06:17:07,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.44 vs. 
limit=15.0 2023-12-24 06:17:12,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1543320.0, ans=0.0 2023-12-24 06:17:15,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1543320.0, ans=0.1 2023-12-24 06:17:16,718 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.523e+01 3.964e+01 4.171e+01 4.415e+01 4.994e+01, threshold=8.341e+01, percent-clipped=0.0 2023-12-24 06:17:21,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1543386.6666666667, ans=0.125 2023-12-24 06:17:31,771 INFO [train.py:886] (3/4) Epoch 49, batch 2750, loss[loss=0.01092, audio_tagging_loss=0.01092, over 25000.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4951358.99 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:17:40,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1543453.3333333333, ans=0.125 2023-12-24 06:17:41,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1543520.0, ans=0.0 2023-12-24 06:17:45,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1543520.0, ans=0.0 2023-12-24 06:18:03,623 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.72 vs. limit=22.5 2023-12-24 06:18:24,104 INFO [train.py:886] (3/4) Epoch 49, batch 2800, loss[loss=0.009547, audio_tagging_loss=0.009547, over 24750.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4951076.40 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:18:24,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1543786.6666666667, ans=0.125 2023-12-24 06:18:56,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1543986.6666666667, ans=0.125 2023-12-24 06:18:59,539 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.35 vs. limit=6.0 2023-12-24 06:19:00,877 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.540e+01 4.058e+01 4.176e+01 4.407e+01 5.903e+01, threshold=8.351e+01, percent-clipped=0.0 2023-12-24 06:19:01,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1543986.6666666667, ans=0.125 2023-12-24 06:19:16,495 INFO [train.py:886] (3/4) Epoch 49, batch 2850, loss[loss=0.01206, audio_tagging_loss=0.01206, over 24750.00 frames. ], tot_loss[loss=0.01068, audio_tagging_loss=0.01068, over 4947763.39 frames. 
], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:19:46,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1544320.0, ans=0.0 2023-12-24 06:19:55,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1544320.0, ans=0.125 2023-12-24 06:20:00,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1544386.6666666667, ans=0.2 2023-12-24 06:20:05,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1544386.6666666667, ans=0.0 2023-12-24 06:20:08,188 INFO [train.py:886] (3/4) Epoch 49, batch 2900, loss[loss=0.0113, audio_tagging_loss=0.0113, over 24750.00 frames. ], tot_loss[loss=0.01064, audio_tagging_loss=0.01064, over 4945441.31 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:20:08,414 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:20:14,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1544453.3333333333, ans=0.05 2023-12-24 06:20:21,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1544520.0, ans=0.125 2023-12-24 06:20:24,234 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.75 vs. limit=15.0 2023-12-24 06:20:34,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1544586.6666666667, ans=0.125 2023-12-24 06:20:44,087 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.584e+01 4.050e+01 4.220e+01 4.394e+01 4.987e+01, threshold=8.439e+01, percent-clipped=0.0 2023-12-24 06:20:53,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1544720.0, ans=0.0 2023-12-24 06:20:54,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1544720.0, ans=0.0 2023-12-24 06:21:00,077 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-12-24 06:21:00,370 INFO [train.py:886] (3/4) Epoch 49, batch 2950, loss[loss=0.01131, audio_tagging_loss=0.01131, over 25000.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4948078.16 frames. 
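], batch size: 100, lr: 2.18e-03, grad_scale: 32.0

Many of the scheduled values in these records belong to balancers, whose bounds also show up directly in the log (min_positive=0.025, min_abs=0.5, max_abs=10.0, and firing probabilities such as ans=0.125). The apparent job is to keep per-channel activation statistics inside configured bounds, intervening on the backward pass only with probability prob. A sketch that expresses the same constraints as an explicit penalty, zero while every channel stays in bounds (the bound values below are illustrative, and the real module reportedly edits gradients rather than adding a loss term):

    import torch
    import torch.nn.functional as F

    def balancer_violation(x: torch.Tensor,
                           min_positive: float = 0.05, max_positive: float = 0.95,
                           min_abs: float = 0.2, max_abs: float = 100.0):
        # Per-channel statistics over all non-channel dims (channels last).
        dims = tuple(range(x.dim() - 1))
        frac_positive = (x > 0).float().mean(dim=dims)
        abs_mean = x.abs().mean(dim=dims)
        # Each term is zero while the statistic is inside its bound.
        return (F.relu(min_positive - frac_positive)
                + F.relu(frac_positive - max_positive)
                + F.relu(min_abs - abs_mean)
                + F.relu(abs_mean - max_abs)).sum()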
2023-12-24 06:21:09,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1544853.3333333333, ans=0.125 2023-12-24 06:21:38,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1544986.6666666667, ans=0.125 2023-12-24 06:21:38,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1544986.6666666667, ans=0.0 2023-12-24 06:21:45,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1545053.3333333333, ans=0.0 2023-12-24 06:21:48,477 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=15.0 2023-12-24 06:21:51,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1545120.0, ans=0.125 2023-12-24 06:21:52,361 INFO [train.py:886] (3/4) Epoch 49, batch 3000, loss[loss=0.01288, audio_tagging_loss=0.01288, over 24750.00 frames. ], tot_loss[loss=0.01051, audio_tagging_loss=0.01051, over 4946141.47 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:21:52,362 INFO [train.py:909] (3/4) Computing validation loss 2023-12-24 06:22:13,825 INFO [train.py:917] (3/4) Epoch 49, validation: loss=0.03737, audio_tagging_loss=0.03737, over 3737520.00 frames. 2023-12-24 06:22:13,826 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-24 06:22:23,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1545186.6666666667, ans=0.1 2023-12-24 06:22:43,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1545320.0, ans=0.0 2023-12-24 06:22:50,396 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.477e+01 3.989e+01 4.185e+01 4.456e+01 5.215e+01, threshold=8.370e+01, percent-clipped=0.0 2023-12-24 06:22:57,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1545386.6666666667, ans=0.125 2023-12-24 06:22:58,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1545386.6666666667, ans=0.2 2023-12-24 06:23:06,471 INFO [train.py:886] (3/4) Epoch 49, batch 3050, loss[loss=0.01136, audio_tagging_loss=0.01136, over 25000.00 frames. ], tot_loss[loss=0.01044, audio_tagging_loss=0.01044, over 4952973.21 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:23:09,368 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:23:18,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1545520.0, ans=0.0 2023-12-24 06:23:24,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1545520.0, ans=0.0
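The train.py:909/917/918 records above show the periodic validation pass: the whole dev set is scored with the model in eval mode, the frame-weighted average is logged, and peak GPU memory is reported afterwards. A sketch of such a pass; compute_loss is a hypothetical stand-in for the recipe's actual loss computation, and the memory line assumes torch.cuda.max_memory_allocated():

    import torch

    @torch.no_grad()
    def run_validation(model, dev_loader, compute_loss, device) -> float:
        # compute_loss(model, batch, device) -> (loss tensor, num_frames)
        model.eval()
        loss_sum, frames = 0.0, 0.0
        for batch in dev_loader:
            loss, num_frames = compute_loss(model, batch, device)
            loss_sum += loss.item() * num_frames
            frames += num_frames
        model.train()
        mem_mb = torch.cuda.max_memory_allocated(device) // (1024 ** 2)
        print(f"validation: loss={loss_sum / frames:.4g}; "
              f"Maximum memory allocated so far is {mem_mb}MB")
        return loss_sum / frames

2023-12-24 06:23:34,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.25 vs.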
limit=15.0 2023-12-24 06:23:36,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1545586.6666666667, ans=0.125 2023-12-24 06:23:39,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1545653.3333333333, ans=0.125 2023-12-24 06:23:44,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1545653.3333333333, ans=0.125 2023-12-24 06:23:49,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1545720.0, ans=0.125 2023-12-24 06:23:49,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1545720.0, ans=0.125 2023-12-24 06:23:51,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1545720.0, ans=0.0 2023-12-24 06:23:57,216 INFO [train.py:886] (3/4) Epoch 49, batch 3100, loss[loss=0.01275, audio_tagging_loss=0.01275, over 24750.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4959928.73 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:24:16,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1545920.0, ans=0.125 2023-12-24 06:24:31,112 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0 2023-12-24 06:24:33,098 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.660e+01 4.061e+01 4.253e+01 4.429e+01 4.827e+01, threshold=8.507e+01, percent-clipped=0.0 2023-12-24 06:24:33,614 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.82 vs. limit=15.0 2023-12-24 06:24:36,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1545986.6666666667, ans=0.125 2023-12-24 06:24:40,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1546053.3333333333, ans=0.0 2023-12-24 06:24:45,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1546053.3333333333, ans=0.5 2023-12-24 06:24:47,930 INFO [train.py:886] (3/4) Epoch 49, batch 3150, loss[loss=0.009443, audio_tagging_loss=0.009443, over 24013.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4955394.31 frames. 
], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:25:08,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1546186.6666666667, ans=0.09899494936611666 2023-12-24 06:25:11,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1546253.3333333333, ans=0.1 2023-12-24 06:25:20,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1546320.0, ans=0.1 2023-12-24 06:25:30,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1546386.6666666667, ans=0.0 2023-12-24 06:25:38,184 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.58 vs. limit=15.0 2023-12-24 06:25:40,642 INFO [train.py:886] (3/4) Epoch 49, batch 3200, loss[loss=0.01161, audio_tagging_loss=0.01161, over 24750.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4948251.15 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:25:59,902 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.68 vs. limit=22.5 2023-12-24 06:26:18,569 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.708e+01 4.064e+01 4.235e+01 4.462e+01 5.298e+01, threshold=8.470e+01, percent-clipped=0.0 2023-12-24 06:26:29,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1546720.0, ans=0.125 2023-12-24 06:26:33,560 INFO [train.py:886] (3/4) Epoch 49, batch 3250, loss[loss=0.01112, audio_tagging_loss=0.01112, over 24750.00 frames. ], tot_loss[loss=0.01055, audio_tagging_loss=0.01055, over 4946349.89 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:26:35,973 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.04 vs. limit=15.0 2023-12-24 06:27:00,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.60 vs. limit=22.5 2023-12-24 06:27:01,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1546920.0, ans=0.0 2023-12-24 06:27:03,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=15.0 2023-12-24 06:27:07,137 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:27:12,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1546986.6666666667, ans=0.125 2023-12-24 06:27:20,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1547053.3333333333, ans=0.2 2023-12-24 06:27:23,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.00 vs. limit=10.0 2023-12-24 06:27:26,089 INFO [train.py:886] (3/4) Epoch 49, batch 3300, loss[loss=0.01048, audio_tagging_loss=0.01048, over 22223.00 frames. 
], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4950572.75 frames. ], batch size: 107, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:27:38,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1547186.6666666667, ans=0.07 2023-12-24 06:27:47,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1547253.3333333333, ans=0.125 2023-12-24 06:28:01,848 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.618e+01 4.007e+01 4.177e+01 4.374e+01 5.032e+01, threshold=8.354e+01, percent-clipped=0.0 2023-12-24 06:28:06,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1547386.6666666667, ans=0.125 2023-12-24 06:28:14,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1547386.6666666667, ans=0.125 2023-12-24 06:28:17,559 INFO [train.py:886] (3/4) Epoch 49, batch 3350, loss[loss=0.01107, audio_tagging_loss=0.01107, over 25000.00 frames. ], tot_loss[loss=0.01052, audio_tagging_loss=0.01052, over 4958194.40 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:28:33,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1547520.0, ans=0.125 2023-12-24 06:28:49,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1547653.3333333333, ans=0.0 2023-12-24 06:29:02,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.89 vs. limit=12.0 2023-12-24 06:29:09,053 INFO [train.py:886] (3/4) Epoch 49, batch 3400, loss[loss=0.00942, audio_tagging_loss=0.00942, over 25000.00 frames. ], tot_loss[loss=0.01047, audio_tagging_loss=0.01047, over 4958528.34 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:29:20,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1547853.3333333333, ans=0.0 2023-12-24 06:29:21,764 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.76 vs. limit=15.0 2023-12-24 06:29:45,477 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.515e+01 4.092e+01 4.242e+01 4.462e+01 5.102e+01, threshold=8.484e+01, percent-clipped=0.0 2023-12-24 06:29:53,344 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2023-12-24 06:30:02,454 INFO [train.py:886] (3/4) Epoch 49, batch 3450, loss[loss=0.008273, audio_tagging_loss=0.008273, over 24750.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4948777.64 frames. 
], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:30:20,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1548186.6666666667, ans=0.5 2023-12-24 06:30:20,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1548186.6666666667, ans=0.125 2023-12-24 06:30:35,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1548320.0, ans=0.125 2023-12-24 06:30:38,765 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.03 vs. limit=15.0 2023-12-24 06:30:47,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1548386.6666666667, ans=0.125 2023-12-24 06:30:47,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1548386.6666666667, ans=0.1 2023-12-24 06:30:52,202 INFO [train.py:886] (3/4) Epoch 49, batch 3500, loss[loss=0.01309, audio_tagging_loss=0.01309, over 24750.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4940478.03 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:31:04,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1548520.0, ans=0.2 2023-12-24 06:31:04,111 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=15.0 2023-12-24 06:31:06,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1548520.0, ans=0.2 2023-12-24 06:31:13,902 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.16 vs. limit=22.5 2023-12-24 06:31:15,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1548586.6666666667, ans=0.0 2023-12-24 06:31:29,130 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.463e+01 4.041e+01 4.186e+01 4.358e+01 4.992e+01, threshold=8.372e+01, percent-clipped=0.0 2023-12-24 06:31:37,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1548720.0, ans=0.125 2023-12-24 06:31:44,751 INFO [train.py:886] (3/4) Epoch 49, batch 3550, loss[loss=0.01087, audio_tagging_loss=0.01087, over 24750.00 frames. ], tot_loss[loss=0.01056, audio_tagging_loss=0.01056, over 4937597.33 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:32:11,703 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0 2023-12-24 06:32:13,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1548920.0, ans=0.07 2023-12-24 06:32:34,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1549053.3333333333, ans=0.5 2023-12-24 06:32:36,658 INFO [train.py:886] (3/4) Epoch 49, batch 3600, loss[loss=0.005649, audio_tagging_loss=0.005649, over 24000.00 frames. 
], tot_loss[loss=0.01047, audio_tagging_loss=0.01047, over 4939190.06 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:32:42,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1549120.0, ans=0.125 2023-12-24 06:32:56,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1549253.3333333333, ans=0.0 2023-12-24 06:33:05,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1549253.3333333333, ans=0.0 2023-12-24 06:33:13,501 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.476e+01 4.039e+01 4.199e+01 4.373e+01 5.772e+01, threshold=8.398e+01, percent-clipped=0.0 2023-12-24 06:33:15,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1549320.0, ans=0.5 2023-12-24 06:33:20,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1549386.6666666667, ans=0.125 2023-12-24 06:33:28,333 INFO [train.py:886] (3/4) Epoch 49, batch 3650, loss[loss=0.008851, audio_tagging_loss=0.008851, over 25000.00 frames. ], tot_loss[loss=0.01043, audio_tagging_loss=0.01043, over 4946760.81 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:33:33,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1549453.3333333333, ans=0.0 2023-12-24 06:33:35,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1549453.3333333333, ans=0.125 2023-12-24 06:33:35,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1549453.3333333333, ans=0.0 2023-12-24 06:33:44,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1549520.0, ans=0.0 2023-12-24 06:34:20,955 INFO [train.py:886] (3/4) Epoch 49, batch 3700, loss[loss=0.01144, audio_tagging_loss=0.01144, over 25000.00 frames. ], tot_loss[loss=0.01046, audio_tagging_loss=0.01046, over 4944410.96 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:34:28,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1549786.6666666667, ans=0.125 2023-12-24 06:34:42,698 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:34:46,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1549920.0, ans=0.125 2023-12-24 06:34:57,825 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.487e+01 4.052e+01 4.232e+01 4.524e+01 5.047e+01, threshold=8.465e+01, percent-clipped=0.0 2023-12-24 06:35:12,708 INFO [train.py:886] (3/4) Epoch 49, batch 3750, loss[loss=0.01031, audio_tagging_loss=0.01031, over 24750.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4946705.77 frames. 
], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:35:22,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1550186.6666666667, ans=0.1 2023-12-24 06:35:35,236 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5 2023-12-24 06:35:51,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1550320.0, ans=0.125 2023-12-24 06:35:55,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1550386.6666666667, ans=0.125 2023-12-24 06:36:04,934 INFO [train.py:886] (3/4) Epoch 49, batch 3800, loss[loss=0.01279, audio_tagging_loss=0.01279, over 24750.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4946171.03 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:36:14,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1550520.0, ans=0.125 2023-12-24 06:36:30,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1550586.6666666667, ans=0.125 2023-12-24 06:36:40,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.61 vs. limit=15.0 2023-12-24 06:36:41,089 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.564e+01 4.023e+01 4.204e+01 4.391e+01 4.921e+01, threshold=8.409e+01, percent-clipped=0.0 2023-12-24 06:36:45,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0 2023-12-24 06:36:51,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1550720.0, ans=0.1 2023-12-24 06:36:57,912 INFO [train.py:886] (3/4) Epoch 49, batch 3850, loss[loss=0.009183, audio_tagging_loss=0.009183, over 23984.00 frames. ], tot_loss[loss=0.0106, audio_tagging_loss=0.0106, over 4945122.04 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:36:59,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1550786.6666666667, ans=0.125 2023-12-24 06:37:48,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1551120.0, ans=0.025 2023-12-24 06:37:49,543 INFO [train.py:886] (3/4) Epoch 49, batch 3900, loss[loss=0.0107, audio_tagging_loss=0.0107, over 24750.00 frames. ], tot_loss[loss=0.0105, audio_tagging_loss=0.0105, over 4940586.72 frames. 
], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:38:09,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1551253.3333333333, ans=0.0 2023-12-24 06:38:22,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1551320.0, ans=0.0 2023-12-24 06:38:25,557 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.639e+01 4.074e+01 4.182e+01 4.374e+01 4.953e+01, threshold=8.363e+01, percent-clipped=0.0 2023-12-24 06:38:28,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1551320.0, ans=0.0 2023-12-24 06:38:41,223 INFO [train.py:886] (3/4) Epoch 49, batch 3950, loss[loss=0.008783, audio_tagging_loss=0.008783, over 25000.00 frames. ], tot_loss[loss=0.01043, audio_tagging_loss=0.01043, over 4946540.88 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:38:43,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1551453.3333333333, ans=0.125 2023-12-24 06:38:57,469 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.54 vs. limit=15.0 2023-12-24 06:39:03,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1551586.6666666667, ans=0.0 2023-12-24 06:39:06,897 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:39:33,398 INFO [train.py:886] (3/4) Epoch 49, batch 4000, loss[loss=0.009783, audio_tagging_loss=0.009783, over 25000.00 frames. ], tot_loss[loss=0.01044, audio_tagging_loss=0.01044, over 4950139.90 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:39:36,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1551786.6666666667, ans=0.2 2023-12-24 06:40:09,397 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.560e+01 4.058e+01 4.250e+01 4.476e+01 5.419e+01, threshold=8.500e+01, percent-clipped=0.0 2023-12-24 06:40:14,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1552053.3333333333, ans=0.1 2023-12-24 06:40:21,098 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=22.5 2023-12-24 06:40:24,540 INFO [train.py:886] (3/4) Epoch 49, batch 4050, loss[loss=0.008967, audio_tagging_loss=0.008967, over 24750.00 frames. ], tot_loss[loss=0.01049, audio_tagging_loss=0.01049, over 4948005.23 frames. 
], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:40:31,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1552120.0, ans=0.0 2023-12-24 06:40:32,718 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:40:41,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1552186.6666666667, ans=0.0 2023-12-24 06:40:42,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1552186.6666666667, ans=0.2 2023-12-24 06:40:51,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1552253.3333333333, ans=0.125 2023-12-24 06:40:54,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1552253.3333333333, ans=0.125 2023-12-24 06:41:15,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1552386.6666666667, ans=0.125 2023-12-24 06:41:17,512 INFO [train.py:886] (3/4) Epoch 49, batch 4100, loss[loss=0.01163, audio_tagging_loss=0.01163, over 24943.00 frames. ], tot_loss[loss=0.01059, audio_tagging_loss=0.01059, over 4947881.99 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:41:19,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1552453.3333333333, ans=0.0 2023-12-24 06:41:40,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1552586.6666666667, ans=0.125 2023-12-24 06:41:51,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1552653.3333333333, ans=0.0 2023-12-24 06:41:53,397 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.555e+01 4.048e+01 4.230e+01 4.426e+01 5.079e+01, threshold=8.460e+01, percent-clipped=0.0 2023-12-24 06:41:59,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1552720.0, ans=0.125 2023-12-24 06:42:08,572 INFO [train.py:886] (3/4) Epoch 49, batch 4150, loss[loss=0.009271, audio_tagging_loss=0.009271, over 24750.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4941632.99 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:42:08,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1552786.6666666667, ans=0.1 2023-12-24 06:42:09,971 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2023-12-24 06:42:10,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1552786.6666666667, ans=0.2 2023-12-24 06:42:55,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1553053.3333333333, ans=0.1 2023-12-24 06:42:59,751 INFO [train.py:886] (3/4) Epoch 49, batch 4200, loss[loss=0.01027, audio_tagging_loss=0.01027, over 24750.00 frames. ], tot_loss[loss=0.01055, audio_tagging_loss=0.01055, over 4942069.83 frames. 
], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:43:01,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1553120.0, ans=0.125 2023-12-24 06:43:14,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1553186.6666666667, ans=0.125 2023-12-24 06:43:36,489 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.716e+01 4.005e+01 4.208e+01 4.384e+01 4.988e+01, threshold=8.417e+01, percent-clipped=0.0 2023-12-24 06:43:36,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1553320.0, ans=0.1 2023-12-24 06:43:46,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1553386.6666666667, ans=0.1 2023-12-24 06:43:47,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1553386.6666666667, ans=0.2 2023-12-24 06:43:49,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1553386.6666666667, ans=0.125 2023-12-24 06:43:52,894 INFO [train.py:886] (3/4) Epoch 49, batch 4250, loss[loss=0.0115, audio_tagging_loss=0.0115, over 24750.00 frames. ], tot_loss[loss=0.01041, audio_tagging_loss=0.01041, over 4942675.14 frames. ], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:44:04,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1553520.0, ans=0.125 2023-12-24 06:44:06,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1553520.0, ans=0.125 2023-12-24 06:44:10,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1553520.0, ans=0.0 2023-12-24 06:44:13,786 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.85 vs. limit=6.0 2023-12-24 06:44:28,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1553653.3333333333, ans=0.125 2023-12-24 06:44:32,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1553653.3333333333, ans=0.125 2023-12-24 06:44:42,752 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=15.0 2023-12-24 06:44:43,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1553786.6666666667, ans=0.035 2023-12-24 06:44:44,045 INFO [train.py:886] (3/4) Epoch 49, batch 4300, loss[loss=0.009504, audio_tagging_loss=0.009504, over 25000.00 frames. ], tot_loss[loss=0.01044, audio_tagging_loss=0.01044, over 4950411.83 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 64.0 2023-12-24 06:45:00,359 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. 
limit=15.0 2023-12-24 06:45:02,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1553853.3333333333, ans=0.125 2023-12-24 06:45:02,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1553853.3333333333, ans=0.2 2023-12-24 06:45:11,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1553920.0, ans=0.0 2023-12-24 06:45:21,782 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.664e+01 3.972e+01 4.193e+01 4.366e+01 5.433e+01, threshold=8.386e+01, percent-clipped=0.0 2023-12-24 06:45:24,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.25 vs. limit=10.0 2023-12-24 06:45:37,138 INFO [train.py:886] (3/4) Epoch 49, batch 4350, loss[loss=0.0107, audio_tagging_loss=0.0107, over 25000.00 frames. ], tot_loss[loss=0.01047, audio_tagging_loss=0.01047, over 4954950.10 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:45:44,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1554120.0, ans=0.1 2023-12-24 06:46:00,903 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.57 vs. limit=8.0 2023-12-24 06:46:13,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.72 vs. limit=15.0 2023-12-24 06:46:24,771 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0 2023-12-24 06:46:28,747 INFO [train.py:886] (3/4) Epoch 49, batch 4400, loss[loss=0.007453, audio_tagging_loss=0.007453, over 24012.00 frames. ], tot_loss[loss=0.01059, audio_tagging_loss=0.01059, over 4951743.12 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:47:03,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1554653.3333333333, ans=0.125 2023-12-24 06:47:07,212 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.673e+01 4.187e+01 4.293e+01 4.475e+01 5.860e+01, threshold=8.587e+01, percent-clipped=0.0 2023-12-24 06:47:07,879 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.23 vs. limit=15.0 2023-12-24 06:47:12,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1554720.0, ans=0.0 2023-12-24 06:47:18,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1554720.0, ans=0.125 2023-12-24 06:47:20,683 INFO [train.py:886] (3/4) Epoch 49, batch 4450, loss[loss=0.01135, audio_tagging_loss=0.01135, over 24750.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4946488.02 frames. 
], batch size: 99, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:47:23,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1554786.6666666667, ans=0.0 2023-12-24 06:47:27,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1554786.6666666667, ans=0.125 2023-12-24 06:47:34,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1554853.3333333333, ans=0.1 2023-12-24 06:47:39,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1554853.3333333333, ans=0.125 2023-12-24 06:47:54,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1554986.6666666667, ans=0.0 2023-12-24 06:48:12,847 INFO [train.py:886] (3/4) Epoch 49, batch 4500, loss[loss=0.01237, audio_tagging_loss=0.01237, over 25000.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4945526.00 frames. ], batch size: 100, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:48:16,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1555120.0, ans=0.1 2023-12-24 06:48:27,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1555186.6666666667, ans=0.1 2023-12-24 06:48:27,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1555186.6666666667, ans=0.125 2023-12-24 06:48:28,245 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:48:43,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1555320.0, ans=0.125 2023-12-24 06:48:43,457 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0 2023-12-24 06:48:49,613 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.358e+01 3.945e+01 4.149e+01 4.337e+01 5.254e+01, threshold=8.299e+01, percent-clipped=0.0 2023-12-24 06:48:50,178 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.11 vs. limit=15.0 2023-12-24 06:48:55,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1555386.6666666667, ans=0.125 2023-12-24 06:49:00,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1555386.6666666667, ans=0.125 2023-12-24 06:49:03,807 INFO [train.py:886] (3/4) Epoch 49, batch 4550, loss[loss=0.008431, audio_tagging_loss=0.008431, over 22118.00 frames. ], tot_loss[loss=0.01052, audio_tagging_loss=0.01052, over 4942263.79 frames. ], batch size: 107, lr: 2.18e-03, grad_scale: 32.0 2023-12-24 06:49:22,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1555520.0, ans=0.1 2023-12-24 06:49:44,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=1.99 vs. 
limit=12.0 2023-12-24 06:49:54,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1555720.0, ans=0.125 2023-12-24 06:49:56,021 INFO [train.py:886] (3/4) Epoch 49, batch 4600, loss[loss=0.01299, audio_tagging_loss=0.01299, over 24750.00 frames. ], tot_loss[loss=0.01055, audio_tagging_loss=0.01055, over 4946909.70 frames. ], batch size: 99, lr: 2.17e-03, grad_scale: 32.0 2023-12-24 06:50:14,422 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.93 vs. limit=6.0 2023-12-24 06:50:32,530 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.708e+01 4.082e+01 4.227e+01 4.419e+01 5.112e+01, threshold=8.454e+01, percent-clipped=0.0 2023-12-24 06:50:38,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1556053.3333333333, ans=0.125 2023-12-24 06:50:42,078 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 06:50:46,512 INFO [train.py:886] (3/4) Epoch 49, batch 4650, loss[loss=0.0109, audio_tagging_loss=0.0109, over 25000.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4949584.42 frames. ], batch size: 100, lr: 2.17e-03, grad_scale: 32.0 2023-12-24 06:51:07,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1556253.3333333333, ans=0.125 2023-12-24 06:51:08,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1556253.3333333333, ans=0.125 2023-12-24 06:51:09,439 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.85 vs. limit=15.0 2023-12-24 06:51:19,000 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0 2023-12-24 06:51:36,276 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=15.0 2023-12-24 06:51:37,689 INFO [train.py:886] (3/4) Epoch 49, batch 4700, loss[loss=0.0112, audio_tagging_loss=0.0112, over 24750.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4946192.30 frames. 
], batch size: 99, lr: 2.17e-03, grad_scale: 32.0 2023-12-24 06:51:38,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1556453.3333333333, ans=0.0 2023-12-24 06:51:45,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1556520.0, ans=0.125 2023-12-24 06:51:47,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1556520.0, ans=0.2 2023-12-24 06:51:57,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1556586.6666666667, ans=0.125 2023-12-24 06:51:58,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1556586.6666666667, ans=0.2 2023-12-24 06:52:10,981 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.696e+01 4.105e+01 4.278e+01 4.465e+01 5.122e+01, threshold=8.556e+01, percent-clipped=0.0 2023-12-24 06:52:24,081 INFO [train.py:886] (3/4) Epoch 49, batch 4750, loss[loss=0.01107, audio_tagging_loss=0.01107, over 24750.00 frames. ], tot_loss[loss=0.01071, audio_tagging_loss=0.01071, over 4941493.86 frames. ], batch size: 99, lr: 2.17e-03, grad_scale: 32.0 2023-12-24 06:52:29,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1556786.6666666667, ans=10.0 2023-12-24 06:52:31,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1556786.6666666667, ans=0.125 2023-12-24 06:52:59,428 INFO [train.py:886] (3/4) Epoch 50, batch 0, loss[loss=0.02693, audio_tagging_loss=0.02693, over 21809.00 frames. ], tot_loss[loss=0.02693, audio_tagging_loss=0.02693, over 21809.00 frames. ], batch size: 107, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 06:52:59,428 INFO [train.py:909] (3/4) Computing validation loss 2023-12-24 06:53:09,899 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.6568, 3.1997, 4.1618, 3.8131], device='cuda:3') 2023-12-24 06:53:21,087 INFO [train.py:917] (3/4) Epoch 50, validation: loss=0.03747, audio_tagging_loss=0.03747, over 3737520.00 frames. 2023-12-24 06:53:21,088 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB 2023-12-24 06:53:30,917 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0 2023-12-24 06:53:44,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1557026.6666666667, ans=0.125 2023-12-24 06:53:46,843 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.12 vs. limit=22.5 2023-12-24 06:53:59,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1557093.3333333333, ans=0.0 2023-12-24 06:54:11,354 INFO [train.py:886] (3/4) Epoch 50, batch 50, loss[loss=0.01485, audio_tagging_loss=0.01485, over 25000.00 frames. ], tot_loss[loss=0.01685, audio_tagging_loss=0.01685, over 1117981.36 frames. 
], batch size: 100, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:54:35,196 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.740e+01 4.510e+01 5.078e+01 5.716e+01 1.112e+02, threshold=1.016e+02, percent-clipped=6.0 2023-12-24 06:54:43,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1557426.6666666667, ans=0.0 2023-12-24 06:55:03,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1557560.0, ans=0.125 2023-12-24 06:55:04,531 INFO [train.py:886] (3/4) Epoch 50, batch 100, loss[loss=0.01502, audio_tagging_loss=0.01502, over 24750.00 frames. ], tot_loss[loss=0.01486, audio_tagging_loss=0.01486, over 1975724.86 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:55:25,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1557693.3333333333, ans=0.125 2023-12-24 06:55:26,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1557693.3333333333, ans=0.0 2023-12-24 06:55:26,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=1557693.3333333333, ans=15.0 2023-12-24 06:55:26,674 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.65 vs. limit=15.0 2023-12-24 06:55:28,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1557693.3333333333, ans=0.0 2023-12-24 06:55:48,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1557826.6666666667, ans=0.125 2023-12-24 06:55:48,382 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2023-12-24 06:55:52,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1557826.6666666667, ans=0.125 2023-12-24 06:55:54,373 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=15.0 2023-12-24 06:55:54,689 INFO [train.py:886] (3/4) Epoch 50, batch 150, loss[loss=0.01151, audio_tagging_loss=0.01151, over 25000.00 frames. ], tot_loss[loss=0.01361, audio_tagging_loss=0.01361, over 2642227.45 frames. 
], batch size: 100, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:56:01,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1557893.3333333333, ans=0.125 2023-12-24 06:56:04,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1557893.3333333333, ans=0.125 2023-12-24 06:56:14,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1557960.0, ans=0.125 2023-12-24 06:56:18,341 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.960e+01 4.220e+01 4.441e+01 4.666e+01 5.364e+01, threshold=8.881e+01, percent-clipped=0.0 2023-12-24 06:56:25,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1558093.3333333333, ans=0.125 2023-12-24 06:56:47,107 INFO [train.py:886] (3/4) Epoch 50, batch 200, loss[loss=0.01104, audio_tagging_loss=0.01104, over 25000.00 frames. ], tot_loss[loss=0.01291, audio_tagging_loss=0.01291, over 3156323.50 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:56:55,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.32 vs. limit=15.0 2023-12-24 06:56:56,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1558293.3333333333, ans=0.09899494936611666 2023-12-24 06:57:08,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1558360.0, ans=0.0 2023-12-24 06:57:31,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1558493.3333333333, ans=0.125 2023-12-24 06:57:33,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1558493.3333333333, ans=0.125 2023-12-24 06:57:37,402 INFO [train.py:886] (3/4) Epoch 50, batch 250, loss[loss=0.01325, audio_tagging_loss=0.01325, over 24750.00 frames. ], tot_loss[loss=0.01226, audio_tagging_loss=0.01226, over 3553254.34 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:57:44,130 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.85 vs. limit=15.0 2023-12-24 06:57:48,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1558626.6666666667, ans=0.125 2023-12-24 06:57:50,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1558626.6666666667, ans=0.125 2023-12-24 06:57:53,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1558626.6666666667, ans=0.2 2023-12-24 06:58:00,320 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.630e+01 4.054e+01 4.252e+01 4.416e+01 4.947e+01, threshold=8.505e+01, percent-clipped=0.0 2023-12-24 06:58:04,536 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.85 vs. 
limit=15.0 2023-12-24 06:58:06,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1558693.3333333333, ans=0.0 2023-12-24 06:58:07,004 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.19 vs. limit=6.0 2023-12-24 06:58:14,997 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.37 vs. limit=10.0 2023-12-24 06:58:17,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1558760.0, ans=0.0 2023-12-24 06:58:20,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1558826.6666666667, ans=0.125 2023-12-24 06:58:28,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1558893.3333333333, ans=0.125 2023-12-24 06:58:29,393 INFO [train.py:886] (3/4) Epoch 50, batch 300, loss[loss=0.009349, audio_tagging_loss=0.009349, over 24750.00 frames. ], tot_loss[loss=0.01181, audio_tagging_loss=0.01181, over 3860365.68 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:58:37,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1558893.3333333333, ans=0.2 2023-12-24 06:58:45,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1558960.0, ans=0.0 2023-12-24 06:58:47,562 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.17 vs. limit=22.5 2023-12-24 06:58:50,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1559026.6666666667, ans=0.125 2023-12-24 06:59:14,139 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.05 vs. limit=15.0 2023-12-24 06:59:17,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1559160.0, ans=0.125 2023-12-24 06:59:20,875 INFO [train.py:886] (3/4) Epoch 50, batch 350, loss[loss=0.01191, audio_tagging_loss=0.01191, over 25000.00 frames. ], tot_loss[loss=0.01162, audio_tagging_loss=0.01162, over 4098715.75 frames. 
], batch size: 100, lr: 2.15e-03, grad_scale: 16.0 2023-12-24 06:59:23,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1559226.6666666667, ans=0.1 2023-12-24 06:59:28,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1559226.6666666667, ans=0.125 2023-12-24 06:59:43,066 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.541e+01 4.035e+01 4.228e+01 4.389e+01 4.773e+01, threshold=8.456e+01, percent-clipped=0.0 2023-12-24 06:59:48,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1559360.0, ans=0.125 2023-12-24 06:59:55,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1559426.6666666667, ans=0.0 2023-12-24 07:00:04,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1559493.3333333333, ans=0.0 2023-12-24 07:00:08,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1559493.3333333333, ans=0.125 2023-12-24 07:00:12,429 INFO [train.py:886] (3/4) Epoch 50, batch 400, loss[loss=0.01132, audio_tagging_loss=0.01132, over 24750.00 frames. ], tot_loss[loss=0.01125, audio_tagging_loss=0.01125, over 4282072.40 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:00:19,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1559560.0, ans=0.125 2023-12-24 07:00:25,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1559626.6666666667, ans=0.0 2023-12-24 07:00:47,156 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.09 vs. limit=22.5 2023-12-24 07:00:48,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1559760.0, ans=0.125 2023-12-24 07:00:55,945 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 07:01:00,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.54 vs. limit=12.0 2023-12-24 07:01:04,252 INFO [train.py:886] (3/4) Epoch 50, batch 450, loss[loss=0.01005, audio_tagging_loss=0.01005, over 24750.00 frames. ], tot_loss[loss=0.01107, audio_tagging_loss=0.01107, over 4430558.59 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:01:12,334 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=15.0 2023-12-24 07:01:14,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1559960.0, ans=0.0 2023-12-24 07:01:26,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1560026.6666666667, ans=0.0 2023-12-24 07:01:26,793 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.71 vs. 
limit=10.0 2023-12-24 07:01:28,029 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.735e+01 4.049e+01 4.184e+01 4.376e+01 4.940e+01, threshold=8.368e+01, percent-clipped=0.0 2023-12-24 07:01:57,590 INFO [train.py:886] (3/4) Epoch 50, batch 500, loss[loss=0.01016, audio_tagging_loss=0.01016, over 25000.00 frames. ], tot_loss[loss=0.01092, audio_tagging_loss=0.01092, over 4545141.49 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:01:58,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1560226.6666666667, ans=0.125 2023-12-24 07:02:10,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1560293.3333333333, ans=0.125 2023-12-24 07:02:13,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1560293.3333333333, ans=0.1 2023-12-24 07:02:49,028 INFO [train.py:886] (3/4) Epoch 50, batch 550, loss[loss=0.01132, audio_tagging_loss=0.01132, over 25000.00 frames. ], tot_loss[loss=0.01087, audio_tagging_loss=0.01087, over 4636046.80 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:03:00,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1560626.6666666667, ans=0.125 2023-12-24 07:03:11,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1560693.3333333333, ans=0.125 2023-12-24 07:03:12,065 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.758e+01 4.103e+01 4.273e+01 4.476e+01 5.412e+01, threshold=8.546e+01, percent-clipped=0.0 2023-12-24 07:03:27,317 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2023-12-24 07:03:35,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1560826.6666666667, ans=0.125 2023-12-24 07:03:40,200 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.71 vs. limit=12.0 2023-12-24 07:03:41,669 INFO [train.py:886] (3/4) Epoch 50, batch 600, loss[loss=0.01028, audio_tagging_loss=0.01028, over 24750.00 frames. ], tot_loss[loss=0.01083, audio_tagging_loss=0.01083, over 4708563.99 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:03:47,781 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.88 vs. limit=15.0 2023-12-24 07:03:59,480 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.66 vs. limit=22.5 2023-12-24 07:04:02,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1561026.6666666667, ans=0.125 2023-12-24 07:04:11,633 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.04 vs. 
limit=12.0 2023-12-24 07:04:14,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1561093.3333333333, ans=0.0 2023-12-24 07:04:15,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1561093.3333333333, ans=0.125 2023-12-24 07:04:24,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1561160.0, ans=0.1 2023-12-24 07:04:34,500 INFO [train.py:886] (3/4) Epoch 50, batch 650, loss[loss=0.01094, audio_tagging_loss=0.01094, over 22773.00 frames. ], tot_loss[loss=0.01094, audio_tagging_loss=0.01094, over 4753996.26 frames. ], batch size: 107, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:04:36,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.62 vs. limit=15.0 2023-12-24 07:04:56,133 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.774e+01 4.125e+01 4.275e+01 4.523e+01 5.661e+01, threshold=8.549e+01, percent-clipped=0.0 2023-12-24 07:05:09,303 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.57 vs. limit=22.5 2023-12-24 07:05:18,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1561493.3333333333, ans=0.125 2023-12-24 07:05:25,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1561560.0, ans=0.0 2023-12-24 07:05:25,716 INFO [train.py:886] (3/4) Epoch 50, batch 700, loss[loss=0.01149, audio_tagging_loss=0.01149, over 24750.00 frames. ], tot_loss[loss=0.01086, audio_tagging_loss=0.01086, over 4797599.29 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:05:40,475 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=22.5 2023-12-24 07:05:41,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1561626.6666666667, ans=0.0 2023-12-24 07:05:49,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.48 vs. limit=6.0 2023-12-24 07:05:52,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1561693.3333333333, ans=0.1 2023-12-24 07:05:57,577 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.46 vs. limit=22.5 2023-12-24 07:06:14,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1561826.6666666667, ans=0.0 2023-12-24 07:06:17,667 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.97 vs. limit=15.0 2023-12-24 07:06:18,169 INFO [train.py:886] (3/4) Epoch 50, batch 750, loss[loss=0.008949, audio_tagging_loss=0.008949, over 25000.00 frames. ], tot_loss[loss=0.01073, audio_tagging_loss=0.01073, over 4833717.19 frames. 
], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:06:26,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1561960.0, ans=0.0 2023-12-24 07:06:32,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1561960.0, ans=22.5 2023-12-24 07:06:37,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1562026.6666666667, ans=0.125 2023-12-24 07:06:40,460 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.722e+01 4.000e+01 4.156e+01 4.359e+01 5.451e+01, threshold=8.313e+01, percent-clipped=0.0 2023-12-24 07:06:49,831 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. limit=6.0 2023-12-24 07:07:01,319 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0 2023-12-24 07:07:05,318 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.51 vs. limit=12.0 2023-12-24 07:07:09,173 INFO [train.py:886] (3/4) Epoch 50, batch 800, loss[loss=0.009219, audio_tagging_loss=0.009219, over 21704.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4861100.82 frames. ], batch size: 107, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:07:17,048 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.75 vs. limit=12.0 2023-12-24 07:07:21,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1562293.3333333333, ans=0.07 2023-12-24 07:07:23,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1562293.3333333333, ans=0.0 2023-12-24 07:07:23,388 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.48 vs. limit=15.0 2023-12-24 07:07:30,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1562360.0, ans=0.125 2023-12-24 07:08:00,081 INFO [train.py:886] (3/4) Epoch 50, batch 850, loss[loss=0.01256, audio_tagging_loss=0.01256, over 25000.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4885926.09 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:08:05,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1562560.0, ans=0.125 2023-12-24 07:08:23,899 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.686e+01 3.999e+01 4.249e+01 4.473e+01 4.944e+01, threshold=8.498e+01, percent-clipped=0.0 2023-12-24 07:08:25,568 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.05 vs. 
limit=12.0 2023-12-24 07:08:27,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1562693.3333333333, ans=0.125 2023-12-24 07:08:28,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1562693.3333333333, ans=0.0 2023-12-24 07:08:42,260 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.85 vs. limit=15.0 2023-12-24 07:08:45,043 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.34 vs. limit=10.0 2023-12-24 07:08:51,882 INFO [train.py:886] (3/4) Epoch 50, batch 900, loss[loss=0.01015, audio_tagging_loss=0.01015, over 24750.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4901709.20 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:09:43,834 INFO [train.py:886] (3/4) Epoch 50, batch 950, loss[loss=0.009916, audio_tagging_loss=0.009916, over 24750.00 frames. ], tot_loss[loss=0.01065, audio_tagging_loss=0.01065, over 4907712.79 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:09:50,571 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.22 vs. limit=12.0 2023-12-24 07:09:57,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1563293.3333333333, ans=0.2 2023-12-24 07:10:03,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1563293.3333333333, ans=0.125 2023-12-24 07:10:03,948 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2023-12-24 07:10:07,342 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.705e+01 4.079e+01 4.256e+01 4.403e+01 5.816e+01, threshold=8.513e+01, percent-clipped=0.0 2023-12-24 07:10:09,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1563360.0, ans=0.125 2023-12-24 07:10:11,562 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.65 vs. limit=10.0 2023-12-24 07:10:15,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1563426.6666666667, ans=0.1 2023-12-24 07:10:33,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1563493.3333333333, ans=0.0 2023-12-24 07:10:36,910 INFO [train.py:886] (3/4) Epoch 50, batch 1000, loss[loss=0.01206, audio_tagging_loss=0.01206, over 24750.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4912170.49 frames. 
], batch size: 99, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:10:37,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1563560.0, ans=0.0 2023-12-24 07:10:45,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1563626.6666666667, ans=0.0 2023-12-24 07:10:54,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1563626.6666666667, ans=0.2 2023-12-24 07:11:11,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.61 vs. limit=10.0 2023-12-24 07:11:28,040 INFO [train.py:886] (3/4) Epoch 50, batch 1050, loss[loss=0.01016, audio_tagging_loss=0.01016, over 25000.00 frames. ], tot_loss[loss=0.01051, audio_tagging_loss=0.01051, over 4914169.46 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:11:51,239 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.709e+01 4.047e+01 4.197e+01 4.433e+01 5.378e+01, threshold=8.395e+01, percent-clipped=0.0 2023-12-24 07:11:53,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1564026.6666666667, ans=0.125 2023-12-24 07:11:53,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1564026.6666666667, ans=0.125 2023-12-24 07:12:17,292 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.95 vs. limit=22.5 2023-12-24 07:12:18,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1564160.0, ans=0.125 2023-12-24 07:12:20,613 INFO [train.py:886] (3/4) Epoch 50, batch 1100, loss[loss=0.009, audio_tagging_loss=0.009, over 25000.00 frames. ], tot_loss[loss=0.0105, audio_tagging_loss=0.0105, over 4923357.77 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:12:44,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.90 vs. limit=22.5 2023-12-24 07:12:48,497 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=15.0 2023-12-24 07:12:51,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1564426.6666666667, ans=0.07 2023-12-24 07:12:52,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1564426.6666666667, ans=0.125 2023-12-24 07:13:12,696 INFO [train.py:886] (3/4) Epoch 50, batch 1150, loss[loss=0.01033, audio_tagging_loss=0.01033, over 24750.00 frames. ], tot_loss[loss=0.01047, audio_tagging_loss=0.01047, over 4930263.11 frames. 
], batch size: 99, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:13:14,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1564560.0, ans=0.0 2023-12-24 07:13:16,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1564560.0, ans=0.125 2023-12-24 07:13:17,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1564560.0, ans=0.125 2023-12-24 07:13:17,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1564560.0, ans=0.125 2023-12-24 07:13:24,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1564626.6666666667, ans=0.125 2023-12-24 07:13:31,921 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=12.0 2023-12-24 07:13:34,236 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.707e+01 4.052e+01 4.234e+01 4.430e+01 4.895e+01, threshold=8.468e+01, percent-clipped=0.0 2023-12-24 07:13:46,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1564760.0, ans=0.0 2023-12-24 07:14:00,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1564826.6666666667, ans=0.0 2023-12-24 07:14:00,393 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0 2023-12-24 07:14:03,662 INFO [train.py:886] (3/4) Epoch 50, batch 1200, loss[loss=0.01071, audio_tagging_loss=0.01071, over 25000.00 frames. ], tot_loss[loss=0.0105, audio_tagging_loss=0.0105, over 4936272.33 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:14:10,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1564893.3333333333, ans=0.125 2023-12-24 07:14:41,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.45 vs. limit=22.5 2023-12-24 07:14:55,931 INFO [train.py:886] (3/4) Epoch 50, batch 1250, loss[loss=0.009565, audio_tagging_loss=0.009565, over 24750.00 frames. ], tot_loss[loss=0.01055, audio_tagging_loss=0.01055, over 4930063.30 frames. 
], batch size: 99, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:14:57,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1565226.6666666667, ans=0.09899494936611666 2023-12-24 07:15:09,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1565293.3333333333, ans=0.0 2023-12-24 07:15:10,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1565293.3333333333, ans=0.1 2023-12-24 07:15:11,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1565293.3333333333, ans=0.0 2023-12-24 07:15:13,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1565293.3333333333, ans=0.125 2023-12-24 07:15:19,847 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.704e+01 4.120e+01 4.290e+01 4.537e+01 5.051e+01, threshold=8.580e+01, percent-clipped=0.0 2023-12-24 07:15:25,200 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.26 vs. limit=10.0 2023-12-24 07:15:36,214 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.43 vs. limit=22.5 2023-12-24 07:15:43,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1565493.3333333333, ans=0.2 2023-12-24 07:15:43,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1565493.3333333333, ans=0.1 2023-12-24 07:15:44,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1565493.3333333333, ans=0.125 2023-12-24 07:15:47,908 INFO [train.py:886] (3/4) Epoch 50, batch 1300, loss[loss=0.01029, audio_tagging_loss=0.01029, over 24750.00 frames. ], tot_loss[loss=0.01066, audio_tagging_loss=0.01066, over 4933017.01 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:16:27,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1565760.0, ans=0.125 2023-12-24 07:16:30,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1565826.6666666667, ans=0.0 2023-12-24 07:16:39,857 INFO [train.py:886] (3/4) Epoch 50, batch 1350, loss[loss=0.01093, audio_tagging_loss=0.01093, over 25000.00 frames. ], tot_loss[loss=0.01059, audio_tagging_loss=0.01059, over 4935229.23 frames. 
], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:16:44,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1565893.3333333333, ans=0.0 2023-12-24 07:17:00,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1566026.6666666667, ans=0.1 2023-12-24 07:17:01,930 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.679e+01 4.124e+01 4.286e+01 4.481e+01 5.287e+01, threshold=8.572e+01, percent-clipped=0.0 2023-12-24 07:17:30,531 INFO [train.py:886] (3/4) Epoch 50, batch 1400, loss[loss=0.009929, audio_tagging_loss=0.009929, over 24750.00 frames. ], tot_loss[loss=0.01048, audio_tagging_loss=0.01048, over 4943701.37 frames. ], batch size: 99, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:17:33,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1566226.6666666667, ans=0.2 2023-12-24 07:17:44,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1566293.3333333333, ans=0.5 2023-12-24 07:18:02,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1566426.6666666667, ans=0.125 2023-12-24 07:18:08,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1566426.6666666667, ans=0.125 2023-12-24 07:18:12,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1566493.3333333333, ans=0.2 2023-12-24 07:18:17,881 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.03 vs. limit=15.0 2023-12-24 07:18:18,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1566493.3333333333, ans=0.2 2023-12-24 07:18:21,438 INFO [train.py:886] (3/4) Epoch 50, batch 1450, loss[loss=0.009347, audio_tagging_loss=0.009347, over 25000.00 frames. ], tot_loss[loss=0.01048, audio_tagging_loss=0.01048, over 4952770.93 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:18:28,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1566560.0, ans=0.0 2023-12-24 07:18:30,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1566560.0, ans=0.0 2023-12-24 07:18:32,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1566626.6666666667, ans=0.125 2023-12-24 07:18:37,366 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.14 vs. limit=15.0 2023-12-24 07:18:43,166 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. 
limit=15.0 2023-12-24 07:18:44,692 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.709e+01 4.091e+01 4.245e+01 4.456e+01 5.361e+01, threshold=8.489e+01, percent-clipped=0.0 2023-12-24 07:18:49,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1566693.3333333333, ans=10.0 2023-12-24 07:19:14,049 INFO [train.py:886] (3/4) Epoch 50, batch 1500, loss[loss=0.01149, audio_tagging_loss=0.01149, over 25000.00 frames. ], tot_loss[loss=0.01051, audio_tagging_loss=0.01051, over 4957607.34 frames. ], batch size: 100, lr: 2.15e-03, grad_scale: 32.0 2023-12-24 07:19:17,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1566893.3333333333, ans=0.2 2023-12-24 07:19:20,216 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.07 vs. limit=15.0 2023-12-24 07:19:30,138 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.95 vs. limit=15.0 2023-12-24 07:19:42,195 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=12.0 2023-12-24 07:19:48,935 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=22.5 2023-12-24 07:19:50,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1567093.3333333333, ans=0.125 2023-12-24 07:19:50,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1567093.3333333333, ans=0.1 2023-12-24 07:19:52,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1567093.3333333333, ans=0.0 2023-12-24 07:19:55,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1567160.0, ans=0.2 2023-12-24 07:20:06,499 INFO [train.py:886] (3/4) Epoch 50, batch 1550, loss[loss=0.008961, audio_tagging_loss=0.008961, over 24750.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4954058.30 frames. 
], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:20:15,908 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 07:20:16,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1567293.3333333333, ans=0.125 2023-12-24 07:20:24,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=1567293.3333333333, ans=0.5 2023-12-24 07:20:28,754 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.759e+01 4.163e+01 4.336e+01 4.479e+01 4.983e+01, threshold=8.671e+01, percent-clipped=0.0 2023-12-24 07:20:30,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1567360.0, ans=0.0 2023-12-24 07:20:35,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1567360.0, ans=0.125 2023-12-24 07:20:47,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1567493.3333333333, ans=0.125 2023-12-24 07:20:57,207 INFO [train.py:886] (3/4) Epoch 50, batch 1600, loss[loss=0.009627, audio_tagging_loss=0.009627, over 25000.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4947454.04 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:21:36,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1567760.0, ans=0.0 2023-12-24 07:21:43,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1567826.6666666667, ans=0.125 2023-12-24 07:21:46,307 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 07:21:49,923 INFO [train.py:886] (3/4) Epoch 50, batch 1650, loss[loss=0.01036, audio_tagging_loss=0.01036, over 24750.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4944181.68 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:21:56,989 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.46 vs. limit=22.5 2023-12-24 07:22:01,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1567960.0, ans=0.0 2023-12-24 07:22:10,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1568026.6666666667, ans=0.125 2023-12-24 07:22:14,135 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.627e+01 4.076e+01 4.290e+01 4.470e+01 5.188e+01, threshold=8.579e+01, percent-clipped=0.0 2023-12-24 07:22:16,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1568026.6666666667, ans=0.1 2023-12-24 07:22:29,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0 2023-12-24 07:22:42,264 INFO [train.py:886] (3/4) Epoch 50, batch 1700, loss[loss=0.01249, audio_tagging_loss=0.01249, over 25000.00 frames. ], tot_loss[loss=0.01051, audio_tagging_loss=0.01051, over 4945404.44 frames. 
], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:22:45,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1568226.6666666667, ans=0.125 2023-12-24 07:23:01,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1568293.3333333333, ans=0.1 2023-12-24 07:23:02,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1568360.0, ans=0.2 2023-12-24 07:23:03,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1568360.0, ans=0.1 2023-12-24 07:23:05,838 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-12-24 07:23:34,048 INFO [train.py:886] (3/4) Epoch 50, batch 1750, loss[loss=0.00908, audio_tagging_loss=0.00908, over 25000.00 frames. ], tot_loss[loss=0.01041, audio_tagging_loss=0.01041, over 4951720.92 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:23:37,189 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 07:23:38,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1568560.0, ans=0.2 2023-12-24 07:23:48,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1568626.6666666667, ans=0.1 2023-12-24 07:23:57,035 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.573e+01 3.982e+01 4.200e+01 4.347e+01 4.919e+01, threshold=8.401e+01, percent-clipped=0.0 2023-12-24 07:24:08,895 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.75 vs. limit=22.5 2023-12-24 07:24:16,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1568826.6666666667, ans=0.07 2023-12-24 07:24:19,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1568826.6666666667, ans=0.125 2023-12-24 07:24:26,389 INFO [train.py:886] (3/4) Epoch 50, batch 1800, loss[loss=0.009777, audio_tagging_loss=0.009777, over 25000.00 frames. ], tot_loss[loss=0.0104, audio_tagging_loss=0.0104, over 4954270.42 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:24:37,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1568960.0, ans=0.125 2023-12-24 07:24:49,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1569026.6666666667, ans=10.0 2023-12-24 07:24:50,346 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=12.0 2023-12-24 07:24:53,134 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.53 vs. limit=10.0 2023-12-24 07:25:16,808 INFO [train.py:886] (3/4) Epoch 50, batch 1850, loss[loss=0.01051, audio_tagging_loss=0.01051, over 24750.00 frames. 
], tot_loss[loss=0.01052, audio_tagging_loss=0.01052, over 4949776.38 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:25:19,172 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.52 vs. limit=22.5 2023-12-24 07:25:27,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1569226.6666666667, ans=0.1 2023-12-24 07:25:38,334 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 07:25:39,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1569360.0, ans=0.125 2023-12-24 07:25:40,870 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.702e+01 4.153e+01 4.266e+01 4.443e+01 5.377e+01, threshold=8.532e+01, percent-clipped=0.0 2023-12-24 07:26:03,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1569493.3333333333, ans=0.0 2023-12-24 07:26:04,849 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.80 vs. limit=10.0 2023-12-24 07:26:05,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1569493.3333333333, ans=0.125 2023-12-24 07:26:09,870 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.19 vs. limit=22.5 2023-12-24 07:26:10,193 INFO [train.py:886] (3/4) Epoch 50, batch 1900, loss[loss=0.01042, audio_tagging_loss=0.01042, over 24750.00 frames. ], tot_loss[loss=0.0107, audio_tagging_loss=0.0107, over 4947838.44 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:26:22,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.60 vs. limit=15.0 2023-12-24 07:27:01,925 INFO [train.py:886] (3/4) Epoch 50, batch 1950, loss[loss=0.009595, audio_tagging_loss=0.009595, over 24750.00 frames. ], tot_loss[loss=0.01059, audio_tagging_loss=0.01059, over 4948730.55 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:27:11,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1569960.0, ans=0.0 2023-12-24 07:27:22,023 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.34 vs. limit=22.5 2023-12-24 07:27:22,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1570026.6666666667, ans=0.125 2023-12-24 07:27:23,191 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.684e+01 4.036e+01 4.252e+01 4.490e+01 5.188e+01, threshold=8.504e+01, percent-clipped=0.0 2023-12-24 07:27:43,411 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-12-24 07:27:47,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1570160.0, ans=0.0 2023-12-24 07:27:51,720 INFO [train.py:886] (3/4) Epoch 50, batch 2000, loss[loss=0.01094, audio_tagging_loss=0.01094, over 25000.00 frames. 
], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4949965.87 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:28:20,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1570360.0, ans=0.2 2023-12-24 07:28:23,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1570426.6666666667, ans=0.125 2023-12-24 07:28:30,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.79 vs. limit=12.0 2023-12-24 07:28:36,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1570493.3333333333, ans=10.0 2023-12-24 07:28:44,789 INFO [train.py:886] (3/4) Epoch 50, batch 2050, loss[loss=0.01003, audio_tagging_loss=0.01003, over 25000.00 frames. ], tot_loss[loss=0.01059, audio_tagging_loss=0.01059, over 4950170.16 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 64.0 2023-12-24 07:29:07,141 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.692e+01 4.007e+01 4.186e+01 4.409e+01 4.904e+01, threshold=8.372e+01, percent-clipped=0.0 2023-12-24 07:29:10,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1570693.3333333333, ans=0.0 2023-12-24 07:29:16,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1570760.0, ans=0.05 2023-12-24 07:29:25,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1570826.6666666667, ans=0.0 2023-12-24 07:29:30,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1570826.6666666667, ans=0.0 2023-12-24 07:29:35,801 INFO [train.py:886] (3/4) Epoch 50, batch 2100, loss[loss=0.008897, audio_tagging_loss=0.008897, over 20673.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4945505.62 frames. ], batch size: 107, lr: 2.14e-03, grad_scale: 64.0 2023-12-24 07:29:45,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1570960.0, ans=0.0 2023-12-24 07:29:53,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1570960.0, ans=0.0 2023-12-24 07:30:04,843 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.03 vs. limit=15.0 2023-12-24 07:30:05,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1571093.3333333333, ans=0.125 2023-12-24 07:30:17,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1571160.0, ans=0.125 2023-12-24 07:30:28,617 INFO [train.py:886] (3/4) Epoch 50, batch 2150, loss[loss=0.01181, audio_tagging_loss=0.01181, over 25000.00 frames. ], tot_loss[loss=0.01052, audio_tagging_loss=0.01052, over 4944591.98 frames. 
], batch size: 100, lr: 2.14e-03, grad_scale: 64.0 2023-12-24 07:30:40,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1571293.3333333333, ans=0.0 2023-12-24 07:30:49,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1571360.0, ans=0.1 2023-12-24 07:30:51,641 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.676e+01 4.091e+01 4.279e+01 4.499e+01 5.273e+01, threshold=8.558e+01, percent-clipped=0.0 2023-12-24 07:31:04,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1571426.6666666667, ans=0.1 2023-12-24 07:31:12,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1571493.3333333333, ans=0.0 2023-12-24 07:31:21,072 INFO [train.py:886] (3/4) Epoch 50, batch 2200, loss[loss=0.01115, audio_tagging_loss=0.01115, over 25000.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4940318.07 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 64.0 2023-12-24 07:31:21,793 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.82 vs. limit=15.0 2023-12-24 07:31:24,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1571560.0, ans=0.125 2023-12-24 07:32:05,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1571826.6666666667, ans=0.1 2023-12-24 07:32:12,248 INFO [train.py:886] (3/4) Epoch 50, batch 2250, loss[loss=0.01143, audio_tagging_loss=0.01143, over 25000.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4938086.35 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 64.0 2023-12-24 07:32:14,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1571893.3333333333, ans=0.125 2023-12-24 07:32:17,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1571893.3333333333, ans=0.035 2023-12-24 07:32:35,510 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.741e+01 4.093e+01 4.254e+01 4.470e+01 6.173e+01, threshold=8.508e+01, percent-clipped=0.0 2023-12-24 07:32:59,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1572160.0, ans=0.125 2023-12-24 07:33:04,679 INFO [train.py:886] (3/4) Epoch 50, batch 2300, loss[loss=0.009222, audio_tagging_loss=0.009222, over 25000.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4940865.24 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 64.0 2023-12-24 07:33:17,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1572293.3333333333, ans=0.0 2023-12-24 07:33:27,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1572360.0, ans=0.125 2023-12-24 07:33:31,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.10 vs. 
limit=12.0 2023-12-24 07:33:46,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1572493.3333333333, ans=0.125 2023-12-24 07:33:56,443 INFO [train.py:886] (3/4) Epoch 50, batch 2350, loss[loss=0.00915, audio_tagging_loss=0.00915, over 24750.00 frames. ], tot_loss[loss=0.01051, audio_tagging_loss=0.01051, over 4944440.80 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 64.0 2023-12-24 07:33:56,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1572560.0, ans=0.125 2023-12-24 07:34:00,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1572560.0, ans=0.0 2023-12-24 07:34:00,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1572560.0, ans=0.2 2023-12-24 07:34:18,577 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.700e+01 4.055e+01 4.217e+01 4.418e+01 5.746e+01, threshold=8.434e+01, percent-clipped=0.0 2023-12-24 07:34:41,447 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.62 vs. limit=10.0 2023-12-24 07:34:48,217 INFO [train.py:886] (3/4) Epoch 50, batch 2400, loss[loss=0.01352, audio_tagging_loss=0.01352, over 24750.00 frames. ], tot_loss[loss=0.01048, audio_tagging_loss=0.01048, over 4950075.54 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 64.0 2023-12-24 07:34:54,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1572893.3333333333, ans=0.0 2023-12-24 07:35:01,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1572960.0, ans=0.125 2023-12-24 07:35:06,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1572960.0, ans=0.125 2023-12-24 07:35:10,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1573026.6666666667, ans=0.2 2023-12-24 07:35:26,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1573093.3333333333, ans=0.1 2023-12-24 07:35:28,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1573160.0, ans=0.2 2023-12-24 07:35:36,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1573160.0, ans=0.125 2023-12-24 07:35:40,404 INFO [train.py:886] (3/4) Epoch 50, batch 2450, loss[loss=0.01209, audio_tagging_loss=0.01209, over 25000.00 frames. ], tot_loss[loss=0.01042, audio_tagging_loss=0.01042, over 4947525.10 frames. 
], batch size: 100, lr: 2.14e-03, grad_scale: 64.0 2023-12-24 07:36:02,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1573360.0, ans=0.0 2023-12-24 07:36:02,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1573360.0, ans=0.2 2023-12-24 07:36:04,785 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.654e+01 4.083e+01 4.271e+01 4.433e+01 5.085e+01, threshold=8.543e+01, percent-clipped=0.0 2023-12-24 07:36:19,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1573426.6666666667, ans=0.125 2023-12-24 07:36:23,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1573493.3333333333, ans=0.0 2023-12-24 07:36:27,341 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.42 vs. limit=15.0 2023-12-24 07:36:33,465 INFO [train.py:886] (3/4) Epoch 50, batch 2500, loss[loss=0.01388, audio_tagging_loss=0.01388, over 24750.00 frames. ], tot_loss[loss=0.01058, audio_tagging_loss=0.01058, over 4949681.54 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:36:52,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1573626.6666666667, ans=0.125 2023-12-24 07:36:58,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.42 vs. limit=22.5 2023-12-24 07:37:00,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1573693.3333333333, ans=10.0 2023-12-24 07:37:01,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1573693.3333333333, ans=0.0 2023-12-24 07:37:01,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1573693.3333333333, ans=0.2 2023-12-24 07:37:14,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1573760.0, ans=0.0 2023-12-24 07:37:22,174 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.86 vs. limit=15.0 2023-12-24 07:37:25,234 INFO [train.py:886] (3/4) Epoch 50, batch 2550, loss[loss=0.00803, audio_tagging_loss=0.00803, over 22717.00 frames. ], tot_loss[loss=0.01063, audio_tagging_loss=0.01063, over 4947397.97 frames. 
], batch size: 107, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:37:25,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1573893.3333333333, ans=0.125 2023-12-24 07:37:48,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1574026.6666666667, ans=0.125 2023-12-24 07:37:49,907 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.655e+01 4.126e+01 4.329e+01 4.523e+01 5.381e+01, threshold=8.659e+01, percent-clipped=0.0 2023-12-24 07:38:00,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1574093.3333333333, ans=0.125 2023-12-24 07:38:10,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1574160.0, ans=0.125 2023-12-24 07:38:18,379 INFO [train.py:886] (3/4) Epoch 50, batch 2600, loss[loss=0.01022, audio_tagging_loss=0.01022, over 25000.00 frames. ], tot_loss[loss=0.01061, audio_tagging_loss=0.01061, over 4944079.98 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:38:18,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1574226.6666666667, ans=0.0 2023-12-24 07:38:21,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1574226.6666666667, ans=0.1 2023-12-24 07:38:44,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1574360.0, ans=0.1 2023-12-24 07:38:44,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1574360.0, ans=0.125 2023-12-24 07:39:05,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1574493.3333333333, ans=0.0 2023-12-24 07:39:09,516 INFO [train.py:886] (3/4) Epoch 50, batch 2650, loss[loss=0.01167, audio_tagging_loss=0.01167, over 25000.00 frames. ], tot_loss[loss=0.01059, audio_tagging_loss=0.01059, over 4944230.26 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0 2023-12-24 07:39:29,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1574626.6666666667, ans=0.0 2023-12-24 07:39:33,606 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.680e+01 4.070e+01 4.297e+01 4.488e+01 5.436e+01, threshold=8.593e+01, percent-clipped=0.0 2023-12-24 07:39:36,979 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.30 vs. limit=12.0 2023-12-24 07:39:38,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1574693.3333333333, ans=0.2 2023-12-24 07:39:53,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1574826.6666666667, ans=0.2 2023-12-24 07:39:58,221 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-12-24 07:40:01,871 INFO [train.py:886] (3/4) Epoch 50, batch 2700, loss[loss=0.009868, audio_tagging_loss=0.009868, over 25000.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4946209.24 frames. 
], batch size: 100, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:40:02,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1574893.3333333333, ans=0.125
2023-12-24 07:40:53,312 INFO [train.py:886] (3/4) Epoch 50, batch 2750, loss[loss=0.01115, audio_tagging_loss=0.01115, over 25000.00 frames. ], tot_loss[loss=0.01049, audio_tagging_loss=0.01049, over 4949825.16 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:41:01,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1575226.6666666667, ans=0.0
2023-12-24 07:41:15,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1575360.0, ans=0.0
2023-12-24 07:41:16,429 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.715e+01 4.042e+01 4.295e+01 4.516e+01 5.122e+01, threshold=8.590e+01, percent-clipped=0.0
2023-12-24 07:41:39,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1575493.3333333333, ans=0.2
2023-12-24 07:41:40,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1575493.3333333333, ans=0.1
2023-12-24 07:41:43,761 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.77 vs. limit=22.5
2023-12-24 07:41:45,219 INFO [train.py:886] (3/4) Epoch 50, batch 2800, loss[loss=0.01155, audio_tagging_loss=0.01155, over 24750.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4950510.10 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:41:56,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.23 vs. limit=15.0
2023-12-24 07:41:57,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1575626.6666666667, ans=0.0
2023-12-24 07:41:59,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1575626.6666666667, ans=0.1
2023-12-24 07:42:00,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.80 vs. limit=15.0
2023-12-24 07:42:00,890 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.19 vs. limit=22.5
2023-12-24 07:42:08,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1575693.3333333333, ans=0.2
2023-12-24 07:42:12,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1575693.3333333333, ans=0.125
2023-12-24 07:42:15,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1575760.0, ans=10.0
2023-12-24 07:42:38,579 INFO [train.py:886] (3/4) Epoch 50, batch 2850, loss[loss=0.0105, audio_tagging_loss=0.0105, over 24750.00 frames. ], tot_loss[loss=0.01053, audio_tagging_loss=0.01053, over 4938407.50 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:42:53,088 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 07:42:55,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0
2023-12-24 07:43:01,386 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.781e+01 4.104e+01 4.361e+01 4.546e+01 5.152e+01, threshold=8.721e+01, percent-clipped=0.0
2023-12-24 07:43:06,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1576026.6666666667, ans=0.125
2023-12-24 07:43:21,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1576160.0, ans=0.125
2023-12-24 07:43:28,314 INFO [train.py:886] (3/4) Epoch 50, batch 2900, loss[loss=0.0121, audio_tagging_loss=0.0121, over 24750.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4934298.33 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:43:37,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1576226.6666666667, ans=0.125
2023-12-24 07:43:40,519 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.95 vs. limit=10.0
2023-12-24 07:43:41,478 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.25 vs. limit=10.0
2023-12-24 07:43:45,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1576293.3333333333, ans=0.0
2023-12-24 07:43:53,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1576360.0, ans=0.5
2023-12-24 07:44:17,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1576493.3333333333, ans=0.0
2023-12-24 07:44:20,090 INFO [train.py:886] (3/4) Epoch 50, batch 2950, loss[loss=0.01367, audio_tagging_loss=0.01367, over 24919.00 frames. ], tot_loss[loss=0.01045, audio_tagging_loss=0.01045, over 4942807.04 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:44:22,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1576560.0, ans=0.125
2023-12-24 07:44:24,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1576560.0, ans=0.2
2023-12-24 07:44:34,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1576626.6666666667, ans=0.125
2023-12-24 07:44:41,458 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0
2023-12-24 07:44:44,664 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.679e+01 4.048e+01 4.207e+01 4.410e+01 5.096e+01, threshold=8.415e+01, percent-clipped=0.0
2023-12-24 07:44:57,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1576760.0, ans=0.0
2023-12-24 07:45:12,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.06 vs. limit=15.0
2023-12-24 07:45:12,383 INFO [train.py:886] (3/4) Epoch 50, batch 3000, loss[loss=0.01243, audio_tagging_loss=0.01243, over 24750.00 frames. ], tot_loss[loss=0.0104, audio_tagging_loss=0.0104, over 4948262.03 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:45:12,384 INFO [train.py:909] (3/4) Computing validation loss
2023-12-24 07:45:33,529 INFO [train.py:917] (3/4) Epoch 50, validation: loss=0.03799, audio_tagging_loss=0.03799, over 3737520.00 frames.
2023-12-24 07:45:33,530 INFO [train.py:918] (3/4) Maximum memory allocated so far is 14873MB
2023-12-24 07:45:44,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1576960.0, ans=0.125
2023-12-24 07:46:01,699 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=15.0
2023-12-24 07:46:12,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1577093.3333333333, ans=0.0
2023-12-24 07:46:16,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1577160.0, ans=0.125
2023-12-24 07:46:25,136 INFO [train.py:886] (3/4) Epoch 50, batch 3050, loss[loss=0.01142, audio_tagging_loss=0.01142, over 25000.00 frames. ], tot_loss[loss=0.01036, audio_tagging_loss=0.01036, over 4952689.96 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:46:30,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1577226.6666666667, ans=0.05
2023-12-24 07:46:39,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1577293.3333333333, ans=0.1
2023-12-24 07:46:41,595 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.70 vs. limit=15.0
2023-12-24 07:46:41,694 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.51 vs. limit=22.5
2023-12-24 07:46:49,324 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.806e+01 4.000e+01 4.179e+01 4.391e+01 4.830e+01, threshold=8.357e+01, percent-clipped=0.0
2023-12-24 07:46:53,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1577360.0, ans=0.0
2023-12-24 07:47:01,429 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.03 vs. limit=6.0
2023-12-24 07:47:13,814 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0
2023-12-24 07:47:15,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1577560.0, ans=0.125
2023-12-24 07:47:16,863 INFO [train.py:886] (3/4) Epoch 50, batch 3100, loss[loss=0.008539, audio_tagging_loss=0.008539, over 24750.00 frames. ], tot_loss[loss=0.01045, audio_tagging_loss=0.01045, over 4958701.41 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:47:35,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1577693.3333333333, ans=0.2
2023-12-24 07:47:43,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1577693.3333333333, ans=0.125
2023-12-24 07:47:47,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1577760.0, ans=0.125
2023-12-24 07:47:55,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1577760.0, ans=0.125
2023-12-24 07:47:56,501 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0
2023-12-24 07:47:59,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1577826.6666666667, ans=0.09899494936611666
2023-12-24 07:48:07,470 INFO [train.py:886] (3/4) Epoch 50, batch 3150, loss[loss=0.0119, audio_tagging_loss=0.0119, over 24750.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4948751.00 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:48:08,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1577893.3333333333, ans=0.1
2023-12-24 07:48:20,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1577960.0, ans=0.125
2023-12-24 07:48:21,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1577960.0, ans=0.0
2023-12-24 07:48:31,862 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.715e+01 4.159e+01 4.326e+01 4.547e+01 5.411e+01, threshold=8.653e+01, percent-clipped=0.0
2023-12-24 07:48:33,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1578026.6666666667, ans=0.2
2023-12-24 07:48:37,843 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-24 07:48:39,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1578093.3333333333, ans=0.125
2023-12-24 07:48:40,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1578093.3333333333, ans=0.1
2023-12-24 07:48:49,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1578160.0, ans=0.125
2023-12-24 07:49:00,293 INFO [train.py:886] (3/4) Epoch 50, batch 3200, loss[loss=0.008558, audio_tagging_loss=0.008558, over 25000.00 frames. ], tot_loss[loss=0.01057, audio_tagging_loss=0.01057, over 4943620.65 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:49:03,597 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.76 vs. limit=15.0
2023-12-24 07:49:07,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1578226.6666666667, ans=0.125
2023-12-24 07:49:29,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1578360.0, ans=22.5
2023-12-24 07:49:37,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1578426.6666666667, ans=0.125
2023-12-24 07:49:52,064 INFO [train.py:886] (3/4) Epoch 50, batch 3250, loss[loss=0.008946, audio_tagging_loss=0.008946, over 24750.00 frames. ], tot_loss[loss=0.01051, audio_tagging_loss=0.01051, over 4942515.77 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:50:13,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1578693.3333333333, ans=0.125
2023-12-24 07:50:14,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1578693.3333333333, ans=0.125
2023-12-24 07:50:15,400 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.606e+01 4.040e+01 4.194e+01 4.403e+01 5.112e+01, threshold=8.389e+01, percent-clipped=0.0
2023-12-24 07:50:32,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=1578760.0, ans=15.0
2023-12-24 07:50:35,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=1578826.6666666667, ans=0.2
2023-12-24 07:50:35,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1578826.6666666667, ans=0.125
2023-12-24 07:50:42,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1578826.6666666667, ans=0.125
2023-12-24 07:50:44,520 INFO [train.py:886] (3/4) Epoch 50, batch 3300, loss[loss=0.01261, audio_tagging_loss=0.01261, over 25000.00 frames. ], tot_loss[loss=0.01042, audio_tagging_loss=0.01042, over 4948550.16 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:50:49,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1578893.3333333333, ans=0.125
2023-12-24 07:50:52,748 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.15 vs. limit=15.0
2023-12-24 07:50:58,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1578960.0, ans=0.1
2023-12-24 07:51:19,585 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 07:51:20,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1579093.3333333333, ans=0.125
2023-12-24 07:51:21,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1579093.3333333333, ans=0.2
2023-12-24 07:51:26,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1579160.0, ans=0.0
2023-12-24 07:51:36,589 INFO [train.py:886] (3/4) Epoch 50, batch 3350, loss[loss=0.01077, audio_tagging_loss=0.01077, over 25000.00 frames. ], tot_loss[loss=0.01036, audio_tagging_loss=0.01036, over 4951112.24 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:51:39,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1579226.6666666667, ans=0.1
2023-12-24 07:51:45,708 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.91 vs. limit=10.0
2023-12-24 07:51:50,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1579293.3333333333, ans=0.025
2023-12-24 07:51:59,928 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.696e+01 4.063e+01 4.248e+01 4.414e+01 5.248e+01, threshold=8.495e+01, percent-clipped=0.0
2023-12-24 07:52:17,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1579493.3333333333, ans=0.125
2023-12-24 07:52:24,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1579493.3333333333, ans=0.0
2023-12-24 07:52:24,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1579493.3333333333, ans=0.0
2023-12-24 07:52:27,568 INFO [train.py:886] (3/4) Epoch 50, batch 3400, loss[loss=0.01044, audio_tagging_loss=0.01044, over 25000.00 frames. ], tot_loss[loss=0.01039, audio_tagging_loss=0.01039, over 4948274.96 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:53:20,141 INFO [train.py:886] (3/4) Epoch 50, batch 3450, loss[loss=0.01087, audio_tagging_loss=0.01087, over 24750.00 frames. ], tot_loss[loss=0.01044, audio_tagging_loss=0.01044, over 4943728.19 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:53:20,659 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.40 vs. limit=15.0
2023-12-24 07:53:22,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1579893.3333333333, ans=0.0
2023-12-24 07:53:45,038 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.673e+01 4.041e+01 4.285e+01 4.464e+01 5.704e+01, threshold=8.570e+01, percent-clipped=0.0
2023-12-24 07:53:47,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1580026.6666666667, ans=0.0
2023-12-24 07:54:03,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1580160.0, ans=0.0
2023-12-24 07:54:13,384 INFO [train.py:886] (3/4) Epoch 50, batch 3500, loss[loss=0.008464, audio_tagging_loss=0.008464, over 24750.00 frames. ], tot_loss[loss=0.01051, audio_tagging_loss=0.01051, over 4941330.20 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:54:28,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1580293.3333333333, ans=0.1
2023-12-24 07:54:35,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1580360.0, ans=0.125
2023-12-24 07:54:44,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1580426.6666666667, ans=0.0
2023-12-24 07:54:48,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1580426.6666666667, ans=0.0
2023-12-24 07:55:04,405 INFO [train.py:886] (3/4) Epoch 50, batch 3550, loss[loss=0.01375, audio_tagging_loss=0.01375, over 24750.00 frames. ], tot_loss[loss=0.01052, audio_tagging_loss=0.01052, over 4943837.20 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:55:12,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1580560.0, ans=0.125
2023-12-24 07:55:28,355 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.681e+01 4.036e+01 4.198e+01 4.391e+01 5.355e+01, threshold=8.396e+01, percent-clipped=0.0
2023-12-24 07:55:35,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1580760.0, ans=0.125
2023-12-24 07:55:56,990 INFO [train.py:886] (3/4) Epoch 50, batch 3600, loss[loss=0.01118, audio_tagging_loss=0.01118, over 25000.00 frames. ], tot_loss[loss=0.01044, audio_tagging_loss=0.01044, over 4946913.41 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:56:07,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1580960.0, ans=0.035
2023-12-24 07:56:37,973 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.87 vs. limit=22.5
2023-12-24 07:56:48,313 INFO [train.py:886] (3/4) Epoch 50, batch 3650, loss[loss=0.009061, audio_tagging_loss=0.009061, over 24750.00 frames. ], tot_loss[loss=0.01036, audio_tagging_loss=0.01036, over 4950441.38 frames. ], batch size: 99, lr: 2.14e-03, grad_scale: 32.0
2023-12-24 07:56:54,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1581226.6666666667, ans=0.125
2023-12-24 07:57:11,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1581360.0, ans=0.125
2023-12-24 07:57:11,802 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.587e+01 3.982e+01 4.158e+01 4.360e+01 5.165e+01, threshold=8.317e+01, percent-clipped=0.0
2023-12-24 07:57:15,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1581360.0, ans=0.125
2023-12-24 07:57:33,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1581493.3333333333, ans=0.125
2023-12-24 07:57:40,463 INFO [train.py:886] (3/4) Epoch 50, batch 3700, loss[loss=0.009853, audio_tagging_loss=0.009853, over 25000.00 frames. ], tot_loss[loss=0.01033, audio_tagging_loss=0.01033, over 4953004.12 frames. ], batch size: 100, lr: 2.14e-03, grad_scale: 16.0
2023-12-24 07:57:51,381 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.53 vs. limit=15.0
2023-12-24 07:57:59,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1581626.6666666667, ans=0.025
2023-12-24 07:58:10,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1581693.3333333333, ans=0.0
2023-12-24 07:58:13,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1581760.0, ans=0.125
2023-12-24 07:58:29,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1581826.6666666667, ans=0.2
2023-12-24 07:58:33,651 INFO [train.py:886] (3/4) Epoch 50, batch 3750, loss[loss=0.01072, audio_tagging_loss=0.01072, over 24750.00 frames. ], tot_loss[loss=0.01048, audio_tagging_loss=0.01048, over 4948176.62 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 16.0
2023-12-24 07:58:40,593 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-12-24 07:58:46,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1581960.0, ans=0.07
2023-12-24 07:58:49,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.14 vs. limit=22.5
2023-12-24 07:58:50,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1581960.0, ans=0.1
2023-12-24 07:58:54,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1582026.6666666667, ans=0.0
2023-12-24 07:58:57,809 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.781e+01 4.161e+01 4.351e+01 4.483e+01 8.860e+01, threshold=8.701e+01, percent-clipped=1.0
2023-12-24 07:59:03,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1582026.6666666667, ans=0.0
2023-12-24 07:59:16,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1582160.0, ans=0.125
2023-12-24 07:59:24,591 INFO [train.py:886] (3/4) Epoch 50, batch 3800, loss[loss=0.007958, audio_tagging_loss=0.007958, over 23935.00 frames. ], tot_loss[loss=0.01049, audio_tagging_loss=0.01049, over 4941016.67 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 16.0
2023-12-24 07:59:46,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1582360.0, ans=0.125
2023-12-24 08:00:09,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1582493.3333333333, ans=0.125
2023-12-24 08:00:15,252 INFO [train.py:886] (3/4) Epoch 50, batch 3850, loss[loss=0.006628, audio_tagging_loss=0.006628, over 25000.00 frames. ], tot_loss[loss=0.01044, audio_tagging_loss=0.01044, over 4942972.91 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 16.0
2023-12-24 08:00:17,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1582560.0, ans=0.0
2023-12-24 08:00:40,109 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.778e+01 4.086e+01 4.269e+01 4.439e+01 5.542e+01, threshold=8.539e+01, percent-clipped=0.0
2023-12-24 08:00:43,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1582693.3333333333, ans=0.125
2023-12-24 08:01:06,012 INFO [train.py:886] (3/4) Epoch 50, batch 3900, loss[loss=0.009251, audio_tagging_loss=0.009251, over 25000.00 frames. ], tot_loss[loss=0.01036, audio_tagging_loss=0.01036, over 4948771.71 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 16.0
2023-12-24 08:01:21,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1582960.0, ans=0.0
2023-12-24 08:01:29,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1583026.6666666667, ans=0.0
2023-12-24 08:01:46,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1583160.0, ans=0.0
2023-12-24 08:01:50,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1583160.0, ans=0.04949747468305833
2023-12-24 08:01:54,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1583160.0, ans=0.125
2023-12-24 08:01:56,763 INFO [train.py:886] (3/4) Epoch 50, batch 3950, loss[loss=0.01126, audio_tagging_loss=0.01126, over 24750.00 frames. ], tot_loss[loss=0.01034, audio_tagging_loss=0.01034, over 4952531.85 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 16.0
2023-12-24 08:02:16,272 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.43 vs. limit=15.0
2023-12-24 08:02:22,394 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.534e+01 4.014e+01 4.237e+01 4.371e+01 9.981e+01, threshold=8.474e+01, percent-clipped=1.0
2023-12-24 08:02:26,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1583360.0, ans=0.125
2023-12-24 08:02:41,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1583493.3333333333, ans=0.1
2023-12-24 08:02:50,054 INFO [train.py:886] (3/4) Epoch 50, batch 4000, loss[loss=0.01024, audio_tagging_loss=0.01024, over 25000.00 frames. ], tot_loss[loss=0.01039, audio_tagging_loss=0.01039, over 4957444.34 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:02:52,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1583560.0, ans=0.125
2023-12-24 08:03:02,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1583626.6666666667, ans=0.125
2023-12-24 08:03:14,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1583693.3333333333, ans=0.07
2023-12-24 08:03:40,184 INFO [train.py:886] (3/4) Epoch 50, batch 4050, loss[loss=0.008543, audio_tagging_loss=0.008543, over 25000.00 frames. ], tot_loss[loss=0.01042, audio_tagging_loss=0.01042, over 4955948.93 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:03:43,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1583893.3333333333, ans=0.125
2023-12-24 08:03:50,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1583893.3333333333, ans=0.125
2023-12-24 08:03:55,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1583960.0, ans=0.125
2023-12-24 08:04:05,078 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.755e+01 4.142e+01 4.284e+01 4.513e+01 5.002e+01, threshold=8.568e+01, percent-clipped=0.0
2023-12-24 08:04:06,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.30 vs. limit=22.5
2023-12-24 08:04:16,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1584093.3333333333, ans=0.125
2023-12-24 08:04:30,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1584160.0, ans=0.125
2023-12-24 08:04:31,987 INFO [train.py:886] (3/4) Epoch 50, batch 4100, loss[loss=0.007971, audio_tagging_loss=0.007971, over 24001.00 frames. ], tot_loss[loss=0.01049, audio_tagging_loss=0.01049, over 4948627.72 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:04:39,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1584226.6666666667, ans=0.0
2023-12-24 08:04:58,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1584360.0, ans=0.0
2023-12-24 08:05:16,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1584493.3333333333, ans=0.0
2023-12-24 08:05:18,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1584493.3333333333, ans=0.125
2023-12-24 08:05:24,703 INFO [train.py:886] (3/4) Epoch 50, batch 4150, loss[loss=0.01139, audio_tagging_loss=0.01139, over 24750.00 frames. ], tot_loss[loss=0.01048, audio_tagging_loss=0.01048, over 4946072.51 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:05:29,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1584560.0, ans=0.2
2023-12-24 08:05:31,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1584560.0, ans=0.125
2023-12-24 08:05:48,084 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.703e+01 4.057e+01 4.232e+01 4.457e+01 4.913e+01, threshold=8.465e+01, percent-clipped=0.0
2023-12-24 08:06:09,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1584826.6666666667, ans=0.1
2023-12-24 08:06:11,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1584826.6666666667, ans=0.125
2023-12-24 08:06:15,656 INFO [train.py:886] (3/4) Epoch 50, batch 4200, loss[loss=0.01163, audio_tagging_loss=0.01163, over 24750.00 frames. ], tot_loss[loss=0.01049, audio_tagging_loss=0.01049, over 4947933.16 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:06:35,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1584960.0, ans=0.1
2023-12-24 08:06:50,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1585093.3333333333, ans=0.125
2023-12-24 08:06:59,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1585160.0, ans=0.025
2023-12-24 08:06:59,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1585160.0, ans=0.125
2023-12-24 08:07:08,512 INFO [train.py:886] (3/4) Epoch 50, batch 4250, loss[loss=0.009278, audio_tagging_loss=0.009278, over 25000.00 frames. ], tot_loss[loss=0.01042, audio_tagging_loss=0.01042, over 4946149.42 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:07:10,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1585226.6666666667, ans=0.125
2023-12-24 08:07:12,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1585226.6666666667, ans=0.125
2023-12-24 08:07:26,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1585360.0, ans=0.2
2023-12-24 08:07:26,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1585360.0, ans=0.125
2023-12-24 08:07:32,775 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.753e+01 4.073e+01 4.229e+01 4.382e+01 5.254e+01, threshold=8.458e+01, percent-clipped=0.0
2023-12-24 08:07:46,970 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.69 vs. limit=15.0
2023-12-24 08:07:48,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1585493.3333333333, ans=0.125
2023-12-24 08:07:58,904 INFO [train.py:886] (3/4) Epoch 50, batch 4300, loss[loss=0.01035, audio_tagging_loss=0.01035, over 25000.00 frames. ], tot_loss[loss=0.01041, audio_tagging_loss=0.01041, over 4949049.22 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:08:02,005 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0
2023-12-24 08:08:16,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1585626.6666666667, ans=0.05
2023-12-24 08:08:17,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1585626.6666666667, ans=0.125
2023-12-24 08:08:29,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.08 vs. limit=22.5
2023-12-24 08:08:31,304 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 08:08:50,569 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0
2023-12-24 08:08:52,019 INFO [train.py:886] (3/4) Epoch 50, batch 4350, loss[loss=0.01082, audio_tagging_loss=0.01082, over 24750.00 frames. ], tot_loss[loss=0.01044, audio_tagging_loss=0.01044, over 4950476.25 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:08:56,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1585893.3333333333, ans=0.125
2023-12-24 08:09:01,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1585960.0, ans=0.07
2023-12-24 08:09:16,918 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.615e+01 4.119e+01 4.307e+01 4.459e+01 5.187e+01, threshold=8.614e+01, percent-clipped=0.0
2023-12-24 08:09:28,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1586093.3333333333, ans=0.125
2023-12-24 08:09:44,416 INFO [train.py:886] (3/4) Epoch 50, batch 4400, loss[loss=0.01254, audio_tagging_loss=0.01254, over 24750.00 frames. ], tot_loss[loss=0.0105, audio_tagging_loss=0.0105, over 4946164.64 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:09:56,569 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.40 vs. limit=22.5
2023-12-24 08:10:01,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1586293.3333333333, ans=0.0
2023-12-24 08:10:11,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1586360.0, ans=0.125
2023-12-24 08:10:35,809 INFO [train.py:886] (3/4) Epoch 50, batch 4450, loss[loss=0.01004, audio_tagging_loss=0.01004, over 25000.00 frames. ], tot_loss[loss=0.01051, audio_tagging_loss=0.01051, over 4941065.65 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:10:39,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1586560.0, ans=0.125
2023-12-24 08:10:45,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1586560.0, ans=0.125
2023-12-24 08:10:50,480 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-12-24 08:11:01,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1586693.3333333333, ans=0.125
2023-12-24 08:11:01,844 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.798e+01 4.098e+01 4.310e+01 4.508e+01 5.882e+01, threshold=8.619e+01, percent-clipped=0.0
2023-12-24 08:11:05,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1586693.3333333333, ans=0.1
2023-12-24 08:11:20,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.57 vs. limit=10.0
2023-12-24 08:11:24,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1586826.6666666667, ans=0.1
2023-12-24 08:11:28,296 INFO [train.py:886] (3/4) Epoch 50, batch 4500, loss[loss=0.01019, audio_tagging_loss=0.01019, over 24750.00 frames. ], tot_loss[loss=0.01047, audio_tagging_loss=0.01047, over 4943092.68 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:11:37,855 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.64 vs. limit=22.5
2023-12-24 08:11:41,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.44 vs. limit=6.0
2023-12-24 08:12:20,252 INFO [train.py:886] (3/4) Epoch 50, batch 4550, loss[loss=0.01166, audio_tagging_loss=0.01166, over 25000.00 frames. ], tot_loss[loss=0.01035, audio_tagging_loss=0.01035, over 4946392.18 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:12:40,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1587360.0, ans=0.125
2023-12-24 08:12:40,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1587360.0, ans=0.0
2023-12-24 08:12:44,697 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.783e+01 4.050e+01 4.235e+01 4.395e+01 5.112e+01, threshold=8.470e+01, percent-clipped=0.0
2023-12-24 08:12:51,189 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-24 08:13:12,187 INFO [train.py:886] (3/4) Epoch 50, batch 4600, loss[loss=0.009861, audio_tagging_loss=0.009861, over 25000.00 frames. ], tot_loss[loss=0.01033, audio_tagging_loss=0.01033, over 4955202.12 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:13:12,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1587560.0, ans=0.1
2023-12-24 08:13:18,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1587560.0, ans=0.125
2023-12-24 08:13:21,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1587626.6666666667, ans=0.125
2023-12-24 08:13:28,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1587626.6666666667, ans=0.125
2023-12-24 08:13:40,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1587693.3333333333, ans=0.0
2023-12-24 08:13:48,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1587760.0, ans=0.125
2023-12-24 08:13:58,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1587826.6666666667, ans=0.125
2023-12-24 08:14:03,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1587893.3333333333, ans=0.0
2023-12-24 08:14:04,477 INFO [train.py:886] (3/4) Epoch 50, batch 4650, loss[loss=0.00873, audio_tagging_loss=0.00873, over 25000.00 frames. ], tot_loss[loss=0.01035, audio_tagging_loss=0.01035, over 4957490.84 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:14:08,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1587893.3333333333, ans=0.1
2023-12-24 08:14:13,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1587960.0, ans=0.0
2023-12-24 08:14:28,732 WARNING [optim.py:484] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.571e+01 4.073e+01 4.249e+01 4.503e+01 5.611e+01, threshold=8.499e+01, percent-clipped=0.0
2023-12-24 08:14:40,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1588093.3333333333, ans=0.0
2023-12-24 08:14:54,258 INFO [train.py:886] (3/4) Epoch 50, batch 4700, loss[loss=0.01083, audio_tagging_loss=0.01083, over 24750.00 frames. ], tot_loss[loss=0.01054, audio_tagging_loss=0.01054, over 4950473.87 frames. ], batch size: 99, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:14:58,573 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.71 vs. limit=10.0
2023-12-24 08:15:02,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1588226.6666666667, ans=0.0
2023-12-24 08:15:06,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1588293.3333333333, ans=0.0
2023-12-24 08:15:06,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1588293.3333333333, ans=0.0
2023-12-24 08:15:35,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1588493.3333333333, ans=0.125
2023-12-24 08:15:42,034 INFO [train.py:886] (3/4) Epoch 50, batch 4750, loss[loss=0.01179, audio_tagging_loss=0.01179, over 25000.00 frames. ], tot_loss[loss=0.01062, audio_tagging_loss=0.01062, over 4944548.77 frames. ], batch size: 100, lr: 2.13e-03, grad_scale: 32.0
2023-12-24 08:15:52,470 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.27 vs. limit=15.0
2023-12-24 08:15:57,222 INFO [train.py:1099] (3/4) Done!